diff --git a/INSTALL b/INSTALL
index 26484435d..d1a8870a8 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,646 +1,663 @@
CDS Invenio v0.99.1 INSTALL
===========================

About
=====

This document specifies how to build, customize, and install CDS
Invenio v0.99.1 for the first time.  See RELEASE-NOTES if you are
upgrading from a previous CDS Invenio release.

Contents
========

0. Prerequisites
1. Quick instructions for the impatient CDS Invenio admin
2. Detailed instructions for the patient CDS Invenio admin

0. Prerequisites
================

Here is the software you need to have around before you start
installing CDS Invenio:

 a) Unix-like operating system.  The main development and production
    platforms for CDS Invenio at CERN are the GNU/Linux distributions
    Debian, Gentoo, Scientific Linux (aka RHEL), and Ubuntu, but we
    also develop on Mac OS X.  Basically any Unix system supporting
    the software listed below should do.

    If you are using Debian GNU/Linux ``Lenny'' or later, then you
    can install most of the below-mentioned prerequisites and
    recommendations by running:

      $ sudo aptitude install python-dev apache2-mpm-prefork \
           mysql-server mysql-client python-mysqldb \
           python-4suite-xml python-simplejson python-xml \
           python-libxml2 python-libxslt1 gnuplot poppler-utils \
-          gs-common antiword catdoc wv html2text ppthtml xlhtml \
-          clisp gettext libapache2-mod-wsgi unzip python-numpy \
-          python-rdflib python-gnuplot python-magic pdftk \
-          html2text giflib-tools pstotext
+          gs-common clisp gettext libapache2-mod-wsgi unzip python-rdflib \
+          python-gnuplot python-magic pdftk html2text giflib-tools \
+          pstotext netpbm

    You may also want to install some of the following packages, if
    you have them available on your concrete architecture:

      $ sudo aptitude install rxp python-psyco sbcl cmucl \
           pylint pychecker pyflakes python-profiler python-epydoc \
-          libapache2-mod-xsendfile
+          libapache2-mod-xsendfile openoffice.org

    Moreover, you should install some Message Transfer Agent (MTA)
    such as Postfix so that CDS Invenio can email notification alerts
    or registration information to the end users, contact moderators
    and reviewers of submitted documents, inform administrators about
    various runtime system information, etc:

      $ sudo aptitude install postfix

    After running the above-quoted aptitude command(s), you can
    proceed to configuring your MySQL server instance
    (max_allowed_packet in my.cnf, see item 0b below) and then to
    installing the CDS Invenio software package in section 1 below.

    If you are using another operating system, then please continue
    reading the rest of this prerequisites section, and please
    consult our wiki pages for any concrete hints for your specific
    operating system.

 b) MySQL server (may be on a remote machine), and MySQL client
    (must be available locally too).  MySQL versions 4.1 or 5.0 are
    supported.  Please set the variable "max_allowed_packet" in your
    "my.cnf" init file to at least 4M.  You may perhaps also want to
    run your MySQL server natively in UTF-8 mode by setting
    "default-character-set=utf8" in various parts of your "my.cnf"
    file, such as in the "[mysql]" part and elsewhere; but this is
    not really required.

 c) Apache 2 server, with support for loading DSO modules, and
    optionally with SSL support for HTTPS-secure user authentication,
    and mod_xsendfile for off-loading file downloads away from
    Invenio processes to Apache.
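    (A quick way to double-check the "max_allowed_packet" setting
    mentioned in item b) above is to query the running server.  This
    is a minimal sketch assuming a local server and root access;
    adapt the credentials to your setup:

      $ mysql -h localhost -u root -p \
             -e "SHOW VARIABLES LIKE 'max_allowed_packet'"

    The reported value should be at least 4194304, i.e. 4M.)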
 d) Python v2.4 or above, as well as the following Python modules:

      - (mandatory) MySQLdb (version >= 1.2.1_p2; see below)
      - (recommended) PyXML, for XML processing:
      - (recommended) PyRXP, for very fast XML MARC processing:
      - (recommended) libxml2-python, for XML/XLST processing:
      - (recommended) simplejson, for AJAX apps:
          Note that if you are using Python-2.6, you don't need to
          install simplejson, because the module is already included
          in the main Python distribution.
      - (recommended) Gnuplot.Py, for producing graphs:
      - (recommended) Snowball Stemmer, for stemming:
      - (recommended) py-editdist, for record merging:
      - (recommended) numpy, for citerank methods:
      - (recommended) magic, for full-text file handling:
      - (optional) 4suite, slower alternative to PyRXP and
        libxml2-python:
      - (optional) feedparser, for web journal creation:
      - (optional) Psyco, if you are running on a 32-bit OS:
      - (optional) RDFLib, to use RDF ontologies and thesauri:
      - (optional) mechanize, to run the regression web test suite:

    Note: MySQLdb version 1.2.1_p2 or higher is recommended.  If you
    are using an older version of MySQLdb, you may run into problems
    with character encoding.

 e) mod_wsgi Apache module.

    Note: for the time being, the WSGI daemon must be run with
    threads=1, because Invenio is not fully thread safe yet.  This
    will come later.  The Apache configuration example snippets
    (created below) will use threads=1.

    Note: if you are using Python 2.4 or earlier, then you should
    also install the wsgiref Python module, available from:
    (As of Python 2.5 this module is included in the standard Python
    distribution.)

 f) If you want to be able to extract references from PDF fulltext
    files, then you need to install at least version 3 of pdftotext.

 g) If you want to be able to search for words in the fulltext files
    (i.e. to have fulltext indexing) or to stamp submitted files,
    then you also need to install some of the following tools:

+     - for Microsoft Office/OpenOffice.org document conversion:
+       OpenOffice.org
+     - for PDF file stamping: pdftk, pdf2ps
      - for PDF files: pdftotext or pstotext
      - for PostScript files: pstotext or ps2ascii
-     - for MS Word files: antiword, catdoc, or wvText
-
-
-
-     - for MS PowerPoint files: pptHtml and html2text
-
-
-     - for MS Excel files: xlhtml and html2text
-
-
+     - for DjVu creation and manipulation: DjVuLibre
+
+     - to perform OCR: OCRopus (tested only with release 0.3.1)
+
+     - to perform various image manipulations: ImageMagick
+
+     - to generate PDF after OCR: ReportLab
+
+     - to analyze images to generate PDF after OCR: netpbm
+

 h) If you have chosen to install the fast XML MARC Python processors
    in step d) above, then you have to install the parsers
    themselves:

      - (optional) 4suite:

 i) (recommended) Gnuplot, the command-line driven interactive
    plotting program.  It is used to display download and citation
    history graphs on the Detailed record pages on the web interface.
    Note that Gnuplot must be compiled with PNG output support, that
    is, with the GD library.  Note also that Gnuplot is not required,
    only recommended.

 j) (recommended) A Common Lisp implementation, such as CLISP, SBCL
    or CMUCL.  It is used for the web server log analysing tool and
    the metadata checking program.  Note that any of the three
    implementations CLISP, SBCL, or CMUCL will do.  CMUCL produces
    the fastest machine code, but it does not support UTF-8 yet.
    Pick CLISP if you are not sure which one to choose.  Note that a
    Common Lisp implementation is not required, only recommended.
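    (Similarly, you can quickly verify that the mandatory MySQLdb
    module from item d) above is recent enough.  A minimal sketch,
    assuming `python' is the interpreter you will run CDS Invenio
    with:

      $ python -c "import MySQLdb; print MySQLdb.__version__"

    This should print 1.2.1_p2 or higher.)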
 k) GNU gettext, a set of tools that makes it possible to translate
    the application into multiple languages.  This is available by
    default on many systems.

Note that the configure script checks whether you have all the
prerequisite software installed, and it won't let you continue unless
everything is in order.  It also warns you if it cannot find some
optional but recommended software.

1. Quick instructions for the impatient CDS Invenio admin
=========================================================

1a. Installation
----------------

      $ cd $HOME/src/
      $ wget http://cdsware.cern.ch/download/cds-invenio-0.99.1.tar.gz
      $ wget http://cdsware.cern.ch/download/cds-invenio-0.99.1.tar.gz.md5
      $ wget http://cdsware.cern.ch/download/cds-invenio-0.99.1.tar.gz.sig
      $ md5sum -v -c cds-invenio-0.99.1.tar.gz.md5
      $ gpg --verify cds-invenio-0.99.1.tar.gz.sig cds-invenio-0.99.1.tar.gz
      $ tar xvfz cds-invenio-0.99.1.tar.gz
      $ cd cds-invenio-0.99.1
      $ ./configure
      $ make
      $ make install
      $ make install-jsmath-plugin     ## optional
      $ make install-jquery-plugins    ## optional
      $ make install-fckeditor-plugin  ## optional
+     $ make install-pdfa-helper-files ## optional

1b. Configuration
-----------------

      $ sudo chown -R www-data.www-data /opt/cds-invenio
      $ sudo -u www-data emacs /opt/cds-invenio/etc/invenio-local.conf
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --update-all
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --create-tables
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --load-webstat-conf
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --create-apache-conf
      $ sudo /etc/init.d/apache2 restart
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --create-demo-site
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --load-demo-records
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --run-unit-tests
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --run-regression-tests
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --run-web-tests
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --remove-demo-records
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --drop-demo-site
      $ firefox http://your.site.com/help/admin/howto-run

2. Detailed instructions for the patient CDS Invenio admin
==========================================================

2a. Installation
----------------

CDS Invenio uses the standard GNU autoconf method to build and
install its files.  This means that you proceed as follows:

      $ cd $HOME/src/

          Change to a directory where we will build the CDS Invenio
          sources.  (The built files will be installed into different
          "target" directories later.)

      $ wget http://cdsware.cern.ch/download/cds-invenio-0.99.1.tar.gz
      $ wget http://cdsware.cern.ch/download/cds-invenio-0.99.1.tar.gz.md5
      $ wget http://cdsware.cern.ch/download/cds-invenio-0.99.1.tar.gz.sig

          Fetch the CDS Invenio source tarball from the CDS Software
          Consortium distribution server, together with the MD5
          checksum and GnuPG cryptographic signature files useful for
          verifying the integrity of the tarball.

      $ md5sum -v -c cds-invenio-0.99.1.tar.gz.md5

          Verify the MD5 checksum.

      $ gpg --verify cds-invenio-0.99.1.tar.gz.sig cds-invenio-0.99.1.tar.gz

          Verify the GnuPG cryptographic signature.
          Note that you may first have to import my public key into
          your keyring, if you haven't done that already:

            $ gpg --keyserver wwwkeys.eu.pgp.net --recv-keys 0xBA5A2B67

          The output of the gpg --verify command should then read:

            Good signature from "Tibor Simko "

          You can safely ignore any trusted signature certification
          warning that may follow after the signature has been
          successfully verified.

      $ tar xvfz cds-invenio-0.99.1.tar.gz

          Untar the distribution tarball.

      $ cd cds-invenio-0.99.1

          Go to the source directory.

      $ ./configure

          Configure CDS Invenio software for building on this
          specific platform.  You can use the following optional
          parameters:

          --prefix=/opt/cds-invenio

              Optionally, specify the CDS Invenio general
              installation directory (default is /opt/cds-invenio).
              It will contain command-line binaries and program
              libraries containing the core CDS Invenio
              functionality, but it will also store web pages,
              runtime log and cache information, document data
              files, etc.  Several subdirs like `bin', `etc', `lib',
              or `var' will be created inside the prefix directory
              to this effect.  Note that the prefix directory should
              be chosen outside of the Apache htdocs tree, since
              only one of its subdirectories (prefix/var/www) is to
              be accessible directly via the Web (see below).  Note
              that CDS Invenio won't install to any directory other
              than the prefix mentioned in this configuration line.

          --with-python=/opt/python/bin/python2.4

              Optionally, specify a path to some specific Python
              binary.  This is useful if you have more than one
              Python installation on your system.  If you don't set
              this option, then the first Python that will be found
              in your PATH will be chosen for running CDS Invenio.

          --with-mysql=/opt/mysql/bin/mysql

              Optionally, specify a path to some specific MySQL
              client binary.  This is useful if you have more than
              one MySQL installation on your system.  If you don't
              set this option, then the first MySQL client
              executable that will be found in your PATH will be
              chosen for running CDS Invenio.

          --with-clisp=/opt/clisp/bin/clisp

              Optionally, specify a path to the CLISP executable.
              This is useful if you have more than one CLISP
              installation on your system.  If you don't set this
              option, then the first executable that will be found
              in your PATH will be chosen for running CDS Invenio.

          --with-cmucl=/opt/cmucl/bin/lisp

              Optionally, specify a path to the CMUCL executable.
              This is useful if you have more than one CMUCL
              installation on your system.  If you don't set this
              option, then the first executable that will be found
              in your PATH will be chosen for running CDS Invenio.

          --with-sbcl=/opt/sbcl/bin/sbcl

              Optionally, specify a path to the SBCL executable.
              This is useful if you have more than one SBCL
              installation on your system.  If you don't set this
              option, then the first executable that will be found
              in your PATH will be chosen for running CDS Invenio.

+         --with-openoffice-python
+
+             Optionally, specify the path to the Python interpreter
+             embedded with OpenOffice.org.  This is normally not
+             found on the standard PATH.  If you don't specify it,
+             it won't be possible to use OpenOffice.org to convert
+             to and from Microsoft Office and OpenOffice.org
+             documents.
+
          This configuration step is mandatory.  Usually, you do
          this step only once.

          (Note that if you are building CDS Invenio not from a
          released tarball, but from the Git sources, then you have
          to generate the configure file via autotools:

            $ sudo aptitude install automake1.9 autoconf
            $ aclocal-1.9
            $ automake-1.9 -a
            $ autoconf

          after which you proceed with the usual configure command.)
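          To illustrate, a configure invocation combining several of
          the above options might look as follows.  The paths are
          purely illustrative (in particular, the location of the
          OpenOffice.org-embedded Python varies between
          distributions), so adapt them to your system:

            $ ./configure --prefix=/opt/cds-invenio \
                  --with-python=/opt/python/bin/python2.4 \
                  --with-openoffice-python=/usr/lib/openoffice/program/python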
      $ make

          Launch the CDS Invenio build.  Since many messages are
          printed during the build process, you may want to run it
          in a fast-scrolling terminal such as rxvt or in a detached
          screen session.  During this step all the pages and
          scripts will be pre-created and customized based on the
          config you have edited in the previous step.  Note that on
          systems such as FreeBSD or Mac OS X you have to use GNU
          make ("gmake") instead of "make".

      $ make install

          Install the web pages, scripts, utilities and everything
          needed for the CDS Invenio runtime into the respective
          installation directories, as specified earlier by the
          configure command.  Note that if you are installing CDS
          Invenio for the first time, you will be asked to create
          symbolic link(s) from Python's site-packages system-wide
          directory(ies) to the installation location.  This is in
          order to instruct Python where to find CDS Invenio's
          Python files.  You will be shown the exact command to use,
          based on the parameters you have used in the configure
          command.

      $ make install-jsmath-plugin ## optional

          This will automatically download and install in the proper
          place jsMath, a Javascript library to render LaTeX
          formulas in the client browser.  Note that in order to
          enable the rendering you will have to set the variable
          CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS in invenio-local.conf
          to a suitable list of output format codes.  For example:

            CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS = hd,hb

      $ make install-jquery-plugins ## optional

          This will automatically download and install in the proper
          place jQuery and related plugins.  They are used for AJAX
          applications such as the record editor.  Note that `unzip'
          is needed when installing the jquery plugins.

      $ make install-fckeditor-plugin ## optional

          This will automatically download and install in the proper
          place FCKeditor, a WYSIWYG Javascript-based editor (e.g.
          for the WebComment module).  Note that in order to enable
          the editor you have to set CFG_WEBCOMMENT_USE_FCKEDITOR to
          True.

+     $ make install-pdfa-helper-files ## optional
+
+         This will automatically download and install in the proper
+         place the helper files needed to create PDF/A files out of
+         existing PDF files.
+
2b. Configuration
-----------------

Once the basic software installation is done, we proceed to
configuring your Invenio system.

      $ sudo chown -R www-data.www-data /opt/cds-invenio

          For the sake of simplicity, let us assume that your CDS
          Invenio installation will run under the `www-data' user
          process identity.  The above command changes ownership of
          the installed files to www-data, so that we shall run
          everything under this user identity from now on.

          For production purposes, you would typically enable the
          Apache server to read all files from the installation
          place but to write only to the `var' subdirectory of your
          installation place.  You could achieve this by configuring
          Unix directory group permissions, for example.

      $ sudo -u www-data emacs /opt/cds-invenio/etc/invenio-local.conf

          Customize your CDS Invenio installation.  Please read the
          'invenio.conf' file located in the same directory; it
          contains the vanilla default configuration parameters of
          your CDS Invenio installation.  If you want to customize
          some of these parameters, you should create a file named
          'invenio-local.conf' in the same directory where
          'invenio.conf' lives and write there only the
          customizations that you want to be different from the
          vanilla defaults.
          Here is a minimalist example of what you would put there:

            $ cat /opt/cds-invenio/etc/invenio-local.conf
            [Invenio]
            CFG_SITE_URL = http://your.site.com
            CFG_SITE_SECURE_URL = https://your.site.com
            CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com
            CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com

          You should override at least the parameters at the top of
          the invenio.conf file in order to define some very
          essential runtime parameters such as the visible URL of
          your document server (look for CFG_SITE_URL and
          CFG_SITE_SECURE_URL), the database credentials (look for
          CFG_DATABASE_*), the name of your document server (look
          for CFG_SITE_NAME and CFG_SITE_NAME_INTL_*), or the email
          address of the local CDS Invenio administrator (look for
          CFG_SITE_SUPPORT_EMAIL and CFG_SITE_ADMIN_EMAIL).

          The CDS Invenio system will then read both the default
          invenio.conf file and your customized invenio-local.conf
          file and it will override any default options with the
          ones you have specified in your local file.  This
          cascading of configuration parameters will ease your
          future upgrades.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --update-all

          Make the rest of the Invenio system aware of your
          invenio-local.conf changes.  This step is mandatory each
          time you edit your conf files.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --create-tables

          If you are installing CDS Invenio for the first time, you
          have to create the database tables.

          Note that this step checks for potential problems such as
          the database connection rights and may ask you to perform
          some more administrative steps in case it detects a
          problem.  Notably, it may ask you to set up database
          access permissions, based on your configure values.

          If you are installing CDS Invenio for the first time, you
          have to create a dedicated database on your MySQL server
          that CDS Invenio can use for its purposes.  Please contact
          your MySQL administrator and ask them to execute the
          commands this step proposes to you.

          At this point you should have successfully completed the
          "make install" process.  We continue by setting up the
          Apache web server.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --load-webstat-conf

          Load the configuration file of the webstat module.  It
          will create the database tables needed for registering
          custom events, such as basket hits.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --create-apache-conf

          Running this command will generate Apache virtual host
          configurations matching your installation.  You will be
          instructed to check the created files (usually they are
          located under /opt/cds-invenio/etc/apache/) and to edit
          your httpd.conf to activate the Invenio virtual hosts.
          If you are using Debian GNU/Linux ``Lenny'' or later, then
          you can do the following to create your SSL certificate
          and to activate your Invenio vhosts:

            ## make SSL certificate:
            $ sudo aptitude install ssl-cert
            $ sudo mkdir /etc/apache2/ssl
            $ sudo /usr/sbin/make-ssl-cert /usr/share/ssl-cert/ssleay.cnf \
                   /etc/apache2/ssl/apache.pem

            ## add Invenio web sites:
            $ sudo ln -s /opt/cds-invenio/etc/apache/invenio-apache-vhost.conf \
                   /etc/apache2/sites-available/invenio
            $ sudo ln -s /opt/cds-invenio/etc/apache/invenio-apache-vhost-ssl.conf \
                   /etc/apache2/sites-available/invenio-ssl

            ## disable Debian's default web site:
            $ sudo /usr/sbin/a2dissite default

            ## enable Invenio web sites:
            $ sudo /usr/sbin/a2ensite invenio
            $ sudo /usr/sbin/a2ensite invenio-ssl

            ## enable SSL module:
            $ sudo /usr/sbin/a2enmod ssl

            ## if you are using the xsendfile module, enable it too:
            $ sudo /usr/sbin/a2enmod xsendfile

          If you are using another operating system, you should do
          the equivalent, for example edit your system-wide
          httpd.conf and put there the following include statements:

            Include /opt/cds-invenio/etc/apache/invenio-apache-vhost.conf
            Include /opt/cds-invenio/etc/apache/invenio-apache-vhost-ssl.conf

          Note that you may need to adapt the generated vhost file
          snippets to match your concrete operating system
          specifics.  Note also that you may want to tweak the
          generated example configurations, especially with respect
          to the WSGIDaemonProcess parameters.  E.g. increase the
          `processes' parameter if you have lots of RAM and many
          concurrent users accessing your site in parallel.

      $ sudo /etc/init.d/apache2 restart

          Please ask your webserver administrator to restart the
          Apache server after the above "httpd.conf" changes.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --create-demo-site

          This step is recommended to test your local CDS Invenio
          installation.  It should give you our "Atlantis Institute
          of Science" demo installation, exactly as you see it at .

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --load-demo-records

          Optionally, load some demo records to be able to test
          indexing and searching of your local CDS Invenio demo
          installation.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --run-unit-tests

          Optionally, you can run the unit test suite to verify the
          unit behaviour of your local CDS Invenio installation.
          Note that this command should be run only after you have
          installed the whole system via `make install'.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --run-regression-tests

          Optionally, you can run the full regression test suite to
          verify the functional behaviour of your local CDS Invenio
          installation.  Note that this command requires the demo
          site to have been created and the demo records to have
          been loaded.  Note also that running the regression test
          suite may alter the database content with junk data, so
          rebuilding the demo site afterwards is strongly
          recommended.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --run-web-tests

          Optionally, you can run additional automated web tests
          running in a real browser.  This requires Firefox with the
          Selenium IDE extension installed.

      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --remove-demo-records

          Optionally, remove the demo records loaded in the previous
          step, while otherwise keeping the demo collection,
          submission, format, and other configurations that you may
          reuse and modify for your own production purposes.
      $ sudo -u www-data /opt/cds-invenio/bin/inveniocfg --drop-demo-site

          Optionally, also drop all the demo configuration so that
          you'll end up with a completely blank CDS Invenio system.
          However, you may find it more practical not to drop the
          demo site configuration but to start customizing from
          there.

      $ firefox http://your.site.com/help/admin/howto-run

          In order to start using your CDS Invenio installation, you
          can start the indexing, formatting and other daemons as
          indicated in the "HOWTO Run" guide at the above URL.  You
          can also use the Admin Area web interfaces to perform
          further runtime configurations such as the definition of
          data collections, document types, document formats, word
          indexes, etc.

Good luck, and thanks for choosing CDS Invenio.

       - CDS Development Group

diff --git a/Makefile.am b/Makefile.am
index bc01a3afc..8cec18305 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,409 +1,425 @@
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

confignicedir = $(sysconfdir)/build
confignice_SCRIPTS=config.nice

SUBDIRS = po config modules

EXTRA_DIST = UNINSTALL THANKS RELEASE-NOTES configure-tests.py config.nice.in

# current jsMath version and packages
JSMV = 3.6b
JSMFV = 1.3
JSMATH = jsMath-$(JSMV).zip
JSMATHFONTS = jsMath-fonts-$(JSMFV).zip

# current FCKeditor version
FCKV = 2.6.6
FCKEDITOR = FCKeditor_$(FCKV).zip

all:
	@echo "****************************************************"
	@echo "** CDS Invenio has been successfully built. **"
	@echo "** **"
	@echo "** You may proceed to 'make install' now.
**" @echo "****************************************************" check-custom-templates: $(PYTHON) $(top_srcdir)/modules/webstyle/lib/template.py --check-custom-templates $(top_srcdir) kwalitee-check: @$(PYTHON) $(top_srcdir)/modules/miscutil/lib/kwalitee.py --stats $(top_srcdir) kwalitee-check-errors-only: @find $(top_srcdir) -name '*.py' -exec pylint -e {} \; 2> /dev/null kwalitee-check-variables: @find $(top_srcdir) -name '*.py' -exec pylint --reports=n --enable-checker=variables {} \; 2> /dev/null kwalitee-check-indentation: @find $(top_srcdir) -name '*.py' -exec pylint --reports=n --enable-checker=format {} 2> /dev/null \; | grep -E '(^\*|indentation)' kwalitee-check-sql-queries: @echo "* Listing potentially dangerous SQL queries:" @echo "** SQL SELECT queries without explicit column list:" @find $(top_srcdir) -name '*.py' -exec grep -HEin 'SELECT \* FROM' {} \; 2> /dev/null @echo "** SQL INSERT queries without explicit column list:" @find $(top_srcdir) -name '*.py' -exec grep -HEin 'INSERT INTO ([[:alnum:]]|_)+[[:space:]]*VALUES' {} \; 2> /dev/null @find $(top_srcdir) -name '*.py' -exec grep -HEin 'INSERT INTO ([[:alnum:]]|_)+[[:space:]]*$$' {} \; 2> /dev/null @echo "** SQL queries using charset-ignorant escape_string():" @find $(top_srcdir) -name '*.py' -exec grep -HEin 'escape_string' {} \; 2> /dev/null @echo "** SQL queries using literal '%s':" @find $(top_srcdir) -name '*.py' -exec grep -HEin "run_sql.*'%[dfis]'" {} \; 2> /dev/null @find $(top_srcdir) -name '*.py' -exec grep -HEin 'run_sql.*"%[dfis]"' {} \; 2> /dev/null @echo "** SQL queries with potentially unescaped arguments:" @find $(top_srcdir) -name '*.py' -exec grep -HEin 'run_sql.* % ' {} \; 2> /dev/null @echo "* Done." etags: \rm -f $(top_srcdir)/TAGS (cd $(top_srcdir) && find $(top_srcdir) -name "*.py" -print | xargs etags) install-data-local: for d in / /cache /log /tmp /data /run ; do \ mkdir -p $(localstatedir)$$d ; \ done @echo "************************************************************" @echo "** CDS Invenio software has been successfully installed! **" @echo "** **" @echo "** You may proceed to customizing your installation now. **" @echo "************************************************************" install-jsmath-plugin: @echo "***********************************************************" @echo "** Installing jsMath plugin, please wait... **" @echo "***********************************************************" rm -rf /tmp/invenio-jsmath-plugin mkdir /tmp/invenio-jsmath-plugin (cd /tmp/invenio-jsmath-plugin && \ wget 'http://downloads.sourceforge.net/jsmath/$(JSMATH)' && \ wget 'http://downloads.sourceforge.net/jsmath/$(JSMATHFONTS)' && \ wget 'http://www.math.union.edu/~dpvc/jsMath/download/extra-fonts/msam10/msam10.zip' && \ wget 'http://www.math.union.edu/~dpvc/jsMath/download/extra-fonts/msbm10/msbm10.zip' && \ unzip -u -d ${prefix}/var/www $(JSMATH) && \ unzip -u -d ${prefix}/var/www $(JSMATHFONTS) && \ unzip -u -d ${prefix}/var/www/jsMath/fonts msam10.zip && \ unzip -u -d ${prefix}/var/www/jsMath/fonts msbm10.zip) rm -fr /tmp/invenio-jsmath-plugin @echo "* Installing Invenio-specific jsMath config..." (cd $(top_srcdir)/modules/webstyle/etc && make install) @echo "***********************************************************" @echo "** The jsMath plugin was successfully installed. **" @echo "** Please do not forget to properly set the option **" @echo "** CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS in invenio.conf. 
**" @echo "***********************************************************" uninstall-jsmath-plugin: @rm -rvf ${prefix}/var/www/jsMath @echo "***********************************************************" @echo "** The jsMath plugin was successfully uninstalled. **" @echo "***********************************************************" install-jscalendar-plugin: @echo "***********************************************************" @echo "** Installing jsCalendar plugin, please wait... **" @echo "***********************************************************" rm -rf /tmp/invenio-jscalendar-plugin mkdir /tmp/invenio-jscalendar-plugin (cd /tmp/invenio-jscalendar-plugin && \ wget 'http://www.dynarch.com/static/jscalendar-1.0.zip' && \ unzip -u jscalendar-1.0.zip && \ mkdir -p ${prefix}/var/www/jsCalendar && \ cp jscalendar-1.0/img.gif ${prefix}/var/www/jsCalendar/jsCalendar.gif && \ cp jscalendar-1.0/calendar.js ${prefix}/var/www/jsCalendar/ && \ cp jscalendar-1.0/calendar-setup.js ${prefix}/var/www/jsCalendar/ && \ cp jscalendar-1.0/lang/calendar-en.js ${prefix}/var/www/jsCalendar/ && \ cp jscalendar-1.0/calendar-blue.css ${prefix}/var/www/jsCalendar/) rm -fr /tmp/invenio-jscalendar-plugin @echo "***********************************************************" @echo "** The jsCalendar plugin was successfully installed. **" @echo "***********************************************************" uninstall-jscalendar-plugin: @rm -rvf ${prefix}/var/www/jsCalendar @echo "***********************************************************" @echo "** The jsCalendar plugin was successfully uninstalled. **" @echo "***********************************************************" install-jquery-plugins: uninstall-jquery-plugins @echo "***********************************************************" @echo "** Installing various jQuery plugins, please wait... 
**" @echo "***********************************************************" mkdir -p ${prefix}/var/www/js (cd ${prefix}/var/www/js && \ wget http://jqueryjs.googlecode.com/files/jquery-1.3.1.min.js && \ mv jquery-1.3.1.min.js jquery.min.js && \ wget http://jquery-ui.googlecode.com/svn/tags/latest/ui/minified/jquery.effects.core.min.js && \ wget http://jquery-ui.googlecode.com/svn/tags/latest/ui/minified/jquery.effects.highlight.min.js && \ wget http://www.appelsiini.net/download/jquery.jeditable.mini.js && \ wget http://plugins.jquery.com/files/jquery.autogrow-1.2.2.zip && \ wget http://tablesorter.com/jquery.tablesorter.zip && \ unzip jquery.tablesorter.zip && \ rm jquery.tablesorter.zip && \ unzip jquery.autogrow-1.2.2.zip jquery.autogrow.js && \ rm jquery.autogrow-1.2.2.zip && \ wget -O json2.js.tmp http://json.org/json2.js && \ grep -v 'alert.*IMPORTANT: Remove this line' json2.js.tmp > json2.js && \ rm json2.js.tmp && \ wget http://jquery-ui.googlecode.com/svn/tags/1.7.2/ui/minified/ui.datepicker.min.js && \ wget -O jquery.hotkeys.min.js http://js-hotkeys.googlecode.com/files/jquery.hotkeys-0.7.8-packed.js && \ wget http://plugins.jquery.com/files/jquery.treeview_3.zip && \ unzip jquery.treeview_3.zip && \ rm jquery.treeview_3.zip && \ wget http://plugins.jquery.com/files/jquery.ajaxPager.js_5.txt && \ mv jquery.ajaxPager.js_5.txt jquery.ajaxPager.js && \ wget http://jqueryui.com/download/jquery-ui-1.7.2.custom.zip && \ unzip jquery-ui-1.7.2.custom.zip development-bundle/ui/ui.core.js && \ mv development-bundle/ui/ui.core.js ui.core.js && \ rm -rf development-bundle) mkdir -p ${prefix}/var/www/img && \ (cd ${prefix}/var/www/img && \ wget http://jquery-ui.googlecode.com/svn/tags/1.7.2/themes/redmond/jquery-ui.css && \ wget http://jquery-ui.googlecode.com/svn/tags/1.7.2/demos/images/calendar.gif && \ wget -r -np -nH --cut-dirs=5 -A "png" http://jquery-ui.googlecode.com/svn/tags/1.7.2/themes/redmond/images/) @echo "***********************************************************" @echo "** The jQuery plugins were successfully installed. **" @echo "***********************************************************" uninstall-jquery-plugins: (cd ${prefix}/var/www/js && \ rm -f jquery.min.js && \ rm -f jquery.effects.core.min.js && \ rm -f jquery.effects.highlight.min.js && \ rm -f jquery.jeditable.mini.js && \ rm -f jquery.tablesorter.js && \ rm -f jquery.tablesorter.pager.js && \ rm -f ui.datepicker.min.js && \ rm -f jquery.autogrow.js && \ rm -f json2.js && \ rm -rf tablesorter && \ rm -f jquery.hotkeys.min.js && \ rm -rf jquery-treeview && \ rm -f jquery.ajaxPager.js && \ rm -f ui.core.js) @echo "***********************************************************" @echo "** The jquery plugins were successfully uninstalled. **" @echo "***********************************************************" install-fckeditor-plugin: @echo "***********************************************************" @echo "** Installing FCKeditor plugin, please wait... 
**" @echo "***********************************************************" rm -rf ${prefix}/lib/python/invenio/fckeditor/ rm -rf /tmp/invenio-fckeditor-plugin mkdir /tmp/invenio-fckeditor-plugin (cd /tmp/invenio-fckeditor-plugin && \ wget 'http://downloads.sourceforge.net/fckeditor/$(FCKEDITOR)' && \ unzip -u -d ${prefix}/var/www $(FCKEDITOR)) && \ mkdir -p ${prefix}/lib/python/invenio/fckeditor/editor/filemanager/connectors/py && \ mv -f ${prefix}/var/www/fckeditor/fckeditor.py ${prefix}/lib/python/invenio/fckeditor/ && \ mv -f ${prefix}/var/www/fckeditor/editor/filemanager/connectors/py/*.py ${prefix}/lib/python/invenio/fckeditor/editor/filemanager/connectors/py/ && \ rm -f ${prefix}/var/www/fckeditor/editor/filemanager/connectors/py/upload.py && \ rm -f ${prefix}/var/www/fckeditor/editor/filemanager/connectors/py/zope.py && \ find ${prefix}/lib/python/invenio/fckeditor -type d -exec touch {}/__init__.py \; && \ find ${prefix}/var/www/fckeditor/ -depth -name '_*' -exec rm -rf {} \; && \ rm -r ${prefix}/var/www/fckeditor/editor/filemanager/connectors && \ find ${prefix}/var/www/fckeditor/fckeditor* -maxdepth 0 -exec rm -r {} \; && \ rm -fr /tmp/invenio-fckeditor-plugin @echo "* Installing Invenio-specific FCKeditor config..." (cd $(top_srcdir)/modules/webstyle/etc && make install) @echo "***********************************************************" @echo "** The FCKeditor plugin was successfully installed. **" @echo "** Please do not forget to properly set the option **" @echo "** CFG_WEBCOMMENT_USE_RICH_TEXT_EDITOR in invenio.conf. **" @echo "***********************************************************" uninstall-fckeditor-plugin: @rm -rvf ${prefix}/var/www/fckeditor @rm -rvf ${prefix}/lib/python/invenio/fckeditor @echo "***********************************************************" @echo "** The FCKeditor plugin was successfully uninstalled. **" @echo "***********************************************************" +install-pdfa-helper-files: + @echo "***********************************************************" + @echo "** Installing PDF/A helper files, please wait... **" + @echo "***********************************************************" + wget 'http://cdsware.cern.ch/download/invenio-demo-site-files/ISOCoatedsb.icc' -O ${prefix}/etc/websubmit/file_converter_templates/ISOCoatedsb.icc + @echo "***********************************************************" + @echo "** The PDF/A helper files were successfully installed. **" + @echo "***********************************************************" + +uninstall-pdfa-helper-files: + rm -f ${prefix}/etc/websubmit/file_converter_templates/ISOCoatedsb.icc + @echo "***********************************************************" + @echo "** The PDF/A helper files were successfully uninstalled. 
**" + @echo "***********************************************************" + update-v0.3.0-tables update-v0.3.1-tables: echo "ALTER TABLE idxINDEXNAME CHANGE id_idxINDEX id_idxINDEX mediumint(9) unsigned NOT NULL FIRST;" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkMETHODNAME CHANGE id_rnkMETHOD id_rnkMETHOD mediumint(9) unsigned NOT NULL FIRST;" | ${prefix}/bin/dbexec echo "ALTER TABLE collectionname CHANGE id_collection id_collection mediumint(9) unsigned NOT NULL FIRST;" | ${prefix}/bin/dbexec echo "ALTER TABLE formatname CHANGE id_format id_format mediumint(9) unsigned NOT NULL FIRST;" | ${prefix}/bin/dbexec echo "ALTER TABLE fieldname CHANGE id_field id_field mediumint(9) unsigned NOT NULL FIRST;" | ${prefix}/bin/dbexec echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'runbibrank','run BibRank','','no');" | ${prefix}/bin/dbexec echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'cfgbibrank','configure BibRank','','no');" | ${prefix}/bin/dbexec update-v0.3.2-tables: echo "ALTER TABLE sbmCOLLECTION_sbmDOCTYPE CHANGE id_son id_son char(10) NOT NULL default '0';" | ${prefix}/bin/dbexec update-v0.3.3-tables: ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "ALTER TABLE flxLINKTYPEPARAMS CHANGE pname pname varchar(78) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkMETHOD DROP star_category_ranges;" | ${prefix}/bin/dbexec echo "DROP TABLE rnkSET;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK CHANGE arguments arguments LONGTEXT;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK CHANGE status status varchar(50);" | ${prefix}/bin/dbexec update-v0.5.0-tables: ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "ALTER TABLE session ADD INDEX uid (uid);" | ${prefix}/bin/dbexec echo "UPDATE idxINDEXNAME SET ln='cs' WHERE ln='cz';" | ${prefix}/bin/dbexec echo "UPDATE rnkMETHODNAME SET ln='cs' WHERE ln='cz';" | ${prefix}/bin/dbexec echo "UPDATE collectionname SET ln='cs' WHERE ln='cz';" | ${prefix}/bin/dbexec echo "UPDATE collection_portalbox SET ln='cs' WHERE ln='cz';" | ${prefix}/bin/dbexec echo "UPDATE formatname SET ln='cs' WHERE ln='cz';" | ${prefix}/bin/dbexec echo "UPDATE fieldname SET ln='cs' WHERE ln='cz';" | ${prefix}/bin/dbexec echo "UPDATE idxINDEXNAME SET ln='sv' WHERE ln='se';" | ${prefix}/bin/dbexec echo "UPDATE rnkMETHODNAME SET ln='sv' WHERE ln='se';" | ${prefix}/bin/dbexec echo "UPDATE collectionname SET ln='sv' WHERE ln='se';" | ${prefix}/bin/dbexec echo "UPDATE collection_portalbox SET ln='sv' WHERE ln='se';" | ${prefix}/bin/dbexec echo "UPDATE formatname SET ln='sv' WHERE ln='se';" | ${prefix}/bin/dbexec echo "UPDATE fieldname SET ln='sv' WHERE ln='se';" | ${prefix}/bin/dbexec update-v0.7.1-tables: echo "DROP TABLE oaiHARVEST;" | ${prefix}/bin/dbexec ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'cfgbibharvest','configure BibHarvest','','no');" | ${prefix}/bin/dbexec echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'runoaiharvest','run BibHarvest oaiharvest','','no');" | ${prefix}/bin/dbexec echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'cfgwebcomment','configure WebComment','','no');" | ${prefix}/bin/dbexec echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'runoaiarchive','run BibHarvest 
oaiarchive','','no');" | ${prefix}/bin/dbexec echo "INSERT INTO accACTION (id,name,description,allowedkeywords,optional) VALUES (NULL,'runbibedit','run BibEdit','','no');" | ${prefix}/bin/dbexec echo "ALTER TABLE user ADD nickname varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE user ADD last_login datetime NOT NULL default '0000-00-00 00:00:00';" | ${prefix}/bin/dbexec echo "ALTER TABLE user ADD INDEX nickname (nickname);" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmFIELD CHANGE subname subname varchar(13) default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE user_query_basket CHANGE alert_name alert_name varchar(30) NOT NULL default '';" | ${prefix}/bin/dbexec echo "TRUNCATE TABLE session;" | ${prefix}/bin/dbexec @echo "**********************************************************" @echo "** Do not forget to run the basket migration now: **" @echo "** @PYTHON@ modules/webbasket/lib/webbasket_migration_kit.py " @echo "** Please see the RELEASE-NOTES for details. **" @echo "**********************************************************" @echo "INSERT INTO oaiARCHIVE (id, setName, setSpec, setDescription, setDefinition, setRecList) SELECT id, setName, setSpec, CONCAT_WS('', setDescription), setDefinition, setRecList FROM oaiSET;" update-v0.90.0-tables: ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "ALTER TABLE format ADD COLUMN (description varchar(255) default '');" | ${prefix}/bin/dbexec echo "ALTER TABLE format ADD COLUMN (content_type varchar(255) default '');" | ${prefix}/bin/dbexec update-v0.90.1-tables: ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "ALTER TABLE schTASK ADD INDEX status (status);" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK ADD INDEX runtime (runtime);" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmCATEGORIES ADD COLUMN score TINYINT UNSIGNED NOT NULL DEFAULT 0;" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmCATEGORIES ADD PRIMARY KEY (doctype, sname);" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmCATEGORIES ADD KEY doctype (doctype);" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiHARVEST ADD COLUMN setspecs TEXT NOT NULL DEFAULT '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE setDescription setDescription text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE p1 p1 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE f1 f1 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE m1 m1 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE p2 p2 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE f2 f2 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE m2 m2 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE p3 p3 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE f3 f3 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiARCHIVE CHANGE m3 m3 text NOT NULL default '';" | ${prefix}/bin/dbexec echo "UPDATE bibdoc SET status=0 WHERE status='';" | ${prefix}/bin/dbexec echo "UPDATE bibdoc SET status=1 WHERE status='deleted';" | ${prefix}/bin/dbexec echo "ALTER TABLE fmtKNOWLEDGEBASES add COLUMN kbtype char default NULL;" | ${prefix}/bin/dbexec update-v0.92.0-tables: echo "UPDATE bibdoc SET status=0 WHERE status='';" | ${prefix}/bin/dbexec echo "UPDATE bibdoc SET status=1 WHERE status='deleted';" | 
${prefix}/bin/dbexec echo "ALTER TABLE schTASK CHANGE arguments arguments mediumblob;" | ${prefix}/bin/dbexec echo "UPDATE user SET note=1 WHERE nickname='admin' AND note IS NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE usergroup CHANGE name name varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE usergroup ADD login_method varchar(255) NOT NULL default 'INTERNAL';" | ${prefix}/bin/dbexec echo "ALTER TABLE usergroup ADD UNIQUE KEY login_method_name (login_method(70), name);" | ${prefix}/bin/dbexec echo "ALTER TABLE user CHANGE settings settings blob default NULL;" | ${prefix}/bin/dbexec echo "INSERT INTO sbmALLFUNCDESCR VALUES ('Get_Recid', 'This function gets the recid for a document with a given report-number (as stored in the global variable rn).');" | ${prefix}/bin/dbexec update-v0.92.1-tables: echo "DROP TABLE rnkCITATIONDATA;" | ${prefix}/bin/dbexec ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "UPDATE bibdoc SET status='DELETED' WHERE status='1';" | ${prefix}/bin/dbexec echo "UPDATE bibdoc SET status='' WHERE status='0';" | ${prefix}/bin/dbexec echo "ALTER TABLE bibrec ADD KEY creation_date (creation_date);" | ${prefix}/bin/dbexec echo "ALTER TABLE bibrec ADD KEY modification_date (modification_date);" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc ADD KEY creation_date (creation_date);" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc ADD KEY modification_date (modification_date);" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc ADD KEY docname (docname);" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiHARVEST CHANGE postprocess postprocess varchar(20) NOT NULL default 'h';" | ${prefix}/bin/dbexec echo "ALTER TABLE oaiHARVEST ADD COLUMN bibfilterprogram varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE idxINDEXNAME CHANGE ln ln char(5) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE idxINDEX ADD COLUMN stemming_language VARCHAR(10) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkMETHODNAME CHANGE ln ln char(5) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkDOWNLOADS CHANGE id_bibdoc id_bibdoc mediumint(9) unsigned default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkDOWNLOADS CHANGE file_format file_format varchar(10) NULL default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE collectionname CHANGE ln ln char(5) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE collection_portalbox CHANGE ln ln char(5) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE format ADD COLUMN visibility TINYINT NOT NULL default 1;" | ${prefix}/bin/dbexec echo "ALTER TABLE formatname CHANGE ln ln char(5) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE fieldname CHANGE ln ln char(5) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE accROLE ADD COLUMN firerole_def_ser blob NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE accROLE ADD COLUMN firerole_def_src text NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE user_accROLE ADD COLUMN expiration datetime NOT NULL default '9999-12-31 23:59:59';" | ${prefix}/bin/dbexec echo "ALTER TABLE user DROP INDEX id, ADD PRIMARY KEY id (id);" | ${prefix}/bin/dbexec echo -e 'from invenio.dbquery import run_sql;\ map(lambda index_id: run_sql("ALTER TABLE idxPHRASE%02dF CHANGE term term TEXT NULL DEFAULT NULL, DROP INDEX term, ADD INDEX term (term (50))" % index_id[0]), run_sql("select id from idxINDEX"))' | $(PYTHON) echo "INSERT INTO rnkCITATIONDATA VALUES (1,'citationdict','','');" | 
${prefix}/bin/dbexec echo "INSERT INTO rnkCITATIONDATA VALUES (2,'reversedict','','');" | ${prefix}/bin/dbexec echo "INSERT INTO rnkCITATIONDATA VALUES (3,'selfcitdict','','');" | ${prefix}/bin/dbexec update-v0.99.0-tables: ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "ALTER TABLE bibdoc ADD COLUMN more_info mediumblob NULL default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK ADD COLUMN priority tinyint(4) NOT NULL default 0;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK ADD KEY priority (priority);" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA DROP PRIMARY KEY;" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA ADD PRIMARY KEY (id);" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA CHANGE id id mediumint(8) unsigned NOT NULL auto_increment;" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA ADD UNIQUE KEY object_name (object_name);" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmPARAMETERS CHANGE value value text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmAPPROVAL ADD note text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE hstDOCUMENT CHANGE docsize docsize bigint(15) unsigned NOT NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtACTIONHISTORY CHANGE client_host client_host int(10) unsigned default NULL;" | ${prefix}/bin/dbexec update-v0.99.1-tables: echo "RENAME TABLE oaiARCHIVE TO oaiREPOSITORY;" | ${prefix}/bin/dbexec ${prefix}/bin/dbexec < $(top_srcdir)/modules/miscutil/sql/tabcreate.sql echo "INSERT INTO knwKB (id,name,description,kbtype) SELECT id,name,description,'' FROM fmtKNOWLEDGEBASES;" | ${prefix}/bin/dbexec echo "INSERT INTO knwKBRVAL (id,m_key,m_value,id_knwKB) SELECT id,m_key,m_value,id_fmtKNOWLEDGEBASES FROM fmtKNOWLEDGEBASEMAPPINGS;" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmPARAMETERS CHANGE name name varchar(40) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc CHANGE docname docname varchar(250) COLLATE utf8_bin NOT NULL default 'file';" | ${prefix}/bin/dbexec + echo "ALTER TABLE bibdoc ADD COLUMN text_extraction_date datetime NOT NULL default '0000-00-00';" | ${prefix}/bin/dbexec echo "ALTER TABLE collection DROP COLUMN restricted;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK CHANGE host host varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE hstTASK CHANGE host host varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE bib85x DROP INDEX kv, ADD INDEX kv (value(100));" | ${prefix}/bin/dbexec echo "UPDATE clsMETHOD SET location='http://cdsware.cern.ch/download/invenio-demo-site-files/HEP.rdf' WHERE name='HEP' AND location='';" | ${prefix}/bin/dbexec echo "UPDATE clsMETHOD SET location='http://cdsware.cern.ch/download/invenio-demo-site-files/NASA-subjects.rdf' WHERE name='NASA-subjects' AND location='';" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET name='runoairepository', description='run oairepositoryupdater task' WHERE name='runoaiarchive';" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET name='cfgoaiharvest', description='configure OAI Harvest' WHERE name='cfgbibharvest';" | ${prefix}/bin/dbexec echo "ALTER TABLE accARGUMENT CHANGE value value varchar(255);" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET allowedkeywords='doctype,act,categ' WHERE name='submit';" | ${prefix}/bin/dbexec echo "INSERT INTO accARGUMENT(keyword,value) VALUES ('categ','*');" | ${prefix}/bin/dbexec echo "INSERT INTO 
accROLE_accACTION_accARGUMENT(id_accROLE,id_accACTION,id_accARGUMENT,argumentlistid) SELECT DISTINCT raa.id_accROLE,raa.id_accACTION,accARGUMENT.id,raa.argumentlistid FROM accROLE_accACTION_accARGUMENT as raa JOIN accACTION on id_accACTION=accACTION.id,accARGUMENT WHERE accACTION.name='submit' and accARGUMENT.keyword='categ' and accARGUMENT.value='*';" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET allowedkeywords='name,with_editor_rights' WHERE name='cfgwebjournal';" | ${prefix}/bin/dbexec echo "INSERT INTO accARGUMENT(keyword,value) VALUES ('with_editor_rights','yes');" | ${prefix}/bin/dbexec echo "INSERT INTO accROLE_accACTION_accARGUMENT(id_accROLE,id_accACTION,id_accARGUMENT,argumentlistid) SELECT DISTINCT raa.id_accROLE,raa.id_accACTION,accARGUMENT.id,raa.argumentlistid FROM accROLE_accACTION_accARGUMENT as raa JOIN accACTION on id_accACTION=accACTION.id,accARGUMENT WHERE accACTION.name='cfgwebjournal' and accARGUMENT.keyword='with_editor_rights' and accARGUMENT.value='yes';" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC CHANGE id id int(15) unsigned NOT NULL auto_increment;" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC ADD external_id int(15) NOT NULL default '0';" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC ADD collection_id int(15) unsigned NOT NULL default '0';" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC ADD original_url text;" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD status char(2) NOT NULL default 'ok';" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD KEY status (status);" | ${prefix}/bin/dbexec CLEANFILES = *~ *.pyc *.tmp diff --git a/THANKS b/THANKS index 9123df53f..aa50e2723 100644 --- a/THANKS +++ b/THANKS @@ -1,135 +1,149 @@ CDS Invenio v0.99.1 THANKS ========================== Several people outside the CDS Invenio Development Team contributed to the project: - Thierry Thomas Patches for compiling old CDSware 0.3.x sources on FreeBSD. - Guido Pelzer Contributions to the German translation. German stopword list. - Valerio Gracco Contributions to the Italian translation. - Tullio Basaglia Contributions to the Italian translation. - Flavio C. Coelho Contributions to the Portuguese translation. - Lyuba Vasilevskaya Contributions to the Russian translation. - Maria Gomez Marti Contributions to the Spanish translation. - Magaly Bascones Dominguez Contributions to the Spanish translation. - Urban Andersson Contributions to the Swedish translation. - Eric Grand Contributions to the French translation. - Theodoros Theodoropoulos Contributions to the Greek translation, Greek stopword list, XML RefWorks output format. - Vasyl Ostrovskyi Contributions to the Ukrainian translation. - Ferran Jorba Contributions to the Catalan and Spanish translations. Cleanup of the old PHP-based BibFormat Admin Guide. Several minor patches. - Beatriu Piera Translation of the Search Guide into Catalan and Spanish. - Anonymous contributor (name withheld by request) Contributions to the Japanese translation. - Anonymous contributor (name withheld by request) Contributions to the Spanish translation. - Alen Vodopijevec Contributions to the Croatian translation. - Jasna Marković Contributions to the Croatian translation. - Kam-ming Ku Contributions to the Chinese translations (zh_CN, zh_TW). - Benedikt Koeppel Contributions to the German translation. - Toru Tsuboyama Contributions to the Japanese translation. - Mike Marino Several minor patches and suggestions. - Zbigniew Szklarz Contributions to the Polish translation. 
 - Iaroslav Gaponenko
   Contributions to the Russian translation.

 - Yana Osborne
   Contributions to the Russian translation.

 - Zbigniew Leonowicz
   Contributions to the Polish translation.

 - Makiko Matsumoto and Takao Ishigaki
   Contributions to the Japanese translation.

 - Eva Papp
   Contributions to the Hungarian translation.

The URL handler was inspired by the Quixote Web Framework which is
``Copyright (c) 2004 Corporation for National Research Initiatives;
All Rights Reserved''.

The session handler was adapted from the mod_python session
implementation.

Javascript Quicktags scripts from Alex King are used to provide
additional capabilities for the editing of BibFormat templates
through the web admin interface.

The indexer engine uses the Martin Porter Stemming Algorithm and its
Vivake Gupta free Python implementation.

The CSS style for the rounded corners box used in detailed record
pages was adapted from Francky Lleyneman's liquidcorners CSS.

The NASA_Subjects.rdf file has been retrieved from the American
National Aeronautics and Space Administration (NASA) who kindly
provide this for free re-use.

The tiger test picture used in automated demo picture submission was
converted from Ghostscript's 'tiger.eps'.

Some icon images were taken from (i) the Silk icon set, (ii) the
Function icon set, and (iii) the activity indicator icon.

+The unoconv.py script has been adapted from UNOCONV by Dag Wieers.
+
+
+PDFA_def.ps has been adapted from the GPL distribution of Ghostscript.
+
+
+The ISOCoatedsb.icc ICC profile has been retrieved from the European Color
+Initiative.
+
+
The PEP8 conformance checking script (pep8.py) was written by
Johann C. Rocholl .  The pep8.py version included with CDS Invenio
was downloaded from on 2009-06-14.

+The asyncproc module to manage asynchronous processes with timeout support
+was written by Thomas Bellman .  The asyncproc.py
+version included with CDS Invenio was downloaded from
+ on 2009-07-13.
+
-- end of file -
\ No newline at end of file
+- end of file -
diff --git a/config.nice.in b/config.nice.in
index 3225a59e3..5d9e0e198 100644
--- a/config.nice.in
+++ b/config.nice.in
@@ -1,8 +1,9 @@
#!/bin/sh
# Automatically created during the building process. Do not edit.
./configure --prefix=@prefix@ \
            --with-python=@PYTHON@ \
            --with-mysql=@MYSQL@ \
            --with-clisp=@CLISP@ \
            --with-cmucl=@CMUCL@ \
-           --with-sbcl=@SBCL@
+           --with-sbcl=@SBCL@ \
+           --with-openoffice-python=@OPENOFFICE_PYTHON@
diff --git a/config/invenio-autotools.conf.in b/config/invenio-autotools.conf.in
index 563d03023..37897bd71 100644
--- a/config/invenio-autotools.conf.in
+++ b/config/invenio-autotools.conf.in
@@ -1,76 +1,81 @@
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

## DO NOT EDIT THIS FILE.
## YOU SHOULD NOT EDIT THESE VALUES.
THEY WERE AUTOMATICALLY ## CALCULATED BY AUTOTOOLS DURING THE "CONFIGURE" STAGE. [Invenio] ## Invenio version: CFG_VERSION = @VERSION@ ## directories detected from 'configure --prefix ...' parameters: CFG_PREFIX = @prefix@ CFG_BINDIR = @prefix@/bin CFG_PYLIBDIR = @prefix@/lib/python CFG_LOGDIR = @localstatedir@/log CFG_ETCDIR = @prefix@/etc CFG_LOCALEDIR = @prefix@/share/locale CFG_TMPDIR = @localstatedir@/tmp CFG_CACHEDIR = @localstatedir@/cache CFG_WEBDIR = @localstatedir@/www ## path to interesting programs: CFG_PATH_MYSQL = @MYSQL@ CFG_PATH_PHP = @PHP@ -CFG_PATH_ACROREAD = @ACROREAD@ CFG_PATH_GZIP = @GZIP@ CFG_PATH_GUNZIP = @GUNZIP@ CFG_PATH_TAR = @TAR@ -CFG_PATH_DISTILLER = @PS2PDF@ CFG_PATH_GFILE = @FILE@ CFG_PATH_CONVERT = @CONVERT@ CFG_PATH_PDFTOTEXT = @PDFTOTEXT@ CFG_PATH_PDFTK = @PDFTK@ +CFG_PATH_PDFTOPS = @PDFTOPS@ CFG_PATH_PDF2PS = @PDF2PS@ +CFG_PATH_PDFINFO = @PDFINFO@ +CFG_PATH_PDFTOPPM = @PDFTOPPM@ +CFG_PATH_PAMFILE = @PAMFILE@ +CFG_PATH_GS = @GS@ +CFG_PATH_PS2PDF = @PS2PDF@ +CFG_PATH_PDFOPT = @PDFOPT@ CFG_PATH_PSTOTEXT = @PSTOTEXT@ CFG_PATH_PSTOASCII = @PSTOASCII@ -CFG_PATH_ANTIWORD = @ANTIWORD@ -CFG_PATH_CATDOC = @CATDOC@ -CFG_PATH_WVTEXT = @WVTEXT@ -CFG_PATH_PPTHTML = @PPTHTML@ -CFG_PATH_XLHTML = @XLHTML@ -CFG_PATH_HTMLTOTEXT = @HTMLTOTEXT@ +CFG_PATH_ANY2DJVU = @ANY2DJVU@ +CFG_PATH_DJVUPS = @DJVUPS@ +CFG_PATH_DJVUTXT = @DJVUTXT@ +CFG_PATH_TIFF2PDF = @TIFF2PDF@ +CFG_PATH_OCROSCRIPT = @OCROSCRIPT@ +CFG_PATH_OPENOFFICE_PYTHON = @OPENOFFICE_PYTHON@ CFG_PATH_WGET = @WGET@ CFG_PATH_MD5SUM = @MD5SUM@ ## CFG_BIBINDEX_PATH_TO_STOPWORDS_FILE -- path to the stopwords file. You ## probably don't want to change this path, although you may want to ## change the content of that file. Note that the file is used by the ## rank engine internally, so it should be given even if stopword ## removal in the indexes is not used. CFG_BIBINDEX_PATH_TO_STOPWORDS_FILE = @prefix@/etc/bibrank/stopwords.kb ## helper style of variables for WebSubmit: CFG_WEBSUBMIT_COUNTERSDIR = @localstatedir@/data/submit/counters CFG_WEBSUBMIT_STORAGEDIR = @localstatedir@/data/submit/storage CFG_WEBSUBMIT_FILEDIR = @localstatedir@/data/files CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR = @prefix@/etc/bibconvert/config ## - end of file - diff --git a/config/invenio.conf b/config/invenio.conf index 60dcb2752..cf1888419 100644 --- a/config/invenio.conf +++ b/config/invenio.conf @@ -1,1038 +1,1082 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ################################################### ## About 'invenio.conf' and 'invenio-local.conf' ## ################################################### ## The 'invenio.conf' file contains the vanilla default configuration ## parameters of a CDS Invenio installation, as coming from the ## distribution. 
The file should be self-explanatory. Once installed ## in its usual location (usually /opt/cds-invenio/etc), you could in ## principle go ahead and change the values according to your local ## needs. ## ## However, you can also create a file named 'invenio-local.conf' in ## the same directory where 'invenio.conf' lives and put there only ## the localizations you need to have different from the default ones. ## For example: ## ## $ cat /opt/cds-invenio/etc/invenio-local.conf ## [Invenio] ## CFG_SITE_URL = http://your.site.com ## CFG_SITE_SECURE_URL = https://your.site.com ## CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com ## CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com ## ## The Invenio system will then read both the default invenio.conf ## file and your customized invenio-local.conf file and it will ## override any default options with the ones you have set in your ## local file. This cascading of configuration parameters will ease ## your future upgrades. [Invenio] ################################### ## Part 1: Essential parameters ## ################################### ## This part defines essential CDS Invenio internal parameters that ## everybody should override, like the name of the server or the email ## address of the local CDS Invenio administrator. ## CFG_DATABASE_* - specify which MySQL server to use, the name of the ## database to use, and the database access credentials. CFG_DATABASE_HOST = localhost CFG_DATABASE_PORT = 3306 CFG_DATABASE_NAME = cdsinvenio CFG_DATABASE_USER = cdsinvenio CFG_DATABASE_PASS = my123p$ss ## CFG_SITE_URL - specify the URL under which your installation will be ## visible. For example, use "http://your.site.com". Do not leave a ## trailing slash. CFG_SITE_URL = http://localhost ## CFG_SITE_SECURE_URL - specify the secure URL under which your ## installation's secure pages, such as login or registration, will be ## visible. For example, use "https://your.site.com". Do not leave a ## trailing slash. If you don't plan on using HTTPS, then you may ## leave this empty. CFG_SITE_SECURE_URL = https://localhost ## CFG_SITE_NAME -- the visible name of your CDS Invenio installation. CFG_SITE_NAME = Atlantis Institute of Fictive Science ## CFG_SITE_NAME_INTL -- the international versions of CFG_SITE_NAME ## in various languages. (See also CFG_SITE_LANGS below.)
CFG_SITE_NAME_INTL_en = Atlantis Institute of Fictive Science CFG_SITE_NAME_INTL_fr = Atlantis Institut des Sciences Fictives CFG_SITE_NAME_INTL_de = Atlantis Institut der fiktiven Wissenschaft CFG_SITE_NAME_INTL_es = Atlantis Instituto de la Ciencia Fictive CFG_SITE_NAME_INTL_ca = Institut Atlantis de Ciència Fictícia CFG_SITE_NAME_INTL_pt = Instituto Atlantis de Ciência Fictícia CFG_SITE_NAME_INTL_it = Atlantis Istituto di Scienza Fittizia CFG_SITE_NAME_INTL_ru = Атлантис Институт фиктивных Наук CFG_SITE_NAME_INTL_sk = Atlantis Inštitút Fiktívnych Vied CFG_SITE_NAME_INTL_cs = Atlantis Institut Fiktivních Věd CFG_SITE_NAME_INTL_no = Atlantis Institutt for Fiktiv Vitenskap CFG_SITE_NAME_INTL_sv = Atlantis Institut för Fiktiv Vetenskap CFG_SITE_NAME_INTL_el = Ινστιτούτο Φανταστικών Επιστημών Ατλαντίδος CFG_SITE_NAME_INTL_uk = Інститут вигаданих наук в Атлантісі CFG_SITE_NAME_INTL_ja = Fictive 科学のAtlantis の協会 CFG_SITE_NAME_INTL_pl = Instytut Fikcyjnej Nauki Atlantis CFG_SITE_NAME_INTL_bg = Институт за фиктивни науки Атлантис CFG_SITE_NAME_INTL_hr = Institut Fiktivnih Znanosti Atlantis CFG_SITE_NAME_INTL_zh_CN = 阿特兰提斯虚拟科学学院 CFG_SITE_NAME_INTL_zh_TW = 阿特蘭提斯虛擬科學學院 CFG_SITE_NAME_INTL_hu = Kitalált Tudományok Atlantiszi Intézete CFG_SITE_NAME_INTL_af = Atlantis Instituut van Fiktiewe Wetenskap CFG_SITE_NAME_INTL_gl = Instituto Atlantis de Ciencia Fictive CFG_SITE_NAME_INTL_ro = Institutul Atlantis al Ştiinţelor Fictive CFG_SITE_NAME_INTL_rw = Atlantis Ishuri Rikuru Ry'ubuhanga ## CFG_SITE_LANG -- the default language of the interface: CFG_SITE_LANG = en ## CFG_SITE_LANGS -- list of all languages the user interface should ## be available in, separated by commas. The order specified below ## will be respected on the interface pages. A good default would be ## to use the alphabetical order. Currently supported languages ## include Afrikaans, Bulgarian, Catalan, Czech, German, Greek, ## English, Spanish, French, Croatian, Hungarian, Galician, Italian, ## Japanese, Kinyarwanda, Norwegian, Polish, Portuguese, Romanian, ## Russian, Slovak, Swedish, Ukrainian, Chinese (China), Chinese ## (Taiwan), so that the eventual maximum you can currently select is ## "af,bg,ca,cs,de,el,en,es,fr,hr,gl,it,rw,hu,ja,no,pl,pt,ro,ru,sk,sv,uk,zh_CN,zh_TW". CFG_SITE_LANGS = af,bg,ca,cs,de,el,en,es,fr,hr,gl,it,rw,hu,ja,no,pl,pt,ro,ru,sk,sv,uk,zh_CN,zh_TW ## CFG_SITE_SUPPORT_EMAIL -- the email address of the support team for ## this installation: CFG_SITE_SUPPORT_EMAIL = cds.support@cern.ch ## CFG_SITE_ADMIN_EMAIL -- the email address of the 'superuser' for ## this installation. Enter your email address below and login with ## this address when using CDS Invenio administration modules. You ## will then be automatically recognized as superuser of the system. CFG_SITE_ADMIN_EMAIL = cds.support@cern.ch ## CFG_SITE_EMERGENCY_PHONE_NUMBERS -- list of mobile phone numbers to ## which an sms should be sent in case of emergency (e.g. bibsched queue ## has been stopped because of an error). ## Note that in order to use this function, if CFG_CERN_SITE is set to 0, ## the function send_sms in errorlib should be reimplemented. CFG_SITE_EMERGENCY_PHONE_NUMBERS = ## CFG_CERN_SITE -- do we want to enable CERN-specific code? ## Put "1" for "yes" and "0" for "no". CFG_CERN_SITE = 0 ## CFG_INSPIRE_SITE -- do we want to enable INSPIRE-specific code? ## Put "1" for "yes" and "0" for "no". CFG_INSPIRE_SITE = 0 ## CFG_ADS_SITE -- do we want to enable ADS-specific code? ## Put "1" for "yes" and "0" for "no". 
CFG_ADS_SITE = 0 ## CFG_DEVEL_SITE -- is this a development site? If it is, you might ## prefer that it doesn't do certain things. For example, you might ## not want WebSubmit to send certain emails or trigger certain ## processes on a development site. ## Put "1" for "yes" (this is a development site) or "0" for "no" ## (this isn't a development site.) CFG_DEVEL_SITE = 0 ################################ ## Part 2: Web page style ## ################################ ## The variables affecting the page style. The most important one is ## the 'template skin' you would like to use and the obfuscation mode ## for your email addresses. Please refer to the WebStyle Admin Guide ## for more explanation. The other variables are listed here mostly ## for backwards compatibility purposes only. ## CFG_WEBSTYLE_TEMPLATE_SKIN -- what template skin do you want to ## use? CFG_WEBSTYLE_TEMPLATE_SKIN = default ## CFG_WEBSTYLE_EMAIL_ADDRESSES_OBFUSCATION_MODE. How do we "protect" ## email addresses from undesired automated email harvesters? This ## setting will not affect 'support' and 'admin' emails. ## NOTE: there is no ultimate solution to protect against email ## harvesting. All have drawbacks and can more or less be ## circumvented. Choose your preferred mode ([t] means "transparent" ## for the user): ## -1: hide all emails. ## [t] 0 : no protection, email returned as is. ## foo@example.com => foo@example.com ## 1 : basic email munging: replaces @ by [at] and . by [dot] ## foo@example.com => foo [at] example [dot] com ## [t] 2 : transparent name mangling: characters are replaced by ## equivalent HTML entities. ## foo@example.com => foo@example.com ## [t] 3 : javascript insertion. Requires Javascript enabled on client ## side. ## 4 : replaces @ and . characters by gif equivalents. ## foo@example.com => foo [at] example [dot] com CFG_WEBSTYLE_EMAIL_ADDRESSES_OBFUSCATION_MODE = 2 ## CFG_WEBSTYLE_INSPECT_TEMPLATES -- Do we want to debug all template ## functions so that they would return HTML results wrapped in ## comments indicating which part of HTML page was created by which ## template function? Useful only for debugging Pythonic HTML ## template. See WebStyle Admin Guide for more information. CFG_WEBSTYLE_INSPECT_TEMPLATES = 0 ## (deprecated) CFG_WEBSTYLE_CDSPAGEBOXLEFTTOP -- eventual global HTML ## left top box: CFG_WEBSTYLE_CDSPAGEBOXLEFTTOP = ## (deprecated) CFG_WEBSTYLE_CDSPAGEBOXLEFTBOTTOM -- eventual global ## HTML left bottom box: CFG_WEBSTYLE_CDSPAGEBOXLEFTBOTTOM = ## (deprecated) CFG_WEBSTYLE_CDSPAGEBOXRIGHTTOP -- eventual global ## HTML right top box: CFG_WEBSTYLE_CDSPAGEBOXRIGHTTOP = ## (deprecated) CFG_WEBSTYLE_CDSPAGEBOXRIGHTBOTTOM -- eventual global ## HTML right bottom box: CFG_WEBSTYLE_CDSPAGEBOXRIGHTBOTTOM = ## CFG_WEBSTYLE_HTTP_STATUS_ALERT_LIST -- when certain HTTP status ## codes are raised to the WSGI handler, the corresponding exceptions ## and error messages can be sent to the system administrator for ## inspecting. This is useful to detect and correct errors. The ## variable represents a comma-separated list of HTTP statuses that ## should alert admin. Wildcards are possible. If the status is ## followed by an "r", it means that a referer is required to exist ## (useful to distinguish broken known links from URL typos when 404 ## errors are raised). CFG_WEBSTYLE_HTTP_STATUS_ALERT_LIST = 404r,400,5*,41*
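## For illustration only, here is a rough sketch of what obfuscation mode 1
## (see CFG_WEBSTYLE_EMAIL_ADDRESSES_OBFUSCATION_MODE above) does to an
## address; this is just the idea, not the actual WebStyle implementation:
##
##   def obfuscate_email_mode_1(email):
##       # basic munging: replace @ by [at] and . by [dot]
##       return email.replace('@', ' [at] ').replace('.', ' [dot] ')
##
##   obfuscate_email_mode_1('foo@example.com')
##   # -> 'foo [at] example [dot] com'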
################################## ## Part 3: WebSearch parameters ## ################################## ## This section contains some configuration parameters for WebSearch ## module. Please note that WebSearch is mostly configured on ## run-time via its WebSearch Admin web interface. The parameters ## below are the ones that you probably do not want to modify very ## often during runtime. (Note that you may modify them ## afterwards too, though.) ## CFG_WEBSEARCH_SEARCH_CACHE_SIZE -- how many queries do we want to ## cache in memory per Apache httpd process? This cache is used ## mainly for "next/previous page" functionality, but it caches also ## "popular" user queries if more than one user happens to search for ## the same thing. Note that large numbers may lead to great memory ## consumption. We recommend a value not greater than 100. CFG_WEBSEARCH_SEARCH_CACHE_SIZE = 100 ## CFG_WEBSEARCH_FIELDS_CONVERT -- if you migrate from an older ## system, you may want to map field codes of your old system (such as ## 'ti') to CDS Invenio/MySQL ("title"). Use Python dictionary syntax ## for the translation table, e.g. {'wau':'author', 'wti':'title'}. ## Usually you don't want to do that, and you would use empty dict {}. CFG_WEBSEARCH_FIELDS_CONVERT = {} ## CFG_WEBSEARCH_LIGHTSEARCH_PATTERN_BOX_WIDTH -- width of the ## search pattern window in the light search interface, in ## characters. CFG_WEBSEARCH_LIGHTSEARCH_PATTERN_BOX_WIDTH = 60 ## CFG_WEBSEARCH_SIMPLESEARCH_PATTERN_BOX_WIDTH -- width of the search ## pattern window in the simple search interface, in characters. CFG_WEBSEARCH_SIMPLESEARCH_PATTERN_BOX_WIDTH = 40 ## CFG_WEBSEARCH_ADVANCEDSEARCH_PATTERN_BOX_WIDTH -- width of the ## search pattern window in the advanced search interface, in ## characters. CFG_WEBSEARCH_ADVANCEDSEARCH_PATTERN_BOX_WIDTH = 30 ## CFG_WEBSEARCH_NB_RECORDS_TO_SORT -- how many records do we still ## want to sort? For higher numbers we print only a warning and won't ## perform any sorting other than default 'latest records first', as ## sorting would be very time consuming then. We recommend a value of ## not more than a couple of thousands. CFG_WEBSEARCH_NB_RECORDS_TO_SORT = 1000 ## CFG_WEBSEARCH_CALL_BIBFORMAT -- if a record is being displayed but ## it was not preformatted in the "HTML brief" format, do we want to ## call BibFormatting on the fly? Put "1" for "yes" and "0" for "no". ## Note that "1" will display the record exactly as if it were fully ## preformatted, but it may be slow due to on-the-fly processing; "0" ## will display a default format very fast, but it may not have all ## the fields as in the fully preformatted HTML brief format. Note ## also that this option is active only for old (PHP) formats; the new ## (Python) formats are called on the fly by default anyway, since ## they are much faster. When unsure, please set "0" here. CFG_WEBSEARCH_CALL_BIBFORMAT = 0 ## CFG_WEBSEARCH_USE_ALEPH_SYSNOS -- do we want to make old SYSNOs ## visible rather than MySQL's record IDs? You may use this if you ## migrate from a different e-doc system, and you store your old ## system numbers into 970__a. Put "1" for "yes" and "0" for ## "no". Usually you don't want to do that, though. CFG_WEBSEARCH_USE_ALEPH_SYSNOS = 0 ## CFG_WEBSEARCH_I18N_LATEST_ADDITIONS -- Put "1" if you want the ## "Latest Additions" in the web collection pages to show ## internationalized records.
Useful only if your brief BibFormat ## templates contain internationalized strings. Otherwise put "0" in ## order not to slow down the creation of latest additions by WebColl. CFG_WEBSEARCH_I18N_LATEST_ADDITIONS = 0 ## CFG_WEBSEARCH_INSTANT_BROWSE -- the number of records to display ## under 'Latest Additions' in the web collection pages. CFG_WEBSEARCH_INSTANT_BROWSE = 10 ## CFG_WEBSEARCH_INSTANT_BROWSE_RSS -- the number of records to ## display in the RSS feed. CFG_WEBSEARCH_INSTANT_BROWSE_RSS = 25 ## CFG_WEBSEARCH_RSS_I18N_COLLECTIONS -- comma-separated list of ## collections that feature an internationalized RSS feed on their ## main search interface page created by webcoll. Other collections ## will have RSS feed using CFG_SITE_LANG. CFG_WEBSEARCH_RSS_I18N_COLLECTIONS = ## CFG_WEBSEARCH_RSS_TTL -- number of minutes that indicates how long ## a feed cache is valid. CFG_WEBSEARCH_RSS_TTL = 360 ## CFG_WEBSEARCH_RSS_MAX_CACHED_REQUESTS -- maximum number of requests kept ## in cache. If the cache is filled, subsequent requests are not cached. CFG_WEBSEARCH_RSS_MAX_CACHED_REQUESTS = 1000 ## CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD -- up to how many author names ## to print explicitly; for more print "et al". Note that this is ## used in default formatting that is seldom used, as usually ## BibFormat defines all the format. The value below is only used ## when BibFormat fails, for example. CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD = 3 ## CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS -- whether or not to show ## collection grandsons in Narrow Search boxes (sons are shown by ## default, grandsons are configurable here). Use 0 for no and 1 for ## yes. CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS = 1 ## CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX -- shall we ## create help links for Ellis, Nick or Ellis, Nicholas and friends ## when Ellis, N was searched for? Useful if you have one author ## stored in the database under several name formats, namely surname ## comma firstname and surname comma initial cataloging policy. Use 0 ## for no and 1 for yes. CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX = 1 ## CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS -- jsMath is a JavaScript ## library that renders (La)TeX mathematical formulas in the client ## browser. This parameter must contain a comma-separated list of ## output formats for which to apply the jsMath rendering, for example ## "hb,hd". If the list is empty, jsMath is disabled. CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS = ## CFG_WEBSEARCH_EXTERNAL_COLLECTION_SEARCH_TIMEOUT -- when searching ## external collections (e.g. SPIRES, CiteSeer, etc), how many seconds ## do we wait for reply before abandoning? CFG_WEBSEARCH_EXTERNAL_COLLECTION_SEARCH_TIMEOUT = 5 ## CFG_WEBSEARCH_EXTERNAL_COLLECTION_SEARCH_MAXRESULTS -- how many ## results do we fetch? CFG_WEBSEARCH_EXTERNAL_COLLECTION_SEARCH_MAXRESULTS = 10 ## CFG_WEBSEARCH_SPLIT_BY_COLLECTION -- do we want to split the search ## results by collection or not? Use 0 for no, 1 for yes. CFG_WEBSEARCH_SPLIT_BY_COLLECTION = 1 ## CFG_WEBSEARCH_DEF_RECORDS_IN_GROUPS -- the default number of ## records to display per page in the search results pages. CFG_WEBSEARCH_DEF_RECORDS_IN_GROUPS = 10 ## CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS -- in order to limit denial of ## service attacks, the total number of records per group displayed as a ## result of a search query will be limited to this number. Only superuser ## queries are not affected by this limit.
CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS = 200 ## CFG_WEBSEARCH_PERMITTED_RESTRICTED_COLLECTIONS_LEVEL -- logged in users ## might have rights to access some restricted collections. This variable ## tweaks the kind of support the system will automatically provide to the ## user with respect to searching into these restricted collections. ## Set this to 0 in order to require the user to explicitly activate restricted ## collections in order to search into them. Set this to 1 in order to ## propose to the user the list of restricted collections to which he/she has ## rights (note: this is not yet implemented). Set this to 2 in order to ## silently add all the restricted collections to which the user has rights ## to any query. ## Note: the system will discover which restricted collections a user has ## rights to, at login time. The time complexity of this procedure is ## proportional to the number of restricted collections. E.g. for a system ## with ~50 restricted collections, you might expect ~1s of delay in the ## login time, when this variable is set to a value higher than 0. CFG_WEBSEARCH_PERMITTED_RESTRICTED_COLLECTIONS_LEVEL = 0 ## CFG_WEBSEARCH_SHOW_COMMENT_COUNT -- do we want to show the 'N comments' ## links on the search engine pages? (useful only when you have allowed ## commenting) CFG_WEBSEARCH_SHOW_COMMENT_COUNT = 1 ## CFG_WEBSEARCH_SHOW_REVIEW_COUNT -- do we want to show the 'N reviews' ## links on the search engine pages? (useful only when you have allowed ## reviewing) CFG_WEBSEARCH_SHOW_REVIEW_COUNT = 1 ## CFG_WEBSEARCH_FULLTEXT_SNIPPETS -- how many full-text snippets to ## display for full-text searches? CFG_WEBSEARCH_FULLTEXT_SNIPPETS = 0 ## CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS -- how many context words ## to display around the pattern in the snippet? CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS = 4 ####################################### ## Part 4: BibHarvest OAI parameters ## ####################################### ## This part defines parameters for the CDS Invenio OAI gateway. ## Useful if you are running CDS Invenio as OAI data provider. ## CFG_OAI_ID_FIELD -- OAI identifier MARC field: CFG_OAI_ID_FIELD = 909COo ## CFG_OAI_SET_FIELD -- OAI set MARC field: CFG_OAI_SET_FIELD = 909COp ## CFG_OAI_DELETED_POLICY -- OAI deletedrecordspolicy ## (no/transient/persistent). CFG_OAI_DELETED_POLICY = no ## CFG_OAI_ID_PREFIX -- OAI identifier prefix: CFG_OAI_ID_PREFIX = atlantis.cern.ch ## CFG_OAI_SAMPLE_IDENTIFIER -- OAI sample identifier: CFG_OAI_SAMPLE_IDENTIFIER = oai:atlantis.cern.ch:CERN-TH-4036 ## CFG_OAI_IDENTIFY_DESCRIPTION -- description for the OAI Identify verb: CFG_OAI_IDENTIFY_DESCRIPTION = oai atlantis.cern.ch : oai:atlantis.cern.ch:CERN-TH-4036 http://atlantis.cern.ch/ Free and unlimited use by anybody with obligation to refer to original record Full content, i.e. preprints may not be harvested by robots Submission restricted. Submitted documents are subject to approval by OAI repository admins. ## CFG_OAI_LOAD -- OAI number of records in a response: CFG_OAI_LOAD = 1000 ## CFG_OAI_EXPIRE -- OAI resumptionToken expiration time: CFG_OAI_EXPIRE = 90000 ## CFG_OAI_SLEEP -- service unavailable between two consecutive ## requests for CFG_OAI_SLEEP seconds: CFG_OAI_SLEEP = 10 ################################## ## Part 5: WebSubmit parameters ## ################################## ## This section contains some configuration parameters for WebSubmit ## module. Please note that WebSubmit is mostly configured on ## run-time via its WebSubmit Admin web interface. The parameters ## below are the ones that you probably do not want to modify during ## runtime. ## CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT -- the fulltext ## documents are stored under "/opt/cds-invenio/var/data/files/gX/Y" ## directories where X is 0,1,... and Y stands for bibdoc ID. Thus ## documents Y are grouped into directories X and this variable ## indicates the maximum number of documents Y stored in each ## directory X. This limit is imposed solely for filesystem ## performance reasons in order not to have too many subdirectories in ## a given directory. CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT = 5000
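## For illustration only, the grouping idea behind this limit can be
## sketched in a few lines of Python (a hypothetical helper written for
## this example, not the actual bibdocfile code):
##
##   def bibdoc_directory(bibdoc_id, group_limit=5000):
##       # directory X is simply the bibdoc ID Y divided by the group limit
##       return "g%d" % (bibdoc_id // group_limit)
##
##   bibdoc_directory(12345)  # -> 'g2', i.e. .../data/files/g2/12345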
## CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS -- a comma-separated ## list of document extensions not listed in Python standard mimetype ## library that should be recognized by Invenio. CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS = hpg,link,lis,llb,mat,mpp,msg,docx,docm,xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm ## CFG_BIBDOCFILE_USE_XSENDFILE -- if your web server supports the ## XSendfile header, you may want to enable this feature in order to ## let Invenio tell the web server to stream files for download (after ## proper authorization checks) by web server's means. This helps to ## liberate Invenio worker processes from being busy with sending big ## files to clients. The web server will take care of that. Note: ## this feature is still somewhat experimental. Note: when enabled ## (set to 1), then you have to also regenerate Apache vhost conf ## snippets (inveniocfg --update-config-py --create-apache-conf). CFG_BIBDOCFILE_USE_XSENDFILE = 0 ## CFG_BIBDOCFILE_MD5_CHECK_PROBABILITY -- a number between 0 and ## 1 that indicates probability with which MD5 checksum will be ## verified when streaming bibdocfile-managed files. (0.1 will cause ## the check to be performed once for every 10 downloads) CFG_BIBDOCFILE_MD5_CHECK_PROBABILITY = 0.1 +## CFG_OPENOFFICE_SERVER_HOST -- the host on which an OpenOffice server is +## listening. If set to localhost, an OpenOffice server will be started +## automatically if it is not already running. +## Note: setting this to an empty value disables the usage of +## OpenOffice for converting documents. +## If you set this to something different than localhost you'll have to take +## care to have an OpenOffice server running on the corresponding host and +## to install the same OpenOffice release both on the client and on the server +## side. +## In order to launch an OpenOffice server on a remote machine, just start +## the usual 'soffice' executable in this way: +## $> soffice -headless -nologo -nodefault -norestore -nofirststartwizard \ +## .. -accept=socket,host=HOST,port=PORT;urp;StarOffice.ComponentContext +CFG_OPENOFFICE_SERVER_HOST = localhost + +## CFG_OPENOFFICE_SERVER_PORT -- the port on which an OpenOffice server is +## listening. +CFG_OPENOFFICE_SERVER_PORT = 2002 + +## CFG_OPENOFFICE_USER -- the user that will be used to launch the OpenOffice +## client. It is recommended to set this to a user who doesn't own files, ## e.g. 'nobody'. You should also authorize your Apache server user to be +## able to become this user, e.g. by adding to your /etc/sudoers the following +## line: +## "apache ALL=(nobody) NOPASSWD: ALL" +## provided that apache is the username corresponding to the Apache user. +## On some machines this might be apache2 or www-data. +CFG_OPENOFFICE_USER = nobody +
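## For illustration only, a quick way to verify that an OpenOffice server
## is actually reachable on the configured host and port is a plain socket
## test, independent of Invenio (host and port taken from the values above;
## an unreachable server will raise socket.error):
##
##   import socket
##   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
##   try:
##       s.connect(('localhost', 2002))
##       print "OpenOffice server is listening"
##   finally:
##       s.close()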
################################# ## Part 6: BibIndex parameters ## ################################# ## This section contains some configuration parameters for BibIndex ## module. Please note that BibIndex is mostly configured on run-time ## via its BibIndex Admin web interface. The parameters below are the ## ones that you probably do not want to modify very often during ## runtime. ## CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY -- when fulltext indexing, do ## you want to index locally stored files only, or also external URLs? ## Use "0" to say "no" and "1" to say "yes". CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY = 0 ## CFG_BIBINDEX_REMOVE_STOPWORDS -- when indexing, do we want to remove ## stopwords? Use "0" to say "no" and "1" to say "yes". CFG_BIBINDEX_REMOVE_STOPWORDS = 0 ## CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS -- characters considered as ## alphanumeric separators of word-blocks inside words. You probably ## don't want to change this. CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS = \!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ ## CFG_BIBINDEX_CHARS_PUNCTUATION -- characters considered as punctuation ## between word-blocks inside words. You probably don't want to ## change this. CFG_BIBINDEX_CHARS_PUNCTUATION = \.\,\:\;\?\!\" ## CFG_BIBINDEX_REMOVE_HTML_MARKUP -- should we attempt to remove HTML markup ## before indexing? Use 1 if you have HTML markup inside metadata ## (e.g. in abstracts), use 0 otherwise. CFG_BIBINDEX_REMOVE_HTML_MARKUP = 0 ## CFG_BIBINDEX_REMOVE_LATEX_MARKUP -- should we attempt to remove LATEX markup ## before indexing? Use 1 if you have LATEX markup inside metadata ## (e.g. in abstracts), use 0 otherwise. CFG_BIBINDEX_REMOVE_LATEX_MARKUP = 0 ## CFG_BIBINDEX_MIN_WORD_LENGTH -- minimum word length allowed to be added to ## the index. Terms shorter than this will be discarded. ## Useful to keep the database clean, however you can safely leave ## this value on 0 for up to 1,000,000 documents. CFG_BIBINDEX_MIN_WORD_LENGTH = 0 ## CFG_BIBINDEX_URLOPENER_USERNAME and CFG_BIBINDEX_URLOPENER_PASSWORD -- ## access credentials to access restricted URLs, interesting only if ## you are fulltext-indexing files located on a remote server that is ## only available via username/password. But it's probably better to ## handle this case via IP or some convention; the current scheme is ## mostly there for demo only. CFG_BIBINDEX_URLOPENER_USERNAME = mysuperuser CFG_BIBINDEX_URLOPENER_PASSWORD = mysuperpass ## CFG_INTBITSET_ENABLE_SANITY_CHECKS -- ## Enable sanity checks for integers passed to the intbitset data ## structures. It is good to enable this during debugging ## and to disable it for speed improvements. CFG_INTBITSET_ENABLE_SANITY_CHECKS = False +## CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES -- regular expression that matches +## docnames for which OCR is desired (set this to .* in order to enable +## OCR in general, set this to empty in order to disable it.) +CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES = scan-.* + +## CFG_BIBINDEX_SPLASH_PAGES -- regular expression that matches URLs +## that are not to be indexed but that indirectly refer to documents +## that are supposed to be indexed. +CFG_BIBINDEX_SPLASH_PAGES = http://documents\.cern\.ch/setlink\?.* +
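## For illustration only, such regular expressions are applied along these
## lines (a sketch of the idea using the two default values above; the exact
## matching logic lives inside BibIndex):
##
##   import re
##   ocr_re = re.compile(r"scan-.*")
##   splash_re = re.compile(r"http://documents\.cern\.ch/setlink\?.*")
##   bool(ocr_re.match("scan-0001"))
##   # -> True: OCR is desired for this docname
##   bool(splash_re.match("http://documents.cern.ch/setlink?ln=en"))
##   # -> True: splash page, follow it but do not index it directly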
####################################### ## Part 7: Access control parameters ## ####################################### ## This section contains some configuration parameters for the access ## control system. Please note that WebAccess is mostly configured on ## run-time via its WebAccess Admin web interface. The parameters ## below are the ones that you probably do not want to modify very ## often during runtime. (If you do want to modify them during ## runtime, for example to deny access temporarily because of backups, ## you can edit access_control_config.py directly, no need to get back ## here and no need to redo the make process.) ## CFG_ACCESS_CONTROL_LEVEL_SITE -- defines how open this site is. ## Use 0 for normal operation of the site, 1 for read-only site (all ## write operations temporarily closed), 2 for site fully closed, ## 3 for also disabling any database connection. ## Useful for site maintenance. CFG_ACCESS_CONTROL_LEVEL_SITE = 0 ## CFG_ACCESS_CONTROL_LEVEL_GUESTS -- guest users access policy. Use ## 0 to allow guest users, 1 not to allow them (all users must login). CFG_ACCESS_CONTROL_LEVEL_GUESTS = 0 ## CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS -- account registration and ## activation policy. When 0, users can register and accounts are ## automatically activated. When 1, users can register but admin must ## activate the accounts. When 2, users cannot register nor update ## their email address, only admin can register accounts. When 3, ## users cannot register nor update email address nor password, only ## admin can register accounts. When 4, the same as 3 applies, plus the ## user cannot change his login method. When 5, then the same as 4 ## applies, plus info about how to get an account is hidden from the ## login page. CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS = 0 ## CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN -- limit account ## registration to certain email addresses? If wanted, give domain ## name below, e.g. "cern.ch". If not wanted, leave it empty. CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN = ## CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS -- send a ## notification email to the administrator when a new account is ## created? Use 0 for no, 1 for yes. CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS = 0 ## CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT -- send a ## notification email to the user when a new account is created in order to ## verify the validity of the provided email address? Use ## 0 for no, 1 for yes. CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT = 1 ## CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION -- send a ## notification email to the user when a new account is activated? ## Use 0 for no, 1 for yes. CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION = 0 ## CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION -- send a ## notification email to the user when a new account is deleted or ## account demand rejected? Use 0 for no, 1 for yes. CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION = 0 ## CFG_APACHE_PASSWORD_FILE -- the file where Apache user credentials ## are stored. Must be an absolute pathname. If the value does not ## start with a slash, it is considered to be the filename of a file ## located under prefix/var/tmp directory. This is useful for the ## demo site testing purposes. For the production site, if you plan ## to restrict access to some collections based on the Apache user ## authentication mechanism, you should put here an absolute path to ## your Apache password file. CFG_APACHE_PASSWORD_FILE = demo-site-apache-user-passwords ## CFG_APACHE_GROUP_FILE -- the file where Apache user groups are ## defined. See the documentation of the preceding config variable.
CFG_APACHE_GROUP_FILE = demo-site-apache-user-groups ################################### ## Part 8: WebSession parameters ## ################################### ## This section contains some configuration parameters for tweaking ## session handling. ## CFG_WEBSESSION_EXPIRY_LIMIT_DEFAULT -- number of days after which a session ## and the corresponding cookie is considered expired. CFG_WEBSESSION_EXPIRY_LIMIT_DEFAULT = 2 ## CFG_WEBSESSION_EXPIRY_LIMIT_REMEMBER -- number of days after which a session ## and the corresponding cookie is considered expired, when the user has ## requested to permanently stay logged in. CFG_WEBSESSION_EXPIRY_LIMIT_REMEMBER = 365 ## CFG_WEBSESSION_RESET_PASSWORD_EXPIRE_IN_DAYS -- when a user requested ## a password reset, for how many days is the URL valid? CFG_WEBSESSION_RESET_PASSWORD_EXPIRE_IN_DAYS = 3 ## CFG_WEBSESSION_ADDRESS_ACTIVATION_EXPIRE_IN_DAYS -- when an account ## activation email was sent, for how many days is the URL valid? CFG_WEBSESSION_ADDRESS_ACTIVATION_EXPIRE_IN_DAYS = 3 ## CFG_WEBSESSION_NOT_CONFIRMED_EMAIL_ADDRESS_EXPIRE_IN_DAYS -- when a ## user does not confirm his email address and does not complete ## registration, after how many days will it expire? CFG_WEBSESSION_NOT_CONFIRMED_EMAIL_ADDRESS_EXPIRE_IN_DAYS = 10 ## CFG_WEBSESSION_DIFFERENTIATE_BETWEEN_GUESTS -- when set to 1, the session ## system allocates the same uid=0 to all guest users regardless of where they ## come from. 0 allocates a unique uid to each guest. CFG_WEBSESSION_DIFFERENTIATE_BETWEEN_GUESTS = 0 ################################ ## Part 9: BibRank parameters ## ################################ ## This section contains some configuration parameters for the ranking ## system. ## CFG_BIBRANK_SHOW_READING_STATS -- do we want to show reading ## similarity stats? ('People who viewed this page also viewed') CFG_BIBRANK_SHOW_READING_STATS = 1 ## CFG_BIBRANK_SHOW_DOWNLOAD_STATS -- do we want to show the download ## similarity stats? ('People who downloaded this document also ## downloaded') CFG_BIBRANK_SHOW_DOWNLOAD_STATS = 1 ## CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS -- do we want to show download ## history graph? CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS = 1 ## CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS_CLIENT_IP_DISTRIBUTION -- do we ## want to show a graph representing the distribution of client IPs ## downloading given document? CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS_CLIENT_IP_DISTRIBUTION = 0 ## CFG_BIBRANK_SHOW_CITATION_LINKS -- do we want to show the 'Cited ## by' links? (useful only when you have citations in the metadata) CFG_BIBRANK_SHOW_CITATION_LINKS = 1 ## CFG_BIBRANK_SHOW_CITATION_STATS -- do we want to show citation ## stats? ('Cited by M records', 'Co-cited with N records') CFG_BIBRANK_SHOW_CITATION_STATS = 1 ## CFG_BIBRANK_SHOW_CITATION_GRAPHS -- do we want to show citation ## history graph? CFG_BIBRANK_SHOW_CITATION_GRAPHS = 1 #################################### ## Part 10: WebComment parameters ## #################################### ## This section contains some configuration parameters for the ## commenting and reviewing facilities. ## CFG_WEBCOMMENT_ALLOW_COMMENTS -- do we want to allow users to write ## public comments on records? CFG_WEBCOMMENT_ALLOW_COMMENTS = 1 ## CFG_WEBCOMMENT_ALLOW_REVIEWS -- do we want to allow users to write ## public reviews of records? CFG_WEBCOMMENT_ALLOW_REVIEWS = 1 ## CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS -- do we want to allow short ## reviews, that is just the attribution of stars without submitting ## detailed review text?
CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS = 0 ## CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN -- if users ## report a comment as abusive, how many reports have to be filed before the ## site admin is alerted? CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN = 5 ## CFG_WEBCOMMENT_NB_COMMENTS_IN_DETAILED_VIEW -- how many comments do ## we display in the detailed record page upon welcome? CFG_WEBCOMMENT_NB_COMMENTS_IN_DETAILED_VIEW = 1 ## CFG_WEBCOMMENT_NB_REVIEWS_IN_DETAILED_VIEW -- how many reviews do ## we display in the detailed record page upon welcome? CFG_WEBCOMMENT_NB_REVIEWS_IN_DETAILED_VIEW = 1 ## CFG_WEBCOMMENT_ADMIN_NOTIFICATION_LEVEL -- do we notify the site ## admin after every comment? CFG_WEBCOMMENT_ADMIN_NOTIFICATION_LEVEL = 1 ## CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_COMMENTS_IN_SECONDS -- how many ## elapsed seconds do we consider enough when checking for possible ## multiple comment submissions by a user? CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_COMMENTS_IN_SECONDS = 20 ## CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_REVIEWS_IN_SECONDS -- how many ## elapsed seconds do we consider enough when checking for possible ## multiple review submissions by a user? CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_REVIEWS_IN_SECONDS = 20 ## CFG_WEBCOMMENT_USE_RICH_TEXT_EDITOR -- enable the WYSIWYG ## Javascript-based editor when users edit comments? CFG_WEBCOMMENT_USE_RICH_TEXT_EDITOR = False ## CFG_WEBCOMMENT_ALERT_ENGINE_EMAIL -- the email address from which the ## alert emails will appear to be sent: CFG_WEBCOMMENT_ALERT_ENGINE_EMAIL = cds.support@cern.ch ## CFG_WEBCOMMENT_DEFAULT_MODERATOR -- if no rules are ## specified to indicate who is the comment moderator of ## a collection, this person will be used as the default CFG_WEBCOMMENT_DEFAULT_MODERATOR = cds.support@cern.ch ## CFG_WEBCOMMENT_USE_JSMATH_IN_COMMENTS -- do we want to allow the use ## of jsmath plugin to render latex input in comments? CFG_WEBCOMMENT_USE_JSMATH_IN_COMMENTS = 1 ## CFG_WEBCOMMENT_AUTHOR_DELETE_COMMENT_OPTION -- allow the comment author to ## delete his/her own comment? CFG_WEBCOMMENT_AUTHOR_DELETE_COMMENT_OPTION = 1 ################################## ## Part 11: BibSched parameters ## ################################## ## This section contains some configuration parameters for the ## bibliographic task scheduler. ## CFG_BIBSCHED_REFRESHTIME -- how often do we want to refresh ## bibsched monitor? (in seconds) CFG_BIBSCHED_REFRESHTIME = 5 ## CFG_BIBSCHED_LOG_PAGER -- what pager to use to view bibsched task ## logs? CFG_BIBSCHED_LOG_PAGER = /bin/more ## CFG_BIBSCHED_GC_TASKS_OLDER_THAN -- after how many days to perform the ## garbage collection of the BibSched queue (i.e. removing/moving tasks to the archive). CFG_BIBSCHED_GC_TASKS_OLDER_THAN = 30 ## CFG_BIBSCHED_GC_TASKS_TO_REMOVE -- list of BibTasks that can be safely ## removed from the BibSched queue once they are DONE. CFG_BIBSCHED_GC_TASKS_TO_REMOVE = bibindex,bibreformat,webcoll,bibrank,inveniogc ## CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE -- list of BibTasks that should be safely ## archived out of the BibSched queue once they are DONE. CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE = bibupload,oaiarchive ## CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS -- maximum number of BibTasks ## that can run concurrently. ## NOTE: concurrent tasks are still considered as an experimental ## feature. Please keep this value set to 1 on production environments.
CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS = 1 ## CFG_BIBSCHED_PROCESS_USER -- bibsched and bibtask processes must ## usually run under the same identity as the Apache web server ## process in order to share proper file read/write privileges. If ## you want to force some other bibsched/bibtask user, e.g. because ## you are using a local `invenio' user that belongs to your ## `www-data' Apache user group and so shares writing rights with your ## Apache web server process in this way, then please set its username ## identity here. Otherwise we shall check whether your ## bibsched/bibtask processes are run under the same identity as your ## Apache web server process (in which case you can leave the default ## empty value here). CFG_BIBSCHED_PROCESS_USER = ################################### ## Part 12: WebBasket parameters ## ################################### ## CFG_WEBBASKET_MAX_NUMBER_OF_DISPLAYED_BASKETS -- a safety limit for ## the maximum number of displayed baskets CFG_WEBBASKET_MAX_NUMBER_OF_DISPLAYED_BASKETS = 20 ## CFG_WEBBASKET_USE_RICH_TEXT_EDITOR -- enable the WYSIWYG ## Javascript-based editor when users edit comments in WebBasket? CFG_WEBBASKET_USE_RICH_TEXT_EDITOR = False ################################## ## Part 13: WebAlert parameters ## ################################## ## This section contains some configuration parameters for the ## automatic email notification alert system. ## CFG_WEBALERT_ALERT_ENGINE_EMAIL -- the email address from which the ## alert emails will appear to be sent: CFG_WEBALERT_ALERT_ENGINE_EMAIL = cds.support@cern.ch ## CFG_WEBALERT_MAX_NUM_OF_RECORDS_IN_ALERT_EMAIL -- how many records ## at most do we send in an outgoing alert email? CFG_WEBALERT_MAX_NUM_OF_RECORDS_IN_ALERT_EMAIL = 20 ## CFG_WEBALERT_MAX_NUM_OF_CHARS_PER_LINE_IN_ALERT_EMAIL -- number of ## chars per line in an outgoing alert email? CFG_WEBALERT_MAX_NUM_OF_CHARS_PER_LINE_IN_ALERT_EMAIL = 72 ## CFG_WEBALERT_SEND_EMAIL_NUMBER_OF_TRIES -- when sending alert ## emails fails, how many times do we retry? CFG_WEBALERT_SEND_EMAIL_NUMBER_OF_TRIES = 3 ## CFG_WEBALERT_SEND_EMAIL_SLEEPTIME_BETWEEN_TRIES -- when sending ## alert emails fails, what is the sleeptime between tries? (in ## seconds) CFG_WEBALERT_SEND_EMAIL_SLEEPTIME_BETWEEN_TRIES = 300 #################################### ## Part 14: WebMessage parameters ## #################################### ## CFG_WEBMESSAGE_MAX_SIZE_OF_MESSAGE -- how large web messages do we ## allow? CFG_WEBMESSAGE_MAX_SIZE_OF_MESSAGE = 20000 ## CFG_WEBMESSAGE_MAX_NB_OF_MESSAGES -- how many messages for a ## regular user do we allow in his/her inbox? CFG_WEBMESSAGE_MAX_NB_OF_MESSAGES = 30 ## CFG_WEBMESSAGE_DAYS_BEFORE_DELETE_ORPHANS -- how many days before ## we delete orphaned messages? CFG_WEBMESSAGE_DAYS_BEFORE_DELETE_ORPHANS = 60 ################################## ## Part 15: MiscUtil parameters ## ################################## ## CFG_MISCUTIL_SQL_MAX_CACHED_QUERIES -- maximum number of cached SQL ## queries possible. After reaching this number the cache is pruned ## by deleting half of the older queries. CFG_MISCUTIL_SQL_MAX_CACHED_QUERIES = 10000 ## CFG_MISCUTIL_SQL_USE_SQLALCHEMY -- whether to use SQLAlchemy.pool ## in the DB engine of CDS Invenio. It is okay to enable this flag ## even if you have not installed SQLAlchemy. Note that Invenio will ## lose some performance if this option is enabled. CFG_MISCUTIL_SQL_USE_SQLALCHEMY = False ## CFG_MISCUTIL_SQL_RUN_SQL_MANY_LIMIT -- how many queries can we run ## inside run_sql_many() in one SQL statement? The limit value ## depends on MySQL's max_allowed_packet configuration. CFG_MISCUTIL_SQL_RUN_SQL_MANY_LIMIT = 10000 ## CFG_MISCUTIL_SMTP_HOST -- which server to use as outgoing mail server to ## send outgoing emails generated by the system, for example concerning ## submissions or email notification alerts. CFG_MISCUTIL_SMTP_HOST = localhost ## CFG_MISCUTIL_SMTP_PORT -- which port to use on the outgoing mail server ## defined in the previous step. CFG_MISCUTIL_SMTP_PORT = 25 +## CFG_MISCUTIL_DEFAULT_PROCESS_TIMEOUT -- the default number of seconds after +## which a process launched through shellutils.run_process_with_timeout will +## be killed. This is useful to catch runaway processes. +CFG_MISCUTIL_DEFAULT_PROCESS_TIMEOUT = 300 +
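## For illustration only, the general pattern behind such a timeout is
## sketched below (a self-contained example of the technique, not the
## actual shellutils implementation):
##
##   import os, signal, subprocess, time
##
##   def run_with_timeout(args, timeout=300):
##       # run ARGS and kill the process if it exceeds TIMEOUT seconds
##       process = subprocess.Popen(args)
##       deadline = time.time() + timeout
##       while process.poll() is None:
##           if time.time() > deadline:
##               os.kill(process.pid, signal.SIGKILL)
##               raise RuntimeError("process timed out")
##           time.sleep(0.1)
##       return process.returncode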
################################# ## Part 16: BibEdit parameters ## ################################# ## CFG_BIBEDIT_TIMEOUT -- when a user edits a record, this record is ## locked to prevent other users from editing it at the same time. ## How many seconds of inactivity before the locked record is again free ## for other people to edit? CFG_BIBEDIT_TIMEOUT = 3600 ## CFG_BIBEDIT_LOCKLEVEL -- when a user tries to edit a record for which there ## is a pending bibupload task in the queue, this shouldn't be permitted. ## The lock level determines how thoroughly the queue should be investigated ## to determine if this is the case. ## Level 0 - always permits editing, doesn't look at the queue ## (unsafe, use only if you know what you are doing) ## Level 1 - permits editing if there are no queued bibedit tasks for this record ## (safe with respect to bibedit, but not for other bibupload maintenance jobs) ## Level 2 - permits editing if there are no queued bibupload tasks of any sort ## (safe, but may lock more than necessary if many cataloguers are around) ## Level 3 - permits editing if no queued bibupload task concerns given record ## (safe, most precise locking, but slow, ## checks for 001/EXTERNAL_SYSNO_TAG/EXTERNAL_OAIID_TAG) ## The recommended level is 3 (default) or 2 (if you use maintenance jobs often). CFG_BIBEDIT_LOCKLEVEL = 3 ## CFG_BIBEDIT_PROTECTED_FIELDS -- a comma-separated list of fields that BibEdit ## will not allow to be added, edited or deleted. Wildcards are not supported, ## but conceptually a wildcard is added at the end of every field specification. ## Examples: ## 500A - protect all MARC fields with tag 500 and first indicator A ## 5 - protect all MARC fields in the 500-series. ## 909C_a - protect subfield a in tag 909 with first indicator C and empty ## second indicator ## Note that 001 is protected by default, but if protection of other ## identifiers or automated fields is a requirement, they should be added to ## this list. CFG_BIBEDIT_PROTECTED_FIELDS = ################################### ## Part 17: BibUpload parameters ## ################################### ## CFG_BIBUPLOAD_REFERENCE_TAG -- where do we store references? CFG_BIBUPLOAD_REFERENCE_TAG = 999 ## CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG -- where do we store external ## system numbers? Useful for matching when our records come from an ## external digital library system. CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG = 970__a ## CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG -- where do we store OAI ID tags ## of harvested records? Useful for matching when we harvest stuff ## via OAI that we do not want to reexport via Invenio OAI; so records ## may have only the source OAI ID stored in this tag (kind of like ## external system number too).
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG = 035__a ## CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG -- where do we store OAI SRC ## tags of harvested records? Useful for matching when we harvest stuff ## via OAI that we do not want to reexport via Invenio OAI; so records ## may have only the source OAI SRC stored in this tag (kind of like ## external system number too). Note that the field should be the same as ## CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG. CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG = 035__9 ## CFG_BIBUPLOAD_STRONG_TAGS -- a comma-separated list of tags that ## are strong enough to resist the replace mode. Useful for tags that ## might be created from an external non-metadata-like source, ## e.g. the information about the number of copies left. CFG_BIBUPLOAD_STRONG_TAGS = 964 ## CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS -- a comma-separated list ## of tags that contain provenance information that should be checked ## in the bibupload correct mode via matching provenance codes. (Only ## field instances of the same provenance information would be acted ## upon.) Please specify the whole tag info up to subfield codes. CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS = 6531_9 ## CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS -- a comma-separated list of system ## paths from which it is allowed to take fulltext files that will be uploaded via ## FFT (CFG_TMPDIR is included by default). CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS = /tmp,/home ## CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE -- do we want to serialize ## internal representation of records (Pythonic record structure) into ## the database? This can improve internal processing speed of some ## operations at the price of somewhat bigger disk space usage. ## If you change this value after some records have already been added ## to your installation, you may want to run: ## $ /opt/cds-invenio/bin/inveniocfg --reset-recstruct-cache ## in order to either erase the cache thus freeing database space, ## or to fill the cache for all records that have not been cached yet. CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE = 1 #################################### ## Part 18: BibCatalog parameters ## #################################### ## EXPERIMENTAL: Please do not use. CFG_BIBCATALOG_SYSTEM = CFG_BIBCATALOG_SYSTEM_RT_CLI = /usr/bin/rt CFG_BIBCATALOG_SYSTEM_RT_URL = http://localhost/rt3 CFG_BIBCATALOG_QUEUES = General #################################### ## Part 19: BibFormat parameters ## #################################### ## CFG_BIBFORMAT_HIDDEN_TAGS -- comma-separated list of MARC tags that ## are not shown to users not having cataloging authorizations. CFG_BIBFORMAT_HIDDEN_TAGS = 595 ########################## ## THAT's ALL, FOLKS! ## ########################## diff --git a/configure-tests.py b/configure-tests.py index d8c8eff62..dbc82abce 100644 --- a/configure-tests.py +++ b/configure-tests.py @@ -1,294 +1,313 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details.
## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Test the suitability of Python core and the availability of various Python modules for running CDS Invenio. Warn the user if there are eventual troubles. Exit status: 0 if okay, 1 if not okay. Useful for running from configure.ac. """ ## minimally recommended/required versions: cfg_min_python_version = "2.4" cfg_min_mysqldb_version = "1.2.1_p2" ## 0) import modules needed for this testing: import string import sys import getpass def wait_for_user(msg): """Print MSG and prompt user for confirmation.""" try: raw_input(msg) except KeyboardInterrupt: print "\n\nInstallation aborted." sys.exit(1) except EOFError: print " (continuing in batch mode)" return ## 1) check Python version: if sys.version < cfg_min_python_version: print """ ******************************************************* ** ERROR: OLD PYTHON DETECTED: %s ******************************************************* ** You seem to be using an old version of Python. ** ** You must use at least Python %s. ** ** ** ** Note that if you have more than one Python ** ** installed on your system, you can specify the ** ** --with-python configuration option to choose ** ** a specific (e.g. non system wide) Python binary. ** ** ** ** Please upgrade your Python before continuing. ** ******************************************************* """ % (string.replace(sys.version, "\n", ""), cfg_min_python_version) sys.exit(1) ## 2) check for required modules: try: import MySQLdb import base64 import cPickle import cStringIO import cgi import copy import fileinput import getopt import sys if sys.hexversion < 0x2060000: import md5 else: import hashlib import marshal import os import signal import tempfile import time import traceback import unicodedata import urllib import zlib import wsgiref except ImportError, msg: print """ ************************************************* ** IMPORT ERROR %s ************************************************* ** Perhaps you forgot to install some of the ** ** prerequisite Python modules? Please look ** ** at our INSTALL file for more details and ** ** fix the problem before continuing! ** ************************************************* """ % msg sys.exit(1) ## 3) check for recommended modules: try: if (2**31 - 1) == sys.maxint: # check for Psyco since we seem to run in 32-bit environment import psyco else: # no need to advise on Psyco on 64-bit systems pass except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that Psyco is not really required but we ** ** recommend it for faster CDS Invenio operation ** ** if you are running in 32-bit operating system. ** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import rdflib except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that rdflib is needed only if you plan ** ** to work with the automatic classification of ** ** documents based on RDF-based taxonomies. 
** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import pyRXP except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that PyRXP is not really required but ** ** we recommend it for fast XML MARC parsing. ** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import libxml2 except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxml2 is not really required but ** ** we recommend it for XML metadata conversions ** ** and for fast XML parsing. ** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import libxslt except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxslt is not really required but ** ** we recommend it for XML metadata conversions. ** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import Gnuplot except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that Gnuplot.py is not really required but ** ** we recommend it in order to have nice download ** ** and citation history graphs on Detailed record ** ** pages. ** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import magic except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that magic module is not really required ** ** but we recommend it in order to have detailed ** ** content information about fulltext files. ** ** ** ** You can safely continue installing CDS Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your CDS Invenio installation is put ** ** into production.) 
** ***************************************************** """ % msg
+try: + import reportlab +except ImportError, msg: + print """ + ***************************************************** + ** IMPORT WARNING %s + ***************************************************** + ** Note that the reportlab module is not really ** + ** required, but we recommend it if you want to ** + ** enrich PDFs with OCR information. ** + ** ** + ** You can safely continue installing CDS Invenio ** + ** now, and add this module anytime later. (I.e. ** + ** even after your CDS Invenio installation is put ** + ** into production.) ** + ***************************************************** + """ % msg + wait_for_user("Press ENTER to continue the installation...") +
## 4) check for versions of some important modules: if MySQLdb.__version__ < cfg_min_mysqldb_version: print """ ***************************************************** ** ERROR: PYTHON MODULE MYSQLDB %s DETECTED ***************************************************** ** You have to upgrade your MySQLdb to at least ** ** version %s. You must fix this problem ** ** before continuing. Please see the INSTALL file ** ** for more details. ** ***************************************************** """ % (MySQLdb.__version__, cfg_min_mysqldb_version) sys.exit(1)
try: import Stemmer try: from Stemmer import algorithms except ImportError, msg: print """ ***************************************************** ** ERROR: STEMMER MODULE PROBLEM %s ***************************************************** ** Perhaps you are using an old Stemmer version? ** ** You must either remove your old Stemmer or else ** ** upgrade to Snowball Stemmer ** ** before continuing. Please see the INSTALL file ** ** for more details. ** ***************************************************** """ % (msg) sys.exit(1) except ImportError: pass # no prob, Stemmer is optional
## 5) check for Python.h (needed for intbitset): try: from distutils.sysconfig import get_python_inc path_to_python_h = get_python_inc() + os.sep + 'Python.h' if not os.path.exists(path_to_python_h): raise StandardError, "Cannot find %s" % path_to_python_h except StandardError, msg: print """ ***************************************************** ** ERROR: PYTHON HEADER FILE ERROR %s ***************************************************** ** You do not seem to have Python developer files ** ** installed (such as Python.h). Some operating ** ** systems provide these in a separate Python ** ** package called python-dev or python-devel. ** ** You must install such a package before ** ** continuing the installation process. ** ***************************************************** """ % (msg) sys.exit(1)
diff --git a/configure.ac b/configure.ac index ee3bc31af..980fa3cbe 100644 --- a/configure.ac +++ b/configure.ac @@ -1,729 +1,799 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details.
## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## This is CDS Invenio main configure.ac file. If you change this ## file, then please run "autoreconf" to regenerate the "configure" ## script. ## Initialize autoconf and automake: -AC_INIT(cds-invenio, 0.99.90.20100319, cds.support@cern.ch) +AC_INIT(cds-invenio, 0.99.90.20100413, cds.support@cern.ch) AM_INIT_AUTOMAKE([tar-ustar])
## By default we shall install into /opt/cds-invenio. (Do not use ## AC_PREFIX_DEFAULT for this, because it would not work well with ## the localstatedir hack below.) test "${prefix}" = NONE && prefix=/opt/cds-invenio
## Remove eventual trailing slashes from the prefix value: test "${prefix%/}" != "" && prefix=${prefix%/}
## Check for install: AC_PROG_INSTALL
## Check for gettext support: AM_GNU_GETTEXT(external) AM_GNU_GETTEXT_VERSION(0.14.4)
## Check for MySQL client: AC_MSG_CHECKING(for mysql) AC_ARG_WITH(mysql, AC_HELP_STRING([--with-mysql], [path to a specific MySQL binary (optional)]), MYSQL=${withval}) if test -n "$MYSQL"; then AC_MSG_RESULT($MYSQL) else AC_PATH_PROG(MYSQL, mysql) if test -z "$MYSQL"; then AC_MSG_ERROR([ MySQL command-line client was not found in your PATH. Please install it first. Available from .]) fi fi
## Check for Python: AC_MSG_CHECKING(for python) AC_ARG_WITH(python, AC_HELP_STRING([--with-python], [path to a specific Python binary (optional)]), PYTHON=${withval}) if test -n "$PYTHON"; then AC_MSG_RESULT($PYTHON) else AC_PATH_PROG(PYTHON, python) if test -z "$PYTHON"; then AC_MSG_ERROR([ Python was not found in your PATH. Please either install it in your PATH or specify --with-python configure option. Python is available from .]) fi fi
+## Check for OpenOffice.org Python binary: +AC_MSG_CHECKING(for OpenOffice.org Python binary) +AC_ARG_WITH(openoffice-python, AC_HELP_STRING([--with-openoffice-python], [path to a specific OpenOffice.org Python binary (optional)]), OPENOFFICE_PYTHON=`which ${withval}`) + +if test -z "$OPENOFFICE_PYTHON"; then + OPENOFFICE_PYTHON=`locate -n 1 -r "o.*office/program/python$"` + if test -n "$OPENOFFICE_PYTHON"; then + OPENOFFICE_PYTHON="$PYTHON $OPENOFFICE_PYTHON" + fi + if test -n "$OPENOFFICE_PYTHON" && ($OPENOFFICE_PYTHON -c "import uno" 2> /dev/null); then + AC_MSG_RESULT($OPENOFFICE_PYTHON) + else + AC_MSG_WARN([ + You have not specified the path to the OpenOffice.org Python binary. + OpenOffice.org and Microsoft Office document conversion and fulltext indexing + will not be available. We recommend you to install OpenOffice.org first + and to rerun the configure script. OpenOffice.org is available from + .]) + fi +elif ($OPENOFFICE_PYTHON -c "import uno" 2> /dev/null); then + AC_MSG_RESULT($OPENOFFICE_PYTHON) +else + AC_MSG_ERROR([ + The specified OpenOffice.org Python binary is not correctly configured. + Please specify the correct path to the specific OpenOffice.org Python binary + (OpenOffice.org is available from ).]) +fi +
## Check for Python version and modules: AC_MSG_CHECKING(for required Python modules) $PYTHON ${srcdir}/configure-tests.py if test $? -ne 0; then AC_MSG_ERROR([Please fix the above Python problem before continuing.]) fi AC_MSG_RESULT(found)
## Check for PHP: AC_PATH_PROG(PHP, php)
-## Check for Acrobat Reader: -AC_PATH_PROG(ACROREAD, acroread) -if test -z "$ACROREAD"; then - AC_MSG_WARN([ - Acrobat Reader was not found in your PATH.
It is used in - the WebSubmit module for automatic conversion of submitted documents. - You can continue without it but you will miss some CDS Invenio - functionality. We recommend you to install it first and to rerun - the configure script. Acrobat Reader is available from - .]) -fi -
## Check for gzip: AC_PATH_PROG(GZIP, gzip) if test -z "$GZIP"; then AC_MSG_WARN([ Gzip was not found in your PATH. It is used in the WebSubmit module to compress the data submitted in an archive. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. Gzip is available from .]) fi
## Check for gunzip: AC_PATH_PROG(GUNZIP, gunzip) if test -z "$GUNZIP"; then AC_MSG_WARN([ Gunzip was not found in your PATH. It is used in the WebSubmit module to correctly deal with submitted compressed files. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. Gunzip is available from .]) fi
## Check for tar: AC_PATH_PROG(TAR, tar) if test -z "$TAR"; then AC_MSG_WARN([ Tar was not found in your PATH. It is used in the WebSubmit module to pack the submitted data into an archive. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. Tar is available from .]) fi
## Check for wget: AC_PATH_PROG(WGET, wget) if test -z "$WGET"; then AC_MSG_WARN([ wget was not found in your PATH. It is used for the fulltext file retrieval. You can continue without it but we recomend you to install it first and to rerun the configure script. wget is available from .]) fi
## Check for md5sum: AC_PATH_PROG(MD5SUM, md5sum) if test -z "$MD5SUM"; then AC_MSG_WARN([ md5sum was not found in your PATH. It is used for the fulltext file checksum verification. You can continue without it but we recomend you to install it first and to rerun the configure script. md5sum is available from .]) fi
## Check for ps2pdf: AC_PATH_PROG(PS2PDF, ps2pdf) if test -z "$PS2PDF"; then AC_MSG_WARN([ ps2pdf was not found in your PATH. It is used in the WebSubmit module to convert submitted PostScripts into PDF. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. ps2pdf is available from .]) fi
+## Check for tiff2pdf: +AC_PATH_PROG(TIFF2PDF, tiff2pdf) +if test -z "$TIFF2PDF"; then + AC_MSG_WARN([ + tiff2pdf was not found in your PATH. It is used in + the WebSubmit module to convert submitted TIFF files into PDF. + You can continue without it but you will miss some CDS Invenio + functionality. We recommend you to install it first and to rerun + the configure script. tiff2pdf is available from + .]) +fi + +## Check for gs: +AC_PATH_PROG(GS, gs) +if test -z "$GS"; then + AC_MSG_WARN([ + gs was not found in your PATH. It is used in + the WebSubmit module to convert submitted PostScripts into PDF. + You can continue without it but you will miss some CDS Invenio + functionality. We recommend you to install it first and to rerun + the configure script. gs is available from + .]) +fi +
## Check for pdftotext: AC_PATH_PROG(PDFTOTEXT, pdftotext) if test -z "$PDFTOTEXT"; then AC_MSG_WARN([ pdftotext was not found in your PATH. It is used for the fulltext indexation of PDF files. You can continue without it but you may miss fulltext searching capability of CDS Invenio.
We recomend you to install it first and to rerun the configure script. pdftotext is available from . ]) fi
+## Check for pdfinfo: +AC_PATH_PROG(PDFINFO, pdfinfo) +if test -z "$PDFINFO"; then + AC_MSG_WARN([ + pdfinfo was not found in your PATH. It is used for gathering information on + PDF files. + You can continue without it but you may miss this feature of CDS Invenio. + We recommend you to install it first and to rerun the configure + script. pdfinfo is available from . + ]) +fi +
## Check for pdftk: AC_PATH_PROG(PDFTK, pdftk) if test -z "$PDFTK"; then AC_MSG_WARN([ pdftk was not found in your PATH. It is used for the fulltext file stamping. You can continue without it but you may miss this feature of CDS Invenio. We recomend you to install it first and to rerun the configure script. pdftk is available from . ]) fi
## Check for pdf2ps: AC_PATH_PROG(PDF2PS, pdf2ps) if test -z "$PDF2PS"; then AC_MSG_WARN([ pdf2ps was not found in your PATH. It is used in the WebSubmit module to convert submitted PDFs into PostScript. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. pdf2ps is available from .]) fi
-## Check for pstotext: -AC_PATH_PROG(PSTOTEXT, pstotext) -if test -z "$PSTOTEXT"; then +## Check for pdftops: +AC_PATH_PROG(PDFTOPS, pdftops) +if test -z "$PDFTOPS"; then AC_MSG_WARN([ - pstotext was not found in your PATH. It is used for the fulltext indexation - of PDF and PostScript files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. pstotext is available from . - ]) + pdftops was not found in your PATH. It is used in + the WebSubmit module to convert submitted PDFs into PostScript. + You can continue without it but you will miss some CDS Invenio + functionality. We recommend you to install it first and to rerun + the configure script. pdftops is available from + .]) fi
-## Check for ps2ascii: -AC_PATH_PROG(PSTOASCII, ps2ascii) -if test -z "$PSTOASCII"; then +## Check for pdfopt: +AC_PATH_PROG(PDFOPT, pdfopt) +if test -z "$PDFOPT"; then AC_MSG_WARN([ - ps2ascii was not found in your PATH. It is used for the fulltext indexation - of PostScript files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. ps2ascii is available from . - ]) + pdfopt was not found in your PATH. It is used in + the WebSubmit module to linearize submitted PDFs. + You can continue without it but you will miss some CDS Invenio + functionality. We recommend you to install it first and to rerun + the configure script. pdfopt is available from + .]) fi
-## Check for antiword: -AC_PATH_PROG(ANTIWORD, antiword) -if test -z "$ANTIWORD"; then +## Check for pdftoppm: +AC_PATH_PROG(PDFTOPPM, pdftoppm) +if test -z "$PDFTOPPM"; then AC_MSG_WARN([ - antiword was not found in your PATH. It is used for the fulltext indexation - of Microsoft Word files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. antiword is available from . - ]) + pdftoppm was not found in your PATH. It is used in + the WebSubmit module to extract images from PDFs for OCR. + You can continue without it but you will miss some CDS Invenio + functionality.
We recommend you to install it first and to rerun + the configure script. pdftoppm is available from + .]) fi
-## Check for catdoc: -AC_PATH_PROG(CATDOC, catdoc) -if test -z "$CATDOC"; then +## Check for pamfile: +AC_PATH_PROG(PAMFILE, pamfile) +if test -z "$PAMFILE"; then AC_MSG_WARN([ - catdoc was not found in your PATH. It is used for the fulltext indexation - of Microsoft Word files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. catdoc is available from . - ]) + pamfile was not found in your PATH. It is used in + the WebSubmit module to retrieve the size of images extracted from PDFs + for OCR. + You can continue without it but you will miss some CDS Invenio + functionality. We recommend you to install it first and to rerun + the configure script. pamfile is available as part of the netpbm utilities + from: + .]) fi
-## Check for wvText: -AC_PATH_PROG(WVTEXT, wvText) -if test -z "$WVTEXT"; then +## Check for ocroscript: +AC_PATH_PROG(OCROSCRIPT, ocroscript) +if test -z "$OCROSCRIPT"; then AC_MSG_WARN([ - wvText was not found in your PATH. It is used for the fulltext indexation - of Microsoft Word files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. wvText is available from . - ]) + If you plan to run OCR on your PDFs, then please install + ocroscript now. Otherwise you can safely continue. You also have the + option to install ocroscript later and edit invenio-local.conf to let + CDS Invenio know the path to ocroscript. + ocroscript is available as part of OCROpus from + . + NOTE: Since OCROpus is being actively developed and its API is continuously + changing, please install release 0.3.1]) fi
-## Check for ppthtml: -AC_PATH_PROG(PPTHTML, ppthtml) -if test -z "$PPTHTML"; then +## Check for pstotext: +AC_PATH_PROG(PSTOTEXT, pstotext) +if test -z "$PSTOTEXT"; then AC_MSG_WARN([ - ppthtml was found in your PATH. It is used for the fulltext indexation - of Microsoft PowerPoint files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. ppthtml is available from . + pstotext was not found in your PATH. It is used for the fulltext indexation + of PDF and PostScript files. + Please install pstotext. Otherwise you can safely continue. You also have the + option to install pstotext later and edit invenio-local.conf to let + CDS Invenio know the path to pstotext. + pstotext is available from . ]) fi
-## Check for xlhtml: -AC_PATH_PROG(XLHTML, xlhtml) -if test -z "$XLHTML"; then +## Check for ps2ascii: +AC_PATH_PROG(PSTOASCII, ps2ascii) +if test -z "$PSTOASCII"; then AC_MSG_WARN([ - xlhtml was found in your PATH. It is used for the fulltext indexation - of Microsoft Excel files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. xlhtml is available from . + ps2ascii was not found in your PATH. It is used for the fulltext indexation + of PostScript files. + Please install ps2ascii. Otherwise you can safely continue. You also have the + option to install ps2ascii later and edit invenio-local.conf to let + CDS Invenio know the path to ps2ascii. + ps2ascii is available from .
]) fi
-## Check for html2text: -AC_PATH_PROG(HTMLTOTEXT, html2text) -if test -z "$HTMLTOTEXT"; then +## Check for any2djvu: +AC_PATH_PROG(ANY2DJVU, any2djvu) +if test -z "$ANY2DJVU"; then AC_MSG_WARN([ - html2text was found in your PATH. It is used for the fulltext indexation - of Microsoft PowerPoint and Excel files. - You can continue without it but you may miss fulltext searching capability - of CDS Invenio. We recomend you to install it first and to rerun the configure - script. html2text is available from . - ]) + any2djvu was not found in your PATH. It is used in + the WebSubmit module to convert documents to DJVU. + Please install any2djvu. Otherwise you can safely continue. You also have the + option to install any2djvu later and edit invenio-local.conf to let + CDS Invenio know the path to any2djvu. + any2djvu is available from + .]) fi
-## Check for Giftext: -AC_PATH_PROG(GIFTEXT, giftext) -if test -z "$GIFTEXT"; then +## Check for DJVUPS: +AC_PATH_PROG(DJVUPS, djvups) +if test -z "$DJVUPS"; then AC_MSG_WARN([ - Giftext was not found in your PATH. It is used in - the WebSubmit module to create an icon from a submitted picture. + djvups was not found in your PATH. It is used in + the WebSubmit module to convert documents from DJVU. + Please install djvups. Otherwise you can safely continue. You also have the + option to install djvups later and edit invenio-local.conf to let + CDS Invenio know the path to djvups. + djvups is available from + .]) +fi + +## Check for DJVUTXT: +AC_PATH_PROG(DJVUTXT, djvutxt) +if test -z "$DJVUTXT"; then + AC_MSG_WARN([ + djvutxt was not found in your PATH. It is used in + the WebSubmit module to extract text from DJVU documents. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun - the configure script. Giftext is available from - .]) + the configure script. djvutxt is available from + .]) fi
## Check for file: AC_PATH_PROG(FILE, file) if test -z "$FILE"; then AC_MSG_WARN([ File was not found in your PATH. It is used in the WebSubmit module to check the validity of the submitted files. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. File is available from .]) fi
## Check for convert: AC_PATH_PROG(CONVERT, convert) if test -z "$CONVERT"; then AC_MSG_WARN([ Convert was not found in your PATH. It is used in the WebSubmit module to create an icon from a submitted picture. You can continue without it but you will miss some CDS Invenio functionality. We recommend you to install it first and to rerun the configure script. Convert is available from .]) fi
## Check for CLISP: AC_MSG_CHECKING(for clisp) AC_ARG_WITH(clisp, AC_HELP_STRING([--with-clisp], [path to a specific CLISP binary (optional)]), CLISP=${withval}) if test -n "$CLISP"; then AC_MSG_RESULT($CLISP) else AC_PATH_PROG(CLISP, clisp) if test -z "$CLISP"; then AC_MSG_WARN([ GNU CLISP was not found in your PATH. It is used by the WebStat module to produce statistics about CDS Invenio usage. (Alternatively, SBCL or CMUCL can be used instead of CLISP.) You can continue without it but you will miss this feature. We recommend you to install it first (if you don't have neither CMUCL nor SBCL) and to rerun the configure script.
GNU CLISP is available from .]) fi fi ## Check for CMUCL: AC_MSG_CHECKING(for cmucl) AC_ARG_WITH(cmucl, AC_HELP_STRING([--with-cmucl], [path to a specific CMUCL binary (optional)]), CMUCL=${withval}) if test -n "$CMUCL"; then AC_MSG_RESULT($CMUCL) else AC_PATH_PROG(CMUCL, cmucl) if test -z "$CMUCL"; then AC_MSG_CHECKING(for lisp) # CMUCL can also be installed under `lisp' exec name AC_PATH_PROG(CMUCL, lisp) fi if test -z "$CMUCL"; then AC_MSG_WARN([ CMUCL was not found in your PATH. It is used by the WebStat module to produce statistics about CDS Invenio usage. (Alternatively, CLISP or SBCL can be used instead of CMUCL.) You can continue without it but you will miss this feature. We recommend you to install it first (if you don't have neither CLISP nor SBCL) and to rerun the configure script. CMUCL is available from .]) fi fi ## Check for SBCL: AC_MSG_CHECKING(for sbcl) AC_ARG_WITH(sbcl, AC_HELP_STRING([--with-sbcl], [path to a specific SBCL binary (optional)]), SBCL=${withval}) if test -n "$SBCL"; then AC_MSG_RESULT($SBCL) else AC_PATH_PROG(SBCL, sbcl) if test -z "$SBCL"; then AC_MSG_WARN([ SBCL was not found in your PATH. It is used by the WebStat module to produce statistics about CDS Invenio usage. (Alternatively, CLISP or CMUCL can be used instead of SBCL.) You can continue without it but you will miss this feature. We recommend you to install it first (if you don't have neither CLISP nor CMUCL) and to rerun the configure script. SBCL is available from .]) fi fi ## Check for gnuplot: AC_PATH_PROG(GNUPLOT, gnuplot) if test -z "$GNUPLOT"; then AC_MSG_WARN([ Gnuplot was not found in your PATH. It is used by the BibRank module to produce graphs about download and citation history. You can continue without it but you will miss these graphs. We recommend you to install it first and to rerun the configure script. 
Gnuplot is available from .]) fi ## Substitute variables: AC_SUBST(VERSION) +AC_SUBST(OPENOFFICE_PYTHON) AC_SUBST(MYSQL) -AC_SUBST(PHP) AC_SUBST(PYTHON) -AC_SUBST(CLIDIR) +AC_SUBST(GZIP) +AC_SUBST(GUNZIP) +AC_SUBST(TAR) +AC_SUBST(WGET) +AC_SUBST(MD5SUM) +AC_SUBST(PS2PDF) +AC_SUBST(GS) AC_SUBST(PDFTOTEXT) AC_SUBST(PDFTK) AC_SUBST(PDF2PS) +AC_SUBST(PDFTOPS) +AC_SUBST(PDFOPT) +AC_SUBST(PDFTOPPM) +AC_SUBST(OCROSCRIPT) AC_SUBST(PSTOTEXT) AC_SUBST(PSTOASCII) -AC_SUBST(ANTIWORD) -AC_SUBST(CATDOC) -AC_SUBST(WVTEXT) -AC_SUBST(PPTHTML) -AC_SUBST(XLHTML) -AC_SUBST(HTMLTOTEXT) -AC_SUBST(localstatedir, `eval echo "${localstatedir}"`) -AC_SUBST(CACHEDIR) +AC_SUBST(ANY2DJVU) +AC_SUBST(DJVUPS) +AC_SUBST(DJVUTXT) +AC_SUBST(FILE) +AC_SUBST(CONVERT) +AC_SUBST(GNUPLOT) AC_SUBST(CLISP) AC_SUBST(CMUCL) AC_SUBST(SBCL) -AC_SUBST(GNUPLOT) -AC_SUBST(DJPEG) -AC_SUBST(CONVERT) -AC_SUBST(GIFTEXT) -AC_SUBST(JPEGSIZE) -AC_SUBST(PNMSCALE) -AC_SUBST(PPMQUANT) -AC_SUBST(PPMTOGIF) -AC_SUBST(GIFINTER) -AC_SUBST(GIFRSIZE) +AC_SUBST(CACHEDIR) +AC_SUBST(localstatedir, `eval echo "${localstatedir}"`) ## Define output files: AC_CONFIG_FILES([config.nice \ Makefile \ - po/Makefile.in \ + po/Makefile.in \ config/Makefile \ config/invenio-autotools.conf \ modules/Makefile \ modules/bibcatalog/Makefile \ modules/bibcatalog/doc/Makefile \ modules/bibcatalog/doc/admin/Makefile \ modules/bibcatalog/doc/hacking/Makefile modules/bibcatalog/lib/Makefile \ modules/bibcheck/Makefile \ modules/bibcheck/doc/Makefile \ modules/bibcheck/doc/admin/Makefile \ modules/bibcheck/doc/hacking/Makefile \ modules/bibcheck/etc/Makefile \ modules/bibcheck/web/Makefile \ modules/bibcheck/web/admin/Makefile \ modules/bibcirculation/Makefile \ modules/bibcirculation/bin/Makefile \ modules/bibcirculation/doc/Makefile \ modules/bibcirculation/doc/admin/Makefile \ modules/bibcirculation/doc/hacking/Makefile modules/bibcirculation/lib/Makefile \ modules/bibcirculation/web/Makefile \ modules/bibcirculation/web/admin/Makefile \ modules/bibclassify/Makefile \ modules/bibclassify/bin/Makefile \ modules/bibclassify/bin/bibclassify \ modules/bibclassify/doc/Makefile \ modules/bibclassify/doc/admin/Makefile \ modules/bibclassify/doc/hacking/Makefile \ modules/bibclassify/lib/Makefile \ modules/bibconvert/Makefile \ modules/bibconvert/bin/Makefile \ modules/bibconvert/bin/bibconvert \ modules/bibconvert/doc/Makefile \ modules/bibconvert/doc/admin/Makefile \ modules/bibconvert/doc/hacking/Makefile \ modules/bibconvert/etc/Makefile \ modules/bibconvert/lib/Makefile \ modules/bibedit/Makefile \ modules/bibedit/bin/Makefile \ modules/bibedit/bin/bibedit \ modules/bibedit/bin/refextract \ modules/bibedit/bin/xmlmarc2textmarc \ modules/bibedit/bin/xmlmarclint \ modules/bibedit/doc/Makefile \ modules/bibedit/doc/admin/Makefile \ modules/bibedit/doc/hacking/Makefile \ modules/bibedit/etc/Makefile \ modules/bibedit/lib/Makefile \ modules/bibedit/web/Makefile \ modules/bibedit/web/admin/Makefile \ modules/bibexport/Makefile \ modules/bibexport/bin/Makefile \ modules/bibexport/bin/bibexport \ modules/bibexport/doc/Makefile \ modules/bibexport/doc/admin/Makefile \ modules/bibexport/doc/hacking/Makefile modules/bibexport/etc/Makefile \ modules/bibexport/lib/Makefile \ modules/bibexport/web/Makefile \ modules/bibexport/web/admin/Makefile \ modules/bibformat/Makefile \ modules/bibformat/bin/Makefile \ modules/bibformat/bin/bibreformat \ modules/bibformat/doc/Makefile \ modules/bibformat/doc/admin/Makefile \ modules/bibformat/doc/hacking/Makefile \ modules/bibformat/etc/Makefile \ 
modules/bibformat/etc/format_templates/Makefile \ modules/bibformat/etc/output_formats/Makefile \ modules/bibformat/lib/Makefile \ modules/bibformat/lib/elements/Makefile \ modules/bibformat/web/Makefile \ modules/bibformat/web/admin/Makefile \ modules/bibharvest/Makefile \ modules/bibharvest/bin/Makefile \ modules/bibharvest/bin/oairepositoryupdater \ modules/bibharvest/bin/oaiharvest \ modules/bibharvest/doc/Makefile \ modules/bibharvest/doc/admin/Makefile \ modules/bibharvest/doc/hacking/Makefile \ modules/bibharvest/lib/Makefile \ modules/bibharvest/web/Makefile \ modules/bibharvest/web/admin/Makefile \ modules/bibindex/Makefile \ modules/bibindex/bin/Makefile \ modules/bibindex/bin/bibindex \ modules/bibindex/bin/bibstat \ modules/bibindex/doc/Makefile \ modules/bibindex/doc/admin/Makefile \ modules/bibindex/doc/hacking/Makefile \ modules/bibindex/lib/Makefile \ modules/bibindex/web/Makefile \ modules/bibindex/web/admin/Makefile \ modules/bibknowledge/Makefile \ modules/bibknowledge/lib/Makefile \ modules/bibknowledge/doc/Makefile \ modules/bibknowledge/doc/admin/Makefile \ modules/bibknowledge/doc/hacking/Makefile \ modules/bibmatch/Makefile \ modules/bibmatch/bin/Makefile \ modules/bibmatch/bin/bibmatch \ modules/bibmatch/doc/Makefile \ modules/bibmatch/doc/admin/Makefile \ modules/bibmatch/etc/Makefile \ modules/bibmatch/lib/Makefile \ modules/bibmerge/Makefile \ modules/bibmerge/bin/Makefile \ modules/bibmerge/doc/Makefile \ modules/bibmerge/doc/admin/Makefile \ modules/bibmerge/doc/hacking/Makefile \ modules/bibmerge/lib/Makefile \ modules/bibmerge/web/Makefile \ modules/bibmerge/web/admin/Makefile \ modules/bibrank/Makefile \ modules/bibrank/bin/Makefile \ modules/bibrank/bin/bibrank \ modules/bibrank/bin/bibrankgkb \ modules/bibrank/doc/Makefile \ modules/bibrank/doc/admin/Makefile \ modules/bibrank/doc/hacking/Makefile \ modules/bibrank/etc/Makefile \ modules/bibrank/etc/bibrankgkb.cfg \ modules/bibrank/etc/demo_jif.cfg \ modules/bibrank/etc/template_single_tag_rank_method.cfg \ modules/bibrank/lib/Makefile \ modules/bibrank/web/Makefile \ modules/bibrank/web/admin/Makefile \ modules/bibsched/Makefile \ modules/bibsched/bin/Makefile \ modules/bibsched/bin/bibsched \ modules/bibsched/bin/bibtaskex \ modules/bibsched/doc/Makefile \ modules/bibsched/doc/admin/Makefile \ modules/bibsched/doc/hacking/Makefile \ modules/bibsched/lib/Makefile \ modules/bibupload/Makefile \ modules/bibupload/bin/Makefile \ modules/bibupload/bin/bibupload \ modules/bibupload/doc/Makefile \ modules/bibupload/doc/admin/Makefile \ modules/bibupload/doc/hacking/Makefile \ modules/bibupload/lib/Makefile \ modules/elmsubmit/Makefile \ modules/elmsubmit/bin/Makefile \ modules/elmsubmit/bin/elmsubmit \ modules/elmsubmit/doc/Makefile \ modules/elmsubmit/doc/admin/Makefile \ modules/elmsubmit/doc/hacking/Makefile \ modules/elmsubmit/etc/Makefile \ modules/elmsubmit/etc/elmsubmit.cfg \ modules/elmsubmit/lib/Makefile \ modules/miscutil/Makefile \ modules/miscutil/bin/Makefile \ modules/miscutil/bin/dbdump \ modules/miscutil/bin/dbexec \ modules/miscutil/bin/inveniocfg \ modules/miscutil/demo/Makefile \ modules/miscutil/doc/Makefile \ modules/miscutil/doc/hacking/Makefile \ modules/miscutil/lib/Makefile \ modules/miscutil/sql/Makefile \ modules/miscutil/web/Makefile \ modules/webaccess/Makefile \ modules/webaccess/bin/Makefile \ modules/webaccess/bin/authaction \ modules/webaccess/bin/webaccessadmin \ modules/webaccess/doc/Makefile \ modules/webaccess/doc/admin/Makefile \ modules/webaccess/doc/hacking/Makefile \ 
modules/webaccess/lib/Makefile \ modules/webaccess/web/Makefile \ modules/webaccess/web/admin/Makefile \ modules/webalert/Makefile \ modules/webalert/bin/Makefile \ modules/webalert/bin/alertengine \ modules/webalert/doc/Makefile \ modules/webalert/doc/admin/Makefile \ modules/webalert/doc/hacking/Makefile \ modules/webalert/lib/Makefile \ modules/webalert/web/Makefile \ modules/webbasket/Makefile \ modules/webbasket/doc/Makefile \ modules/webbasket/doc/admin/Makefile \ modules/webbasket/doc/hacking/Makefile \ modules/webbasket/lib/Makefile \ modules/webbasket/web/Makefile \ modules/webcomment/Makefile \ modules/webcomment/doc/Makefile \ modules/webcomment/doc/admin/Makefile \ modules/webcomment/doc/hacking/Makefile \ modules/webcomment/lib/Makefile \ modules/webcomment/web/Makefile \ modules/webcomment/web/admin/Makefile \ modules/webhelp/Makefile \ modules/webhelp/web/Makefile \ modules/webhelp/web/admin/Makefile \ modules/webhelp/web/admin/howto/Makefile \ modules/webhelp/web/hacking/Makefile \ modules/webjournal/Makefile \ modules/webjournal/etc/Makefile \ modules/webjournal/doc/Makefile \ modules/webjournal/doc/admin/Makefile \ modules/webjournal/doc/hacking/Makefile \ modules/webjournal/lib/Makefile \ modules/webjournal/lib/elements/Makefile \ modules/webjournal/lib/widgets/Makefile \ modules/webjournal/web/Makefile \ modules/webjournal/web/admin/Makefile \ modules/webmessage/Makefile \ modules/webmessage/bin/Makefile \ modules/webmessage/bin/webmessageadmin \ modules/webmessage/doc/Makefile \ modules/webmessage/doc/admin/Makefile \ modules/webmessage/doc/hacking/Makefile \ modules/webmessage/lib/Makefile \ modules/webmessage/web/Makefile \ modules/websearch/Makefile \ modules/websearch/bin/Makefile \ modules/websearch/bin/webcoll \ modules/websearch/doc/Makefile \ modules/websearch/doc/admin/Makefile \ modules/websearch/doc/hacking/Makefile \ modules/websearch/lib/Makefile \ modules/websearch/web/Makefile \ modules/websearch/web/admin/Makefile \ modules/websession/Makefile \ modules/websession/bin/Makefile \ modules/websession/bin/inveniogc \ modules/websession/doc/Makefile \ modules/websession/doc/admin/Makefile \ modules/websession/doc/hacking/Makefile \ modules/websession/lib/Makefile \ modules/websession/web/Makefile \ modules/webstat/Makefile \ modules/webstat/bin/Makefile \ modules/webstat/bin/webstat \ modules/webstat/bin/webstatadmin \ modules/webstat/doc/Makefile \ modules/webstat/doc/admin/Makefile \ modules/webstat/doc/hacking/Makefile \ modules/webstat/etc/Makefile \ modules/webstat/lib/Makefile \ modules/webstyle/Makefile \ modules/webstyle/bin/Makefile \ modules/webstyle/bin/webdoc \ modules/webstyle/css/Makefile \ modules/webstyle/doc/Makefile \ modules/webstyle/doc/admin/Makefile \ modules/webstyle/doc/hacking/Makefile \ modules/webstyle/etc/Makefile \ modules/webstyle/img/Makefile \ modules/webstyle/lib/Makefile \ modules/websubmit/Makefile \ modules/websubmit/bin/Makefile \ modules/websubmit/bin/bibdocfile \ modules/websubmit/doc/Makefile \ modules/websubmit/doc/admin/Makefile \ modules/websubmit/doc/hacking/Makefile \ modules/websubmit/etc/Makefile \ modules/websubmit/lib/Makefile \ modules/websubmit/lib/functions/Makefile \ modules/websubmit/web/Makefile \ modules/websubmit/web/admin/Makefile \ ]) ## Finally, write output files: AC_OUTPUT ## Write help: AC_MSG_RESULT([****************************************************************************]) AC_MSG_RESULT([** Your CDS Invenio installation is now ready for building. 
**]) AC_MSG_RESULT([** You have entered the following parameters: **]) AC_MSG_RESULT([** - CDS Invenio main install directory: ${prefix}]) AC_MSG_RESULT([** - Python executable: $PYTHON]) AC_MSG_RESULT([** - MySQL client executable: $MYSQL]) AC_MSG_RESULT([** - CLISP executable: $CLISP]) AC_MSG_RESULT([** - CMUCL executable: $CMUCL]) AC_MSG_RESULT([** - SBCL executable: $SBCL]) AC_MSG_RESULT([** Here are the steps to continue the building process: **]) AC_MSG_RESULT([** 1) Type 'make' to build your CDS Invenio system. **]) AC_MSG_RESULT([** 2) Type 'make install' to install your CDS Invenio system. **]) AC_MSG_RESULT([** After that you can start customizing your installation as documented **]) AC_MSG_RESULT([** in the INSTALL file (i.e. edit invenio.conf, run inveniocfg, etc). **]) AC_MSG_RESULT([** Good luck, and thanks for choosing CDS Invenio. **]) AC_MSG_RESULT([** -- CDS Development Group **]) AC_MSG_RESULT([****************************************************************************]) ## end of file diff --git a/modules/bibedit/lib/bibrecord.py b/modules/bibedit/lib/bibrecord.py index ed61affbe..53579811c 100644 --- a/modules/bibedit/lib/bibrecord.py +++ b/modules/bibedit/lib/bibrecord.py @@ -1,1518 +1,1525 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibRecord - XML MARC processing library for CDS Invenio. For API, see create_record(), record_get_field_instances() and friends in the source code of this file in the section entitled INTERFACE. Note: Does not access the database, the input is MARCXML only.""" ### IMPORT INTERESTING MODULES AND XML PARSERS import re import sys try: import psyco PSYCO_AVAILABLE = True except ImportError: PSYCO_AVAILABLE = False if sys.hexversion < 0x2040000: # pylint: disable-msg=W0622 from sets import Set as set # pylint: enable-msg=W0622 from invenio.bibrecord_config import CFG_MARC21_DTD, \ CFG_BIBRECORD_WARNING_MSGS, CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, \ CFG_BIBRECORD_DEFAULT_CORRECT, CFG_BIBRECORD_PARSERS_AVAILABLE, \ InvenioBibRecordParserError, InvenioBibRecordFieldError from invenio.config import CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG from invenio.textutils import encode_for_xml # Some values used for the RXP parsing. TAG, ATTRS, CHILDREN = 0, 1, 2 # Find out about the best usable parser: AVAILABLE_PARSERS = [] # Do we remove singletons (empty tags)? 
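# (A gloss on the flag below, stated as an assumption from the comment above:
# a "singleton" is an empty tag, i.e. a field that ends up with no subfields
# and no value after parsing; with the flag left False, such empty tags are
# presumably dropped from the resulting record structure.)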
CFG_BIBRECORD_KEEP_SINGLETONS = False
try: import pyRXP if 'pyrxp' in CFG_BIBRECORD_PARSERS_AVAILABLE: AVAILABLE_PARSERS.append('pyrxp') except ImportError: pass
try: import Ft.Xml.Domlette if '4suite' in CFG_BIBRECORD_PARSERS_AVAILABLE: AVAILABLE_PARSERS.append('4suite') except ImportError: pass
try: import xml.dom.minidom import xml.parsers.expat if 'minidom' in CFG_BIBRECORD_PARSERS_AVAILABLE: AVAILABLE_PARSERS.append('minidom') except ImportError: pass
### INTERFACE / VISIBLE FUNCTIONS
def create_field(subfields=None, ind1=' ', ind2=' ', controlfield_value='', global_position=-1): """ Returns a field created with the provided elements. Global position is set arbitrarily to -1.""" if subfields is None: subfields = [] ind1, ind2 = _wash_indicators(ind1, ind2) field = (subfields, ind1, ind2, controlfield_value, global_position) _check_field_validity(field) return field
def create_records(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, correct=CFG_BIBRECORD_DEFAULT_CORRECT, parser=''): """Creates a list of records from the marcxml description. Returns a list of objects initiated by the function create_record(). Please see that function's docstring.""" # Use the DOTALL flag to include newlines. regex = re.compile('<record.*?</record>', re.DOTALL) record_xmls = regex.findall(marcxml) return [create_record(record_xml, verbose=verbose, correct=correct, parser=parser) for record_xml in record_xmls]
def create_record(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, correct=CFG_BIBRECORD_DEFAULT_CORRECT, parser='', sort_fields_by_indicators=False): """Creates a record object from the marcxml description. Uses the best parser available in CFG_BIBRECORD_PARSERS_AVAILABLE or the parser specified. The returned object is a tuple (record, status_code, list_of_errors), where status_code is 0 when there are errors, 1 when no errors. The return record structure is as follows: Record := {tag : [Field]} Field := (Subfields, ind1, ind2, value) Subfields := [(code, value)] For example: ______ |record| ------ __________________________|_______________________________________ |record['001'] |record['909'] |record['520'] | | | | | [list of fields] [list of fields] [list of fields] ... | ______|______________ | |[0] |[0] |[1] | |[0] ___|_____ _____|___ ___|_____ ... ____|____ |Field 001| |Field 909| |Field 909| |Field 520| --------- --------- --------- --------- | _______________|_________________ | | ... |[0] |[1] |[2] | ... ... | | | | [list of subfields] 'C' '4' ___|__________________________________________ | | | ('a', 'value') ('b', 'value for subfield b') ('a', 'value for another a') @param marcxml: an XML string representation of the record to create @param verbose: the level of verbosity: 0 (silent), 1-2 (warnings), 3(strict:stop when errors) @param correct: 1 to enable correction of marcxml syntax. Else 0. @return: a tuple (record, status_code, list_of_errors), where status code is 0 where there are errors, 1 when no errors""" # Select the appropriate parser.
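# (An assumption for illustration, since _select_parser() is defined
# elsewhere: when the caller passes an empty parser name, it presumably
# falls back to the first importable entry of AVAILABLE_PARSERS, i.e.
# pyrxp, then 4suite, then minidom.)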
parser = _select_parser(parser) try: if parser == 'pyrxp': rec = _create_record_rxp(marcxml, verbose, correct) elif parser == '4suite': rec = _create_record_4suite(marcxml) elif parser == 'minidom': rec = _create_record_minidom(marcxml) except InvenioBibRecordParserError, ex1: return (None, 0, str(ex1)) # _create_record = { # 'pyrxp': _create_record_rxp, # '4suite': _create_record_4suite, # 'minidom': _create_record_minidom, # } # try: # rec = _create_record[parser](marcxml, verbose) # except InvenioBibRecordParserError, ex1: # return (None, 0, str(ex1)) if sort_fields_by_indicators: _record_sort_by_indicators(rec) errs = [] if correct: # Correct the structure of the record. errs = _correct_record(rec) return (rec, errs and 0 or 1, errs) def record_get_field_instances(rec, tag="", ind1=" ", ind2=" "): """Returns the list of field instances for the specified tag and indicators of the record (rec). Returns empty list if not found. If tag is empty string, returns all fields Parameters (tag, ind1, ind2) can contain wildcard %. @param rec: a record structure as returned by create_record() @param tag: a 3 characters long string @param ind1: a 1 character long string @param ind2: a 1 character long string @param code: a 1 character long string @return: a list of field tuples (Subfields, ind1, ind2, value, field_position_global) where subfields is list of (code, value)""" if not tag: return rec.items() else: out = [] ind1, ind2 = _wash_indicators(ind1, ind2) if '%' in tag: # Wildcard in tag. Check all possible for field_tag in rec: if _tag_matches_pattern(field_tag, tag): for possible_field_instance in rec[field_tag]: if (ind1 in ('%', possible_field_instance[1]) and ind2 in ('%', possible_field_instance[2])): out.append(possible_field_instance) else: # Completely defined tag. Use dict for possible_field_instance in rec.get(tag, []): if (ind1 in ('%', possible_field_instance[1]) and ind2 in ('%', possible_field_instance[2])): out.append(possible_field_instance) return out def record_add_field(rec, tag, ind1=' ', ind2=' ', controlfield_value='', subfields=None, field_position_global=None, field_position_local=None): """ Adds a new field into the record. If field_position_global or field_position_local is specified then this method will insert the new field at the desired position. Otherwise a global field position will be computed in order to insert the field at the best position (first we try to keep the order of the tags and then we insert the field at the end of the fields with the same tag). If both field_position_global and field_position_local are present, then field_position_local takes precedence. @param rec: the record data structure @param tag: the tag of the field to be added @param ind1: the first indicator @param ind2: the second indicator @param controlfield_value: the value of the controlfield @param subfields: the subfields (a list of tuples (code, value)) @param field_position_global: the global field position (record wise) @param field_position_local: the local field position (tag wise) @return: the global field position of the newly inserted field or -1 if the operation failed """ error = validate_record_field_positions_global(rec) if error: # FIXME one should write a message here pass # Clean the parameters. if subfields is None: subfields = [] ind1, ind2 = _wash_indicators(ind1, ind2) if controlfield_value and (ind1 != ' ' or ind2 != ' ' or subfields): return -1 # Detect field number to be used for insertion: # Dictionaries for uniqueness. 
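# ({}.fromkeys(...) below serves as a cheap set: only the unique global
# field positions matter as dictionary keys; the values are ignored.)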
tag_field_positions_global = {}.fromkeys([field[4] for field in rec.get(tag, [])]) all_field_positions_global = {}.fromkeys([field[4] for fields in rec.values() for field in fields]) if field_position_global is None and field_position_local is None: # Let's determine the global field position of the new field. if tag in rec: try: field_position_global = max([field[4] for field in rec[tag]]) \ + 1 except IndexError: if tag_field_positions_global: field_position_global = max(tag_field_positions_global) + 1 elif all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: field_position_global = 1 else: if tag in ('FMT', 'FFT'): # Add the new tag to the end of the record. if tag_field_positions_global: field_position_global = max(tag_field_positions_global) + 1 elif all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: field_position_global = 1 else: # Insert the tag in an ordered way by selecting the # right global field position. immediate_lower_tag = '000' for rec_tag in rec: if (tag not in ('FMT', 'FFT') and immediate_lower_tag < rec_tag < tag): immediate_lower_tag = rec_tag if immediate_lower_tag == '000': field_position_global = 1 else: field_position_global = rec[immediate_lower_tag][-1][4] + 1 field_position_local = len(rec.get(tag, [])) _shift_field_positions_global(rec, field_position_global, 1) elif field_position_local is not None: if tag in rec: if field_position_local >= len(rec[tag]): field_position_global = rec[tag][-1][4] + 1 else: field_position_global = rec[tag][field_position_local][4] _shift_field_positions_global(rec, field_position_global, 1) else: if all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: # Empty record. field_position_global = 1 elif field_position_global is not None: # If the user chose an existing global field position, shift all the # global field positions greater than the input global field position. if tag not in rec: if all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: field_position_global = 1 field_position_local = 0 elif field_position_global < min(tag_field_positions_global): field_position_global = min(tag_field_positions_global) _shift_field_positions_global(rec, min(tag_field_positions_global), 1) field_position_local = 0 elif field_position_global > max(tag_field_positions_global): field_position_global = max(tag_field_positions_global) + 1 _shift_field_positions_global(rec, max(tag_field_positions_global) + 1, 1) field_position_local = len(rec.get(tag, [])) else: if field_position_global in tag_field_positions_global: _shift_field_positions_global(rec, field_position_global, 1) field_position_local = 0 for position, field in enumerate(rec[tag]): if field[4] == field_position_global + 1: field_position_local = position # Create the new field. newfield = (subfields, ind1, ind2, str(controlfield_value), field_position_global) rec.setdefault(tag, []).insert(field_position_local, newfield) # Return new field number: return field_position_global def record_has_field(rec, tag): """ Checks if the tag exists in the record. @param rec: the record data structure @param the: field @return: a boolean """ return tag in rec def record_delete_field(rec, tag, ind1=' ', ind2=' ', field_position_global=None, field_position_local=None): """ If global field position is specified, deletes the field with the corresponding global field position. 
If field_position_local is specified, deletes the field with the corresponding local field position and tag. Else deletes all the fields matching tag and optionally ind1 and ind2. If both field_position_global and field_position_local are present, then field_position_local takes precedence. @param rec: the record data structure @param tag: the tag of the field to be deleted @param ind1: the first indicator of the field to be deleted @param ind2: the second indicator of the field to be deleted @param field_position_global: the global field position (record wise) @param field_position_local: the local field position (tag wise) @return: the list of deleted fields """ error = validate_record_field_positions_global(rec) if error: # FIXME one should write a message here. pass if tag not in rec: return False ind1, ind2 = _wash_indicators(ind1, ind2) deleted = [] newfields = [] if field_position_global is None and field_position_local is None: # Remove all fields with tag 'tag'. for field in rec[tag]: if field[1] != ind1 or field[2] != ind2: newfields.append(field) else: deleted.append(field) rec[tag] = newfields elif field_position_global is not None: # Remove the field with 'field_position_global'. for field in rec[tag]: if (field[1] != ind1 and field[2] != ind2 or field[4] != field_position_global): newfields.append(field) else: deleted.append(field) rec[tag] = newfields elif field_position_local is not None: # Remove the field with 'field_position_local'. try: del rec[tag][field_position_local] except IndexError: return [] if not rec[tag]: # Tag is now empty, remove it. del rec[tag] return deleted
def record_delete_fields(rec, tag, field_positions_local=None): """ - Delete all/some fields defined with MARC tag 'tag' and indicators - 'ind1' and 'ind2' from record 'rec'. If 'field_position_global' - and 'field_position_local' is None, then delete all the field - instances. Otherwise delete only the field instance corresponding - to given 'field_position_global' or 'field_position_local'. - - Returns True if fields were deleted, False otherwise. + Delete all/some fields defined with MARC tag 'tag' from record 'rec'. + + @param rec: a record structure. + @type rec: dictionary + @param tag: the three character tag of the fields to be deleted. + @type tag: string + @param field_positions_local: if set, the list of local positions, + within all the fields with the specified tag, that should be deleted. + If not set, all the fields with the specified tag will be deleted. + @type field_positions_local: sequence + @return: the list of deleted fields. + @rtype: list + @note: the record is modified in place. """ if tag not in rec: return [] new_fields, deleted_fields = [], [] for position, field in enumerate(rec.get(tag, [])): if field_positions_local is None or position in field_positions_local: deleted_fields.append(field) else: new_fields.append(field) if new_fields: rec[tag] = new_fields else: del rec[tag] return deleted_fields
def record_add_fields(rec, tag, fields, field_position_local=None, field_position_global=None): """ Adds the fields into the record at the required position. The position is specified by the tag and the field_position_local in the list of fields. - @param rec: a record structure @param tag: the tag of the fields + @param rec: a record structure + @param tag: the tag of the fields to be added @param field_position_local: the field_position_local to which the field will be inserted. If not specified, appends the fields to the tag.
@param a: list of fields to be added @return: -1 if the operation failed, or the field_position_local if it was successful """ if field_position_local is None and field_position_global is None: for field in fields: record_add_field(rec, tag, ind1=field[1], ind2=field[2], subfields=field[0], controlfield_value=field[3]) else: fields.reverse() for field in fields: record_add_field(rec, tag, ind1=field[1], ind2=field[2], subfields=field[0], controlfield_value=field[3], field_position_local=field_position_local, field_position_global=field_position_global) return field_position_local def record_move_fields(rec, tag, field_positions_local, field_position_local=None): """ Moves some fields to the position specified by 'field_position_local'. @param rec: a record structure as returned by create_record() @param tag: the tag of the fields to be moved @param field_positions_local: the positions of the fields to move @param field_position_local: insert the field before that field_position_local. If unspecified, appends the fields @return: the field_position_local is the operation was successful """ fields = record_delete_fields(rec, tag, field_positions_local=field_positions_local) return record_add_fields(rec, tag, fields, field_position_local=field_position_local) def record_delete_subfield(rec, tag, subfield_code, ind1=' ', ind2=' '): """Deletes all subfields with subfield_code in the record.""" ind1, ind2 = _wash_indicators(ind1, ind2) for field in rec.get(tag, []): if field[1] == ind1 and field[2] == ind2: field[0][:] = [subfield for subfield in field[0] if subfield_code != subfield[0]] def record_get_field(rec, tag, field_position_global=None, field_position_local=None): """ Returns the the matching field. One has to enter either a global field position or a local field position. @return: a list of subfield tuples (subfield code, value). @rtype: list """ if field_position_global is None and field_position_local is None: raise InvenioBibRecordFieldError("A field position is required to " "complete this operation.") elif field_position_global is not None and field_position_local is not None: raise InvenioBibRecordFieldError("Only one field position is required " "to complete this operation.") elif field_position_global: if not tag in rec: raise InvenioBibRecordFieldError("No tag '%s' in record." % tag) for field in rec[tag]: if field[4] == field_position_global: return field raise InvenioBibRecordFieldError("No field has the tag '%s' and the " "global field position '%d'." % (tag, field_position_global)) else: try: return rec[tag][field_position_local] except KeyError: raise InvenioBibRecordFieldError("No tag '%s' in record." % tag) except IndexError: raise InvenioBibRecordFieldError("No field has the tag '%s' and " "the local field position '%d'." % (tag, field_position_local)) def record_replace_field(rec, tag, new_field, field_position_global=None, field_position_local=None): """Replaces a field with a new field.""" if field_position_global is None and field_position_local is None: raise InvenioBibRecordFieldError("A field position is required to " "complete this operation.") elif field_position_global is not None and field_position_local is not None: raise InvenioBibRecordFieldError("Only one field position is required " "to complete this operation.") elif field_position_global: if not tag in rec: raise InvenioBibRecordFieldError("No tag '%s' in record." 
% tag) replaced = False for position, field in enumerate(rec[tag]): if field[4] == field_position_global: rec[tag][position] = new_field replaced = True if not replaced: raise InvenioBibRecordFieldError("No field has the tag '%s' and " "the global field position '%d'." % (tag, field_position_global)) else: try: rec[tag][field_position_local] = new_field except KeyError: raise InvenioBibRecordFieldError("No tag '%s' in record." % tag) except IndexError: raise InvenioBibRecordFieldError("No field has the tag '%s' and " "the local field position '%d'." % (tag, field_position_local)) def record_get_subfields(rec, tag, field_position_global=None, field_position_local=None): """ Returns the subfield of the matching field. One has to enter either a global field position or a local field position. @return: a list of subfield tuples (subfield code, value). @rtype: list """ field = record_get_field(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) return field[0] def record_delete_subfield_from(rec, tag, subfield_position, field_position_global=None, field_position_local=None): """Delete subfield from position specified by tag, field number and subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) try: del subfields[subfield_position] except IndexError: from invenio.xmlmarc2textmarclib import create_marc_record recordMarc = create_marc_record(rec, 0, {"text-marc": 1, "aleph-marc": 0}) raise InvenioBibRecordFieldError("The record : %(recordCode)s does not contain the subfield " "'%(subfieldIndex)s' inside the field (local: '%(fieldIndexLocal)s, global: '%(fieldIndexGlobal)s' ) of tag '%(tag)s'." % \ {"subfieldIndex" : subfield_position, \ "fieldIndexLocal" : str(field_position_local), \ "fieldIndexGlobal" : str(field_position_global), \ "tag" : tag, \ "recordCode" : recordMarc}) if not subfields: if field_position_global is not None: for position, field in enumerate(rec[tag]): if field[4] == field_position_global: del rec[tag][position] else: del rec[tag][field_position_local] if not rec[tag]: del rec[tag] def record_add_subfield_into(rec, tag, subfield_code, value, subfield_position=None, field_position_global=None, field_position_local=None): """Add subfield into position specified by tag, field number and optionally by subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) if subfield_position is None: subfields.append((subfield_code, value)) else: subfields.insert(subfield_position, (subfield_code, value)) def record_modify_controlfield(rec, tag, controlfield_value, field_position_global=None, field_position_local=None): """Modify controlfield at position specified by tag and field number.""" field = record_get_field(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) new_field = (field[0], field[1], field[2], controlfield_value, field[4]) record_replace_field(rec, tag, new_field, field_position_global=field_position_global, field_position_local=field_position_local) def record_modify_subfield(rec, tag, subfield_code, value, subfield_position, field_position_global=None, field_position_local=None): """Modify subfield at position specified by tag, field number and subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, 
field_position_local=field_position_local) try: subfields[subfield_position] = (subfield_code, value) except IndexError: raise InvenioBibRecordFieldError("There is no subfield with position " "'%d'." % subfield_position) def record_move_subfield(rec, tag, subfield_position, new_subfield_position, field_position_global=None, field_position_local=None): """Move subfield at position specified by tag, field number and subfield position to new subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) try: subfield = subfields.pop(subfield_position) subfields.insert(new_subfield_position, subfield) except IndexError: raise InvenioBibRecordFieldError("There is no subfield with position " "'%d'." % subfield_position) def record_get_field_value(rec, tag, ind1=" ", ind2=" ", code=""): """Returns first (string) value that matches specified field (tag, ind1, ind2, code) of the record (rec). Returns empty string if not found. Parameters (tag, ind1, ind2, code) can contain wildcard %. Difference between wildcard % and empty '': - Empty char specifies that we are not interested in a field which has one of the indicator(s)/subfield specified. - Wildcard specifies that we are interested in getting the value of the field whatever the indicator(s)/subfield is. For e.g. consider the following record in MARC: 100C5 $$a val1 555AB $$a val2 555AB val3 555 $$a val4 555A val5 >> record_get_field_value(record, '555', 'A', '', '') >> "val5" >> record_get_field_value(record, '555', 'A', '%', '') >> "val3" >> record_get_field_value(record, '555', 'A', '%', '%') >> "val2" >> record_get_field_value(record, '555', 'A', 'B', '') >> "val3" >> record_get_field_value(record, '555', '', 'B', 'a') >> "" >> record_get_field_value(record, '555', '', '', 'a') >> "val4" >> record_get_field_value(record, '555', '', '', '') >> "" >> record_get_field_value(record, '%%%', '%', '%', '%') >> "val1" @param rec: a record structure as returned by create_record() @param tag: a 3 characters long string @param ind1: a 1 character long string @param ind2: a 1 character long string @param code: a 1 character long string @return: string value (empty if nothing found)""" # Note: the code is quite redundant for speed reasons (avoid calling # functions or doing tests inside loops) ind1, ind2 = _wash_indicators(ind1, ind2) if '%' in tag: # Wild card in tag. Must find all corresponding fields if code == '': # Code not specified. for field_tag, fields in rec.items(): if _tag_matches_pattern(field_tag, tag): for field in fields: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): # Return matching field value if not empty if field[3]: return field[3] elif code == '%': # Code is wildcard. Take first subfield of first matching field for field_tag, fields in rec.items(): if _tag_matches_pattern(field_tag, tag): for field in fields: if (ind1 in ('%', field[1]) and ind2 in ('%', field[2]) and field[0]): return field[0][0][1] else: # Code is specified. Take corresponding one for field_tag, fields in rec.items(): if _tag_matches_pattern(field_tag, tag): for field in fields: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: if subfield[0] == code: return subfield[1] else: # Tag is completely specified. Use tag as dict key if tag in rec: if code == '': # Code not specified. for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): # Return matching field value if not empty # or return "" empty if not exist. 
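# (field[3] is the controlfield value slot of the field tuple
# (subfields, ind1, ind2, value, global_position); for ordinary
# datafields it stays empty, so the check below falls through.)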
def record_get_field_values(rec, tag, ind1=" ", ind2=" ", code=""):
    """Returns the list of (string) values for the specified field
    (tag, ind1, ind2, code) of the record (rec).

    Returns empty list if not found.

    Parameters (tag, ind1, ind2, code) can contain wildcard %.

    @param rec: a record structure as returned by create_record()
    @param tag: a 3 characters long string
    @param ind1: a 1 character long string
    @param ind2: a 1 character long string
    @param code: a 1 character long string
    @return: a list of strings"""
    tmp = []

    ind1, ind2 = _wash_indicators(ind1, ind2)

    if '%' in tag:
        # Wild card in tag. Must find all corresponding tags and fields
        tags = [k for k in rec if _tag_matches_pattern(k, tag)]
        if code == '':
            # Code not specified. Consider field value (without subfields)
            for tag in tags:
                for field in rec[tag]:
                    if (ind1 in ('%', field[1]) and ind2 in ('%', field[2])
                            and field[3]):
                        tmp.append(field[3])
        elif code == '%':
            # Code is wildcard. Consider all subfields
            for tag in tags:
                for field in rec[tag]:
                    if ind1 in ('%', field[1]) and ind2 in ('%', field[2]):
                        for subfield in field[0]:
                            tmp.append(subfield[1])
        else:
            # Code is specified. Consider all corresponding subfields
            for tag in tags:
                for field in rec[tag]:
                    if ind1 in ('%', field[1]) and ind2 in ('%', field[2]):
                        for subfield in field[0]:
                            if subfield[0] == code:
                                tmp.append(subfield[1])
    else:
        # Tag is completely specified. Use tag as dict key
        if rec and tag in rec:
            if code == '':
                # Code not specified. Consider field value (without subfields)
                for field in rec[tag]:
                    if (ind1 in ('%', field[1]) and ind2 in ('%', field[2])
                            and field[3]):
                        tmp.append(field[3])
            elif code == '%':
                # Code is wildcard. Consider all subfields
                for field in rec[tag]:
                    if ind1 in ('%', field[1]) and ind2 in ('%', field[2]):
                        for subfield in field[0]:
                            tmp.append(subfield[1])
            else:
                # Code is specified. Take corresponding one
                for field in rec[tag]:
                    if ind1 in ('%', field[1]) and ind2 in ('%', field[2]):
                        for subfield in field[0]:
                            if subfield[0] == code:
                                tmp.append(subfield[1])

    # If tmp was not set, nothing was found
    return tmp
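# Editor's illustrative sketch (not part of the original module):
# record_get_field_values() returns every match, whereas
# record_get_field_value() above stops at the first one.
#
#   >>> rec = {'100': [([('a', 'Doe1, John')], ' ', ' ', '', 1),
#   ...                ([('a', 'Doe2, John')], ' ', ' ', '', 2)]}
#   >>> record_get_field_value(rec, '100', ' ', ' ', 'a')
#   'Doe1, John'
#   >>> record_get_field_values(rec, '100', ' ', ' ', 'a')
#   ['Doe1, John', 'Doe2, John']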
        tags.append('001')

    marcxml = ['<record>']

    # Add the tag 'tag' to each field in rec[tag]
    fields = []
    for tag in rec:
        if not tags or tag in tags:
            for field in rec[tag]:
                fields.append((tag, field))

    record_order_fields(fields)

    for field in fields:
        marcxml.append(field_xml_output(field[1], field[0]))
    marcxml.append('</record>')

    return '\n'.join(marcxml)

def field_get_subfield_instances(field):
    """Returns the list of subfields associated with field 'field'"""
    return field[0]

def field_get_subfield_values(field_instance, code):
    """Return subfield CODE values of the field instance FIELD."""
    return [subfield_value
            for subfield_code, subfield_value in field_instance[0]
            if subfield_code == code]

def field_add_subfield(field, code, value):
    """Adds a subfield to field 'field'"""
    field[0].append((code, value))

def record_order_fields(rec, fun="_order_by_ord"):
    """Orders fields inside record 'rec' according to a function"""
    # The function name is resolved with eval(); the default orders the
    # (tag, field) pairs by their global field position.
    rec.sort(eval(fun))

def field_xml_output(field, tag):
    """Generates the XML for field 'field' and returns it as a string."""
    marcxml = []
    if field[3]:
        marcxml.append('  <controlfield tag="%s">%s</controlfield>' %
            (tag, encode_for_xml(field[3])))
    else:
        marcxml.append('  <datafield tag="%s" ind1="%s" ind2="%s">' %
            (tag, field[1], field[2]))
        marcxml += [_subfield_xml_output(subfield) for subfield in field[0]]
        marcxml.append('  </datafield>')
    return '\n'.join(marcxml)

def record_extract_oai_id(record):
    """Returns the OAI ID of the record."""
    tag = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3]
    ind1 = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3]
    ind2 = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4]
    subfield = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5]
    values = record_get_field_values(record, tag, ind1, ind2, subfield)
    oai_id_regex = re.compile("oai[a-zA-Z0-9/.:]+")
    for value in [value.strip() for value in values]:
        if oai_id_regex.match(value):
            return value
    return ""

def print_rec(rec, format=1, tags=None):
    """prints a record
    format = 1 -- XML
    format = 2 -- HTML (not implemented)
    @tags: list of tags to be printed
    """
    if tags is None:
        tags = []
    if format == 1:
        text = record_xml_output(rec, tags)
    else:
        return ''
    return text

def print_recs(listofrec, format=1, tags=None):
    """prints a list of records
    format = 1 -- XML
    format = 2 -- HTML (not implemented)
    @tags: list of tags to be printed
    if 'listofrec' is not a list it returns empty string
    """
    if tags is None:
        tags = []
    text = ""
    if type(listofrec).__name__ != 'list':
        return ""
    else:
        for rec in listofrec:
            text = "%s\n%s" % (text, print_rec(rec, format, tags))
    return text

def concat(alist):
    """Concats a list of lists"""
    newl = []
    for l in alist:
        newl.extend(l)
    return newl

def print_errors(alist):
    """Creates a unique string with the strings in list, using '\n' as a
    separator."""
    text = ""
    for l in alist:
        text = '%s\n%s' % (text, l)
    return text
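# Editor's illustrative sketch (not part of the original module): the XML
# emitted by record_xml_output() round-trips through create_record(), which
# is how the test suite below exercises both directions.
#
#   >>> rec = {'001': [([], ' ', ' ', '33', 1)]}
#   >>> print record_xml_output(rec)
#   <record>
#     <controlfield tag="001">33</controlfield>
#   </record>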
""" try: _check_field_validity(field) except InvenioBibRecordFieldError: raise for local_position, field1 in enumerate(rec.get(tag, [])): if _compare_fields(field, field1, strict): return (field1[4], local_position) return (None, None) def record_strip_empty_volatile_subfields(rec): """ Removes unchanged volatile subfields from the record """ for tag in rec.keys(): for field in rec[tag]: field[0][:] = [subfield for subfield in field[0] if subfield[1][:9] != "VOLATILE:"] def record_strip_empty_fields(rec, tag=None): """ Removes empty subfields and fields from the record. If 'tag' is not None, only a specific tag of the record will be stripped, otherwise the whole record. @param rec: A record dictionary structure @type rec: dictionary @param tag: The tag of the field to strip empty fields from @type tag: string """ # Check whole record if tag is None: tags = rec.keys() for tag in tags: record_strip_empty_fields(rec, tag) # Check specific tag of the record elif tag in rec: # in case of a controlfield if tag[:2] == '00': if len(rec[tag]) == 0 or not rec[tag][0][3]: del rec[tag] #in case of a normal field else: fields = [] for field in rec[tag]: subfields = [] for subfield in field[0]: # check if the subfield has been given a value if subfield[1]: subfields.append(subfield) if len(subfields) > 0: new_field = create_field(subfields, field[1], field[2], field[3]) fields.append(new_field) if len(fields) > 0: rec[tag] = fields else: del rec[tag] ### IMPLEMENTATION / INVISIBLE FUNCTIONS def _compare_fields(field1, field2, strict=True): """ Compares 2 fields. If strict is True, then the order of the subfield will be taken care of, if not then the order of the subfields doesn't matter. @return: True if the field are equivalent, False otherwise. """ if strict: # Return a simple equal test on the field minus the position. return field1[:4] == field2[:4] else: if field1[1:4] != field2[1:4]: # Different indicators or controlfield value. return False else: # Compare subfields in a loose way. return set(field1[0]) == set(field2[0]) def _check_field_validity(field): """ Checks if a field is well-formed. @param field: A field tuple as returned by create_field() @type field: tuple @raise InvenioBibRecordFieldError: If the field is invalid. """ if type(field) not in (list, tuple): raise InvenioBibRecordFieldError("Field of type '%s' should be either " "a list or a tuple." % type(field)) if len(field) != 5: raise InvenioBibRecordFieldError("Field of length '%d' should have 5 " "elements." % len(field)) if type(field[0]) not in (list, tuple): raise InvenioBibRecordFieldError("Subfields of type '%s' should be " "either a list or a tuple." % type(field[0])) if type(field[1]) is not str: raise InvenioBibRecordFieldError("Indicator 1 of type '%s' should be " "a string." % type(field[1])) if type(field[2]) is not str: raise InvenioBibRecordFieldError("Indicator 2 of type '%s' should be " "a string." % type(field[2])) if type(field[3]) is not str: raise InvenioBibRecordFieldError("Controlfield value of type '%s' " "should be a string." % type(field[3])) if type(field[4]) is not int: raise InvenioBibRecordFieldError("Global position of type '%s' should " "be an int." % type(field[4])) for subfield in field[0]: if (type(subfield) not in (list, tuple) or len(subfield) != 2 or type(subfield[0]) is not str or type(subfield[1]) is not str): raise InvenioBibRecordFieldError("Subfields are malformed. 
" "Should a list of tuples of 2 strings.") def _shift_field_positions_global(record, start, delta=1): """Shifts all global field positions with global field positions higher or equal to 'start' from the value 'delta'.""" if not delta: return for tag, fields in record.items(): newfields = [] for field in fields: if field[4] < start: newfields.append(field) else: # Increment the global field position by delta. newfields.append(tuple(list(field[:4]) + [field[4] + delta])) record[tag] = newfields def _tag_matches_pattern(tag, pattern): """Returns true if MARC 'tag' matches a 'pattern'. 'pattern' is plain text, with % as wildcard Both parameters must be 3 characters long strings. For e.g. >> _tag_matches_pattern("909", "909") -> True >> _tag_matches_pattern("909", "9%9") -> True >> _tag_matches_pattern("909", "9%8") -> False @param tag: a 3 characters long string @param pattern: a 3 characters long string @return: False or True""" for char1, char2 in zip(tag, pattern): if char2 not in ('%', char1): return False return True def validate_record_field_positions_global(record): """ Checks if the global field positions in the record are valid ie no duplicate global field positions and local field positions in the list of fields are ascending. @param record: the record data structure @return: the first error found as a string or None if no error was found """ all_fields = [] for tag, fields in record.items(): previous_field_position_global = -1 for field in fields: if field[4] < previous_field_position_global: return "Non ascending global field positions in tag '%s'." % tag previous_field_position_global = field[4] if field[4] in all_fields: return ("Duplicate global field position '%d' in tag '%s'" % (field[4], tag)) def _record_sort_by_indicators(record): """Sorts the fields inside the record by indicators.""" for tag, fields in record.items(): record[tag] = _fields_sort_by_indicators(fields) def _fields_sort_by_indicators(fields): """Sorts a set of fields by their indicators. Returns a sorted list with correct global field positions.""" field_dict = {} field_positions_global = [] for field in fields: field_dict.setdefault(field[1:3], []).append(field) field_positions_global.append(field[4]) indicators = field_dict.keys() indicators.sort() field_list = [] for indicator in indicators: for field in field_dict[indicator]: field_list.append(field[:4] + (field_positions_global.pop(0),)) return field_list def _select_parser(parser=None): """Selects the more relevant parser based on the parsers available and on the parser desired by the user.""" if not AVAILABLE_PARSERS: # No parser is available. This is bad. return None if parser is None or parser not in AVAILABLE_PARSERS: # Return the best available parser. return AVAILABLE_PARSERS[0] else: return parser def _create_record_rxp(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, correct=CFG_BIBRECORD_DEFAULT_CORRECT): """Creates a record object using the RXP parser. If verbose>3 then the parser will be strict and will stop in case of well-formedness errors or DTD errors. If verbose=0, the parser will not give warnings. If 0 < verbose <= 3, the parser will not give errors, but will warn the user about possible mistakes correct != 0 -> We will try to correct errors such as missing attributes correct = 0 -> there will not be any attempt to correct errors""" if correct: # Note that with pyRXP < 1.13 a memory leak has been found # involving DTD parsing. So enable correction only if you have # pyRXP 1.13 or greater. 
def _create_record_rxp(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL,
    correct=CFG_BIBRECORD_DEFAULT_CORRECT):
    """Creates a record object using the RXP parser.

    If verbose>3 then the parser will be strict and will stop in case of
    well-formedness errors or DTD errors.
    If verbose=0, the parser will not give warnings.
    If 0 < verbose <= 3, the parser will not give errors, but will warn
    the user about possible mistakes

    correct != 0 -> We will try to correct errors such as missing
    attributes
    correct = 0 -> there will not be any attempt to correct errors"""
    if correct:
        # Note that with pyRXP < 1.13 a memory leak has been found
        # involving DTD parsing. So enable correction only if you have
        # pyRXP 1.13 or greater.
        marcxml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
                   '<!DOCTYPE collection SYSTEM "file://%s">\n'
                   '<collection>\n%s\n</collection>' %
                   (CFG_MARC21_DTD, marcxml))

    # Create the pyRXP parser.
    pyrxp_parser = pyRXP.Parser(ErrorOnValidityErrors=0, ProcessDTD=1,
        ErrorOnUnquotedAttributeValues=0, srcName='string input')

    if verbose > 3:
        pyrxp_parser.ErrorOnValidityErrors = 1
        pyrxp_parser.ErrorOnUnquotedAttributeValues = 1

    try:
        root = pyrxp_parser.parse(marcxml)
    except pyRXP.error, ex1:
        raise InvenioBibRecordParserError(str(ex1))

    # If record is enclosed in a collection tag, extract it.
    if root[TAG] == 'collection':
        children = _get_children_by_tag_name_rxp(root, 'record')
        if not children:
            return {}
        root = children[0]

    record = {}
    # This is needed because of the record_xml_output function, where we
    # need to know the order of the fields.
    field_position_global = 1

    # Consider the control fields.
    for controlfield in _get_children_by_tag_name_rxp(root, 'controlfield'):
        if controlfield[CHILDREN]:
            value = ''.join([n for n in controlfield[CHILDREN]])
            # Construct the field tuple.
            field = ([], ' ', ' ', value, field_position_global)
            record.setdefault(controlfield[ATTRS]['tag'], []).append(field)
            field_position_global += 1
        elif CFG_BIBRECORD_KEEP_SINGLETONS:
            field = ([], ' ', ' ', '', field_position_global)
            record.setdefault(controlfield[ATTRS]['tag'], []).append(field)
            field_position_global += 1

    # Consider the data fields.
    for datafield in _get_children_by_tag_name_rxp(root, 'datafield'):
        subfields = []
        for subfield in _get_children_by_tag_name_rxp(datafield, 'subfield'):
            if subfield[CHILDREN]:
                value = ''.join([n for n in subfield[CHILDREN]])
                subfields.append((subfield[ATTRS].get('code', '!'), value))
            elif CFG_BIBRECORD_KEEP_SINGLETONS:
                subfields.append((subfield[ATTRS].get('code', '!'), ''))

        if subfields or CFG_BIBRECORD_KEEP_SINGLETONS:
            # Create the field.
            tag = datafield[ATTRS].get('tag', '!')
            ind1 = datafield[ATTRS].get('ind1', '!')
            ind2 = datafield[ATTRS].get('ind2', '!')
            ind1, ind2 = _wash_indicators(ind1, ind2)
            # Construct the field tuple.
            field = (subfields, ind1, ind2, '', field_position_global)
            record.setdefault(tag, []).append(field)
            field_position_global += 1

    return record
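# Editor's note: whichever parser is used, the resulting record maps tags
# to lists of field tuples.  E.g. a record holding controlfield 001 = "33"
# and datafield 041 $a = "eng" parses to:
#
#   {'001': [([], ' ', ' ', '33', 1)],
#    '041': [([('a', 'eng')], ' ', ' ', '', 2)]}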
def _create_record_from_document(document):
    """Creates a record from the document (of type
    xml.dom.minidom.Document or Ft.Xml.Domlette.Document)."""
    root = None
    for node in document.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            root = node
            break

    if root is None:
        return {}

    if root.tagName == 'collection':
        children = _get_children_by_tag_name(root, 'record')
        if not children:
            return {}
        root = children[0]

    field_position_global = 1
    record = {}

    for controlfield in _get_children_by_tag_name(root, "controlfield"):
        tag = controlfield.getAttributeNS(None, "tag").encode('utf-8')

        text_nodes = controlfield.childNodes
        value = ''.join([n.data for n in text_nodes]).encode("utf-8")

        if value or CFG_BIBRECORD_KEEP_SINGLETONS:
            field = ([], " ", " ", value, field_position_global)
            record.setdefault(tag, []).append(field)
            field_position_global += 1

    for datafield in _get_children_by_tag_name(root, "datafield"):
        subfields = []

        for subfield in _get_children_by_tag_name(datafield, "subfield"):
            text_nodes = subfield.childNodes
            value = ''.join([n.data for n in text_nodes]).encode("utf-8")
            if value or CFG_BIBRECORD_KEEP_SINGLETONS:
                code = subfield.getAttributeNS(None, 'code').encode("utf-8")
                subfields.append((code or '!', value))

        if subfields or CFG_BIBRECORD_KEEP_SINGLETONS:
            tag = datafield.getAttributeNS(None, "tag").encode("utf-8") or '!'
            ind1 = datafield.getAttributeNS(None, "ind1").encode("utf-8")
            ind2 = datafield.getAttributeNS(None, "ind2").encode("utf-8")
            ind1, ind2 = _wash_indicators(ind1, ind2)
            field = (subfields, ind1, ind2, "", field_position_global)
            record.setdefault(tag, []).append(field)
            field_position_global += 1

    return record

def _create_record_minidom(marcxml):
    """Creates a record using minidom."""
    try:
        dom = xml.dom.minidom.parseString(marcxml)
    except xml.parsers.expat.ExpatError, ex1:
        raise InvenioBibRecordParserError(str(ex1))

    return _create_record_from_document(dom)

def _create_record_4suite(marcxml):
    """Creates a record using the 4suite parser."""
    try:
        dom = Ft.Xml.Domlette.NonvalidatingReader.parseString(marcxml,
            "urn:dummy")
    except Ft.Xml.ReaderException, ex1:
        raise InvenioBibRecordParserError(ex1.message)

    return _create_record_from_document(dom)

def _concat(alist):
    """Concats a list of lists"""
    return [element for single_list in alist for element in single_list]

def _subfield_xml_output(subfield):
    """Generates the XML for a subfield object and return it as a string"""
    return '    <subfield code="%s">%s</subfield>' % (subfield[0],
        encode_for_xml(subfield[1]))

def _order_by_ord(field1, field2):
    """Function used to order the fields according to their ord value"""
    return cmp(field1[1][4], field2[1][4])

def _get_children_by_tag_name(node, name):
    """Retrieves all children from node 'node' with name 'name' and
    returns them as a list."""
    try:
        return [child for child in node.childNodes if child.nodeName == name]
    except TypeError:
        return []

def _get_children_by_tag_name_rxp(node, name):
    """Retrieves all children from node 'node' with tag name 'name' and
    returns them as a list. 'node' is a tuple returned by the RXP parser"""
    try:
        return [child for child in node[CHILDREN] if child[TAG] == name]
    except TypeError:
        return []

def _wash_indicators(*indicators):
    """
    Washes the values of the indicators. An empty string or an underscore
    is replaced by a blank space.

    @param indicators: a series of indicators to be washed
    @return: a list of washed indicators
    """
    return [indicator in ('', '_') and ' ' or indicator
            for indicator in indicators]

def _correct_record(record):
    """
    Checks and corrects the structure of the record.

    @param record: the record data structure
    @return: a list of errors found
    """
    errors = []

    for tag in record.keys():
        upper_bound = '999'
        n = len(tag)

        if n > 3:
            i = n - 3
            while i > 0:
                upper_bound = '%s%s' % ('0', upper_bound)
                i -= 1

        # Missing tag. Replace it with dummy tag '000'.
        if tag == '!':
            errors.append((1, '(field number(s): ' +
                str([f[4] for f in record[tag]]) + ')'))
            record['000'] = record.pop(tag)
            tag = '000'
        elif not ('001' <= tag <= upper_bound or tag in ('FMT', 'FFT')):
            errors.append(2)
            record['000'] = record.pop(tag)
            tag = '000'

        fields = []
        for field in record[tag]:
            # Datafield without any subfield.
            if field[0] == [] and field[3] == '':
                errors.append((8, '(field number: ' + str(field[4]) + ')'))

            subfields = []
            for subfield in field[0]:
                if subfield[0] == '!':
                    errors.append((3, '(field number: ' + str(field[4]) +
                        ')'))
                    newsub = ('', subfield[1])
                else:
                    newsub = subfield
                subfields.append(newsub)

            if field[1] == '!':
                errors.append((4, '(field number: ' + str(field[4]) + ')'))
                ind1 = " "
            else:
                ind1 = field[1]

            if field[2] == '!':
                errors.append((5, '(field number: ' + str(field[4]) + ')'))
                ind2 = " "
            else:
                ind2 = field[2]

            fields.append((subfields, ind1, ind2, field[3], field[4]))

        record[tag] = fields

    return errors
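# Editor's illustrative sketch (not part of the original module):
# _correct_record() fixes the record in place and reports numeric warning
# codes, which _warning() below resolves to text.  A field parsed with a
# missing tag ends up under the dummy tag '000':
#
#   >>> rec = {'!': [([('a', 'val')], ' ', ' ', '', 1)]}
#   >>> _correct_record(rec)
#   [(1, '(field number(s): [1])')]
#   >>> rec.keys()
#   ['000']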
def _warning(code):
    """It returns a warning message of code 'code'.

    If code = (cd, str) it returns the warning message of code 'cd' and
    appends str at the end"""
    if isinstance(code, str):
        return code

    message = ''
    if isinstance(code, tuple):
        if isinstance(code[0], str):
            message = code[1]
        code = code[0]

    return CFG_BIBRECORD_WARNING_MSGS.get(code, '') + message

def _warnings(alist):
    """Applies the function _warning() to every element in alist."""
    return [_warning(element) for element in alist]

def _compare_lists(list1, list2, custom_cmp):
    """Compares two lists using the given comparison function

    @param list1: first list to compare
    @param list2: second list to compare
    @param custom_cmp: a function taking two arguments (element of list 1,
        element of list 2) and returning True if the elements are
        considered equal
    @return: True or False depending if the values are the same"""
    if len(list1) != len(list2):
        return False
    for element1, element2 in zip(list1, list2):
        if not custom_cmp(element1, element2):
            return False
    return True

if PSYCO_AVAILABLE:
    # Psyco is available: bind the time-critical functions to the
    # just-in-time compiler.
    psyco.bind(_correct_record)
    psyco.bind(_create_record_4suite)
    psyco.bind(_create_record_rxp)
    psyco.bind(_create_record_minidom)
    psyco.bind(field_get_subfield_values)
    psyco.bind(create_records)
    psyco.bind(create_record)
    psyco.bind(record_get_field_instances)
    psyco.bind(record_get_field_value)
    psyco.bind(record_get_field_values)
diff --git a/modules/bibedit/lib/bibrecord_tests.py b/modules/bibedit/lib/bibrecord_tests.py
index 0c48d09c9..54ca9eff3 100644
--- a/modules/bibedit/lib/bibrecord_tests.py
+++ b/modules/bibedit/lib/bibrecord_tests.py
@@ -1,1539 +1,1540 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""
The BibRecord test suite.
""" import unittest from invenio.config import CFG_TMPDIR from invenio import bibrecord, bibrecord_config from invenio.testutils import make_test_suite, run_test_suite try: import pyRXP parser_pyrxp_available = True except ImportError: parser_pyrxp_available = False try: import Ft.Xml.Domlette parser_4suite_available = True except ImportError: parser_4suite_available = False try: import xml.dom.minidom import xml.parsers.expat parser_minidom_available = True except ImportError: parser_minidom_available = False class BibRecordSuccessTest(unittest.TestCase): """ bibrecord - demo file parsing test """ def setUp(self): """Initialize stuff""" f = open(CFG_TMPDIR + '/demobibdata.xml', 'r') xmltext = f.read() f.close() self.recs = [rec[0] for rec in bibrecord.create_records(xmltext)] def test_records_created(self): """ bibrecord - demo file how many records are created """ - self.assertEqual(102, len(self.recs)) + self.assertEqual(104, len(self.recs)) def test_tags_created(self): """ bibrecord - demo file which tags are created """ ## check if the tags are correct - tags = [u'003', u'005', '020', '035', '037', '041', '080', '088', - '100', '242', '245', '246', '250', '260', '269', '270', '300', - '340', '490', '500', '502', '520', '590', '595', '650', '653', - '690', '695', '700', '710', '720', '773', '856', '859', '901', - '909', '916', '960', '961', '962', '963', '970', '980', '999', - 'FFT'] + tags = ['003', '005', '020', '024', '035', '037', '041', '080', '088', + '100', '242', '245', '246', '250', '260', '269', '270', '300', + '340', '490', '500', '502', '520', '590', '595', '650', '653', + '690', '694', '695', '700', '710', '720', '773', '856', '859', + '901', '909', '916', '960', '961', '962', '963', '964', '970', + '980', '999', 'FFT'] t = [] for rec in self.recs: t.extend(rec.keys()) t.sort() #eliminate the elements repeated tt = [] for x in t: if not x in tt: tt.append(x) self.assertEqual(tags, tt) def test_fields_created(self): """bibrecord - demo file how many fields are created""" ## check if the number of fields for each record is correct fields = [14, 14, 8, 11, 11, 12, 11, 15, 10, 18, 14, 16, 10, 9, 15, 10, - 11, 11, 11, 9, 11, 11, 10, 9, 9, 9, 10, 9, 10, 10, 8, 9, 8, 9, 14, - 13, 14, 14, 15, 12, 12, 12, 15, 14, 12, 16, 16, 15, 15, 14, 16, 15, - 15, 15, 16, 15, 16, 15, 15, 16, 15, 14, 14, 15, 12, 13, 11, 15, 8, - 11, 14, 13, 12, 13, 6, 6, 25, 24, 27, 26, 26, 24, 26, 27, 25, 28, - 24, 23, 27, 25, 25, 26, 26, 24, 19, 26, 9, 8, 9, 9, 8, 7] + 11, 11, 11, 9, 11, 11, 10, 9, 9, 9, 10, 9, 10, 10, 8, 9, 8, + 9, 14, 13, 14, 14, 15, 12, 12, 12, 15, 14, 12, 16, 16, 15, + 15, 14, 16, 15, 15, 15, 16, 15, 16, 15, 15, 16, 15, 14, 14, + 15, 12, 13, 11, 15, 8, 11, 14, 13, 12, 13, 6, 6, 25, 24, 27, + 26, 26, 24, 26, 27, 25, 28, 24, 23, 27, 25, 25, 26, 26, 24, + 19, 26, 25, 22, 9, 8, 9, 9, 8, 7] cr = [] ret = [] for rec in self.recs: cr.append(len(rec.values())) ret.append(rec) self.assertEqual(fields, cr) def test_create_record_with_collection_tag(self): """ bibrecord - create_record() for single record in collection""" xmltext = """ 33 eng """ record = bibrecord.create_record(xmltext) record1 = bibrecord.create_records(xmltext)[0] self.assertEqual(record1, record) class BibRecordParsersTest(unittest.TestCase): """ bibrecord - testing the creation of records with different parsers""" def setUp(self): """Initialize stuff""" self.xmltext = """ 33 eng """ self.expected_record = { '001': [([], ' ', ' ', '33', 1)], '041': [([('a', 'eng')], ' ', ' ', '', 2)] } if parser_pyrxp_available: def 
test_pyRXP(self): """ bibrecord - create_record() with pyRXP """ record = bibrecord._create_record_rxp(self.xmltext) self.assertEqual(record, self.expected_record) if parser_4suite_available: def test_4suite(self): """ bibrecord - create_record() with 4suite """ record = bibrecord._create_record_4suite(self.xmltext) self.assertEqual(record, self.expected_record) if parser_minidom_available: def test_minidom(self): """ bibrecord - create_record() with minidom """ record = bibrecord._create_record_minidom(self.xmltext) self.assertEqual(record, self.expected_record) class BibRecordBadInputTreatmentTest(unittest.TestCase): """ bibrecord - testing for bad input treatment """ def test_empty_collection(self): """bibrecord - empty collection""" xml_error0 = """""" rec = bibrecord.create_record(xml_error0)[0] self.assertEqual(rec, {}) records = bibrecord.create_records(xml_error0) self.assertEqual(len(records), 0) def test_wrong_attribute(self): """bibrecord - bad input subfield \'cde\' instead of \'code\'""" ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS xml_error1 = """ 33 eng Doe, John On the foo and bar """ e = bibrecord.create_record(xml_error1, 1, 1)[2] ee ='' for i in e: if type(i).__name__ == 'str': if i.count(ws[3])>0: ee = i self.assertEqual(bibrecord._warning((3, '(field number: 4)')), ee) def test_missing_attribute(self): """ bibrecord - bad input missing \"tag\" """ ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS xml_error2 = """ 33 eng Doe, John On the foo and bar """ e = bibrecord.create_record(xml_error2, 1, 1)[2] ee = '' for i in e: if type(i).__name__ == 'str': if i.count(ws[1])>0: ee = i self.assertEqual(bibrecord._warning((1, '(field number(s): [2])')), ee) def test_empty_datafield(self): """ bibrecord - bad input no subfield """ ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS xml_error3 = """ 33 Doe, John On the foo and bar """ e = bibrecord.create_record(xml_error3, 1, 1)[2] ee = '' for i in e: if type(i).__name__ == 'str': if i.count(ws[8])>0: ee = i self.assertEqual(bibrecord._warning((8, '(field number: 2)')), ee) def test_missing_tag(self): """bibrecord - bad input missing end \"tag\" """ ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS xml_error4 = """ 33 eng Doe, John On the foo and bar """ e = bibrecord.create_record(xml_error4, 1, 1)[2] ee = '' for i in e: if type(i).__name__ == 'str': if i.count(ws[99])>0: ee = i self.assertEqual(bibrecord._warning((99, '(Tagname : datafield)')), ee) class BibRecordAccentedUnicodeLettersTest(unittest.TestCase): """ bibrecord - testing accented UTF-8 letters """ def setUp(self): """Initialize stuff""" self.xml_example_record = """ 33 eng Döè1, John Doe2, J>ohn editor Пушкин On the foo and bar2 """ self.rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] def test_accented_unicode_characters(self): """bibrecord - accented Unicode letters""" self.assertEqual(self.xml_example_record, bibrecord.record_xml_output(self.rec)) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", " ", " "), [([('a', 'Döè1, John')], " ", " ", "", 3), ([('a', 'Doe2, J>ohn'), ('b', 'editor')], " ", " ", "", 4)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "245", " ", "1"), [([('a', 'Пушкин')], " ", '1', "", 5)]) class BibRecordGettingFieldValuesTest(unittest.TestCase): """ bibrecord - testing for getting field/subfield values """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe1, John Doe2, John editor On the foo and bar1 On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def 
test_get_field_instances(self): """bibrecord - getting field instances""" self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", " ", " "), [([('a', 'Doe1, John')], " ", " ", "", 3), ([('a', 'Doe2, John'), ('b', 'editor')], " ", " ", "", 4)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "", " ", " "), [('245', [([('a', 'On the foo and bar1')], " ", '1', "", 5), ([('a', 'On the foo and bar2')], " ", '2', "", 6)]), ('001', [([], " ", " ", '33', 1)]), ('100', [([('a', 'Doe1, John')], " ", " ", "", 3), ([('a', 'Doe2, John'), ('b', 'editor')], " ", " ", "", 4)]), ('041', [([('a', 'eng')], " ", " ", "", 2)])]) def test_get_field_values(self): """bibrecord - getting field values""" self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "a"), ['Doe1, John', 'Doe2, John']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"), ['editor']) def test_get_field_value(self): """bibrecord - getting first field value""" self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "a"), 'Doe1, John') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "b"), 'editor') def test_get_subfield_values(self): """bibrecord - getting subfield values""" fi1, fi2 = bibrecord.record_get_field_instances(self.rec, "100", " ", " ") self.assertEqual(bibrecord.field_get_subfield_values(fi1, "b"), []) self.assertEqual(bibrecord.field_get_subfield_values(fi2, "b"), ["editor"]) class BibRecordGettingFieldValuesViaWildcardsTest(unittest.TestCase): """ bibrecord - testing for getting field/subfield values via wildcards """ def setUp(self): """Initialize stuff""" xml_example_record = """ 1 val1 val2 val3 val4a val4b val5 val6 val7a val7b """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_get_field_instances_via_wildcard(self): """bibrecord - getting field instances via wildcards""" self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", " ", " "), []) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", "%", " "), []) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", "%", "%"), [([('a', 'val1')], 'C', '5', "", 2)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "55%", "A", "%"), [([('a', 'val2')], 'A', 'B', "", 3), ([('a', 'val3')], 'A', " ", "", 4), ([('a', 'val6')], 'A', 'C', "", 7), ([('a', 'val7a'), ('b', 'val7b')], 'A', " ", "", 8)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "55%", "A", " "), [([('a', 'val3')], 'A', " ", "", 4), ([('a', 'val7a'), ('b', 'val7b')], 'A', " ", "", 8)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "556", "A", " "), [([('a', 'val7a'), ('b', 'val7b')], 'A', " ", "", 8)]) def test_get_field_values_via_wildcard(self): """bibrecord - getting field values via wildcards""" self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", " "), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", " ", " "), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", "%", " "), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", " "), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", "z"), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "%"), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "a"), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, 
"100", "%", " ", "a"), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", "a"), ['val1']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", "%"), ['val1']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", "%", "a"), ['val2', 'val3', 'val6', 'val7a']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", " ", "a"), ['val3', 'val7a']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "556", "A", " ", "a"), ['val7a']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "555", " ", " ", " "), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "555", " ", " ", "z"), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "555", " ", " ", "%"), ['val4a', 'val4b']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", " ", " ", "b"), ['val4b']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "%", "%", "b"), ['val4b', 'val7b']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", " ", "b"), ['val7b']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", "%", "b"), ['val7b']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", " ", "a"), ['val3', 'val7a']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", "%", "a"), ['val2', 'val3', 'val6', 'val7a']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "%", "%", "a"), ['val2', 'val3', 'val4a', 'val5', 'val6', 'val7a']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", " ", " ", "a"), ['val4a']) def test_get_field_value_via_wildcard(self): """bibrecord - getting first field value via wildcards""" self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", " "), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", " ", " "), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", "%", " "), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", "%", " "), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "%"), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "a"), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", " ", "a"), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", "%", "a"), 'val1') self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", "%", "%"), 'val1') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", "%", "a"), 'val2') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", " ", "a"), 'val3') self.assertEqual(bibrecord.record_get_field_value(self.rec, "556", "A", " ", "a"), 'val7a') self.assertEqual(bibrecord.record_get_field_value(self.rec, "555", " ", " ", " "), '') self.assertEqual(bibrecord.record_get_field_value(self.rec, "555", " ", " ", "%"), 'val4a') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", " ", " ", "b"), 'val4b') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "%", "%", "b"), 'val4b') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", " ", "b"), 'val7b') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", "%", "b"), 'val7b') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", " ", "a"), 'val3') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", "%", "a"), 'val2') 
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "%", "%", "a"), 'val2') self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", " ", " ", "a"), 'val4a') class BibRecordAddFieldTest(unittest.TestCase): """ bibrecord - testing adding field """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe1, John Doe2, John editor On the foo and bar1 On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_add_controlfield(self): """bibrecord - adding controlfield""" field_position_global_1 = bibrecord.record_add_field(self.rec, "003", controlfield_value="SzGeCERN") field_position_global_2 = bibrecord.record_add_field(self.rec, "004", controlfield_value="Test") self.assertEqual(field_position_global_1, 2) self.assertEqual(field_position_global_2, 3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "003", " ", " ", ""), ['SzGeCERN']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "004", " ", " ", ""), ['Test']) def test_add_datafield(self): """bibrecord - adding datafield""" field_position_global_1 = bibrecord.record_add_field(self.rec, "100", subfields=[('a', 'Doe3, John')]) field_position_global_2 = bibrecord.record_add_field(self.rec, "100", subfields= [('a', 'Doe4, John'), ('b', 'editor')]) self.assertEqual(field_position_global_1, 5) self.assertEqual(field_position_global_2, 6) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "a"), ['Doe1, John', 'Doe2, John', 'Doe3, John', 'Doe4, John']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"), ['editor', 'editor']) def test_add_controlfield_on_desired_position(self): """bibrecord - adding controlfield on desired position""" field_position_global_1 = bibrecord.record_add_field(self.rec, "005", controlfield_value="Foo", field_position_global=0) field_position_global_2 = bibrecord.record_add_field(self.rec, "006", controlfield_value="Bar", field_position_global=0) self.assertEqual(field_position_global_1, 7) self.assertEqual(field_position_global_2, 8) def test_add_datafield_on_desired_position_field_position_global(self): """bibrecord - adding datafield on desired global field position""" field_position_global_1 = bibrecord.record_add_field(self.rec, "100", subfields=[('a', 'Doe3, John')], field_position_global=0) field_position_global_2 = bibrecord.record_add_field(self.rec, "100", subfields=[('a', 'Doe4, John'), ('b', 'editor')], field_position_global=0) self.assertEqual(field_position_global_1, 3) self.assertEqual(field_position_global_2, 3) def test_add_datafield_on_desired_position_field_position_local(self): """bibrecord - adding datafield on desired local field position""" field_position_global_1 = bibrecord.record_add_field(self.rec, "100", subfields=[('a', 'Doe3, John')], field_position_local=0) field_position_global_2 = bibrecord.record_add_field(self.rec, "100", subfields=[('a', 'Doe4, John'), ('b', 'editor')], field_position_local=2) self.assertEqual(field_position_global_1, 3) self.assertEqual(field_position_global_2, 5) class BibRecordManageMultipleFieldsTest(unittest.TestCase): """ bibrecord - testing the management of multiple fields """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 subfield1 subfield2 subfield3 subfield4 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_delete_multiple_datafields(self): """bibrecord - deleting multiple datafields""" self.fields = bibrecord.record_delete_fields(self.rec, '245', 
[1, 2]) self.assertEqual(self.fields[0], ([('a', 'subfield2')], ' ', ' ', '', 3)) self.assertEqual(self.fields[1], ([('a', 'subfield3')], ' ', ' ', '', 4)) def test_add_multiple_datafields_default_index(self): """bibrecord - adding multiple fields with the default index""" fields = [([('a', 'subfield5')], ' ', ' ', '', 4), ([('a', 'subfield6')], ' ', ' ', '', 19)] index = bibrecord.record_add_fields(self.rec, '245', fields) self.assertEqual(index, None) self.assertEqual(self.rec['245'][-2], ([('a', 'subfield5')], ' ', ' ', '', 6)) self.assertEqual(self.rec['245'][-1], ([('a', 'subfield6')], ' ', ' ', '', 7)) def test_add_multiple_datafields_with_index(self): """bibrecord - adding multiple fields with an index""" fields = [([('a', 'subfield5')], ' ', ' ', '', 4), ([('a', 'subfield6')], ' ', ' ', '', 19)] index = bibrecord.record_add_fields(self.rec, '245', fields, field_position_local=0) self.assertEqual(index, 0) self.assertEqual(self.rec['245'][0], ([('a', 'subfield5')], ' ', ' ', '', 2)) self.assertEqual(self.rec['245'][1], ([('a', 'subfield6')], ' ', ' ', '', 3)) self.assertEqual(self.rec['245'][2], ([('a', 'subfield1')], ' ', ' ', '', 4)) def test_move_multiple_fields(self): """bibrecord - move multiple fields""" bibrecord.record_move_fields(self.rec, '245', [1, 3]) self.assertEqual(self.rec['245'][0], ([('a', 'subfield1')], ' ', ' ', '', 2)) self.assertEqual(self.rec['245'][1], ([('a', 'subfield3')], ' ', ' ', '', 4)) self.assertEqual(self.rec['245'][2], ([('a', 'subfield2')], ' ', ' ', '', 5)) self.assertEqual(self.rec['245'][3], ([('a', 'subfield4')], ' ', ' ', '', 6)) class BibRecordDeleteFieldTest(unittest.TestCase): """ bibrecord - testing field deletion """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe1, John Doe2, John editor On the foo and bar1 On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] xml_example_record_empty = """ """ self.rec_empty = bibrecord.create_record(xml_example_record_empty, 1, 1)[0] def test_delete_controlfield(self): """bibrecord - deleting controlfield""" bibrecord.record_delete_field(self.rec, "001", " ", " ") self.assertEqual(bibrecord.record_get_field_values(self.rec, "001", " ", " ", " "), []) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"), ['editor']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "a"), ['On the foo and bar2']) def test_delete_datafield(self): """bibrecord - deleting datafield""" bibrecord.record_delete_field(self.rec, "100", " ", " ") self.assertEqual(bibrecord.record_get_field_values(self.rec, "001", " ", " ", ""), ['33']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"), []) bibrecord.record_delete_field(self.rec, "245", " ", " ") self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "1", "a"), ['On the foo and bar1']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "a"), ['On the foo and bar2']) bibrecord.record_delete_field(self.rec, "245", " ", "2") self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "1", "a"), ['On the foo and bar1']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "a"), []) def test_add_delete_add_field_to_empty_record(self): """bibrecord - adding, deleting, and adding back a field to an empty record""" field_position_global_1 = bibrecord.record_add_field(self.rec_empty, "003", controlfield_value="SzGeCERN") 
self.assertEqual(field_position_global_1, 1) self.assertEqual(bibrecord.record_get_field_values(self.rec_empty, "003", " ", " ", ""), ['SzGeCERN']) bibrecord.record_delete_field(self.rec_empty, "003", " ", " ") self.assertEqual(bibrecord.record_get_field_values(self.rec_empty, "003", " ", " ", ""), []) field_position_global_1 = bibrecord.record_add_field(self.rec_empty, "003", controlfield_value="SzGeCERN2") self.assertEqual(field_position_global_1, 1) self.assertEqual(bibrecord.record_get_field_values(self.rec_empty, "003", " ", " ", ""), ['SzGeCERN2']) class BibRecordDeleteFieldFromTest(unittest.TestCase): """ bibrecord - testing field deletion from position""" def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe1, John Doe2, John editor On the foo and bar1 On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_delete_field_from(self): """bibrecord - deleting field from position""" bibrecord.record_delete_field(self.rec, "100", field_position_global=4) self.assertEqual(self.rec['100'], [([('a', 'Doe1, John')], ' ', ' ', '', 3)]) bibrecord.record_delete_field(self.rec, "100", field_position_global=3) self.failIf(self.rec.has_key('100')) bibrecord.record_delete_field(self.rec, "001", field_position_global=1) bibrecord.record_delete_field(self.rec, "245", field_position_global=6) self.failIf(self.rec.has_key('001')) self.assertEqual(self.rec['245'], [([('a', 'On the foo and bar1')], ' ', '1', '', 5)]) # Some crash tests bibrecord.record_delete_field(self.rec, '999', field_position_global=1) bibrecord.record_delete_field(self.rec, '245', field_position_global=999) class BibRecordAddSubfieldIntoTest(unittest.TestCase): """ bibrecord - testing subfield addition """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe2, John editor On the foo and bar1 On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_add_subfield_into(self): """bibrecord - adding subfield into position""" bibrecord.record_add_subfield_into(self.rec, "100", "b", "Samekniv", field_position_global=3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"), ['editor', 'Samekniv']) bibrecord.record_add_subfield_into(self.rec, "245", "x", "Elgokse", field_position_global=4) bibrecord.record_add_subfield_into(self.rec, "245", "x", "Fiskeflue", subfield_position=0, field_position_global=4) bibrecord.record_add_subfield_into(self.rec, "245", "z", "Ulriken", subfield_position=2, field_position_global=4) bibrecord.record_add_subfield_into(self.rec, "245", "z", "Stortinget", subfield_position=999, field_position_global=4) self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "1", "%"), ['Fiskeflue', 'On the foo and bar1', 'Ulriken', 'Elgokse', 'Stortinget']) # Some crash tests self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_add_subfield_into, self.rec, "187", "x", "Crash", field_position_global=1) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_add_subfield_into, self.rec, "245", "x", "Crash", field_position_global=999) class BibRecordModifyControlfieldTest(unittest.TestCase): """ bibrecord - testing controlfield modification """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 A Foo's Tale Skeech Skeech Whoop Whoop eng On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_modify_controlfield(self): """bibrecord - modify controlfield""" 
bibrecord.record_modify_controlfield(self.rec, "001", "34", field_position_global=1) bibrecord.record_modify_controlfield(self.rec, "008", "Foo Foo", field_position_global=3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "001"), ["34"]) self.assertEqual(bibrecord.record_get_field_values(self.rec, "005"), ["A Foo's Tale"]) self.assertEqual(bibrecord.record_get_field_values(self.rec, "008"), ["Foo Foo", "Whoop Whoop"]) # Some crash tests self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_modify_controlfield, self.rec, "187", "Crash", field_position_global=1) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_modify_controlfield, self.rec, "008", "Test", field_position_global=10) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_modify_controlfield, self.rec, "245", "Burn", field_position_global=5) self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "%"), ["On the foo and bar2"]) class BibRecordModifySubfieldTest(unittest.TestCase): """ bibrecord - testing subfield modification """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe2, John editor On the foo and bar1 On writing unit tests On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_modify_subfield(self): """bibrecord - modify subfield""" bibrecord.record_modify_subfield(self.rec, "245", "a", "Holmenkollen", 0, field_position_global=4) bibrecord.record_modify_subfield(self.rec, "245", "x", "Brann", 1, field_position_global=4) self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "1", "%"), ['Holmenkollen', 'Brann']) # Some crash tests self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_modify_subfield, self.rec, "187", "x", "Crash", 0, field_position_global=1) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_modify_subfield, self.rec, "245", "x", "Burn", 1, field_position_global=999) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_modify_subfield, self.rec, "245", "a", "Burn", 999, field_position_global=4) class BibRecordDeleteSubfieldFromTest(unittest.TestCase): """ bibrecord - testing subfield deletion """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe2, John editor Skal vi danse? 
On the foo and bar1 On the foo and bar2 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_delete_subfield_from(self): """bibrecord - delete subfield from position""" bibrecord.record_delete_subfield_from(self.rec, "100", 2, field_position_global=3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "z"), []) bibrecord.record_delete_subfield_from(self.rec, "100", 0, field_position_global=3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "%"), ['editor']) bibrecord.record_delete_subfield_from(self.rec, "100", 0, field_position_global=3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "%"), []) # Some crash tests self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_delete_subfield_from, self.rec, "187", 0, field_position_global=1) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_delete_subfield_from, self.rec, "245", 0, field_position_global=999) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_delete_subfield_from, self.rec, "245", 999, field_position_global=4) class BibRecordDeleteSubfieldTest(unittest.TestCase): """ bibrecord - testing subfield deletion """ def setUp(self): """Initialize stuff""" self.xml_example_record = """ 33 eng Doe2, John editor Skal vi danse? Doe3, Zbigniew Doe4, Joachim On the foo and bar1 On the foo and bar2 On the foo and bar1 On the foo and bar2 """ def test_simple_removals(self): """ bibrecord - delete subfield by its code""" # testing a simple removals where all the fields are removed rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] bibrecord.record_delete_subfield(rec, "041", "b") # nothing should change self.assertEqual(rec["041"][0][0], [("a", "eng")]) bibrecord.record_delete_subfield(rec, "041", "a") self.assertEqual(rec["041"][0][0], []) def test_indices_important(self): """ bibrecord - delete subfield where indices are important""" rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] bibrecord.record_delete_subfield(rec, "245", "a", " ", "1") self.assertEqual(rec["245"][0][0], []) self.assertEqual(rec["245"][1][0], [("a", "On the foo and bar2")]) bibrecord.record_delete_subfield(rec, "245", "a", " ", "2") self.assertEqual(rec["245"][1][0], []) def test_remove_some(self): """ bibrecord - delete subfield when some should be preserved and some removed""" rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] bibrecord.record_delete_subfield(rec, "100", "a", " ", " ") self.assertEqual(rec["100"][0][0], [("b", "editor"), ("z", "Skal vi danse?"), ("d", "Doe4, Joachim")]) def test_more_fields(self): """ bibrecord - delete subfield where more fits criteria""" rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] bibrecord.record_delete_subfield(rec, "246", "c", "1", "2") self.assertEqual(rec["246"][1][0], []) self.assertEqual(rec["246"][0][0], []) def test_nonexisting_removals(self): """ bibrecord - delete subfield that does not exist """ rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] # further preparation bibrecord.record_delete_subfield(rec, "100", "a", " ", " ") self.assertEqual(rec["100"][0][0], [("b", "editor"), ("z", "Skal vi danse?"), ("d", "Doe4, Joachim")]) #the real tests begin # 1) removing the subfield from an empty list of subfields bibrecord.record_delete_subfield(rec, "246", "c", "1", "2") self.assertEqual(rec["246"][1][0], []) self.assertEqual(rec["246"][0][0], []) bibrecord.record_delete_subfield(rec, 
"246", "8", "1", "2") self.assertEqual(rec["246"][1][0], []) self.assertEqual(rec["246"][0][0], []) # 2) removing a subfield from a field that has some subfields but none has an appropriate code bibrecord.record_delete_subfield(rec, "100", "a", " ", " ") self.assertEqual(rec["100"][0][0], [("b", "editor"), ("z", "Skal vi danse?"), ("d", "Doe4, Joachim")]) bibrecord.record_delete_subfield(rec, "100", "e", " ", " ") self.assertEqual(rec["100"][0][0], [("b", "editor"), ("z", "Skal vi danse?"), ("d", "Doe4, Joachim")]) class BibRecordMoveSubfieldTest(unittest.TestCase): """ bibrecord - testing subfield moving """ def setUp(self): """Initialize stuff""" xml_example_record = """ 33 eng Doe2, John editor fisk eple hammer On the foo and bar1 """ self.rec = bibrecord.create_record(xml_example_record, 1, 1)[0] def test_move_subfield(self): """bibrecord - move subfields""" bibrecord.record_move_subfield(self.rec, "100", 2, 4, field_position_global=3) bibrecord.record_move_subfield(self.rec, "100", 1, 0, field_position_global=3) bibrecord.record_move_subfield(self.rec, "100", 2, 999, field_position_global=3) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "%"), ['editor', 'Doe2, John', 'hammer', 'fisk', 'eple']) # Some crash tests self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_move_subfield, self.rec, "187", 0, 1, field_position_global=3) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_move_subfield, self.rec, "100", 1, 0, field_position_global=999) self.assertRaises(bibrecord.InvenioBibRecordFieldError, bibrecord.record_move_subfield, self.rec, "100", 999, 0, field_position_global=3) class BibRecordSpecialTagParsingTest(unittest.TestCase): """ bibrecord - parsing special tags (FMT, FFT)""" def setUp(self): """setting up example records""" self.xml_example_record_with_fmt = """ 33 eng HB Let us see if this gets inserted well. """ self.xml_example_record_with_fft = """ 33 eng file:///foo.pdf http://bar.com/baz.ps.gz """ self.xml_example_record_with_xyz = """ 33 eng HB Let us see if this gets inserted well. 
""" def test_parsing_file_containing_fmt_special_tag_with_correcting(self): """bibrecord - parsing special FMT tag, correcting on""" rec = bibrecord.create_record(self.xml_example_record_with_fmt, 1, 1)[0] self.assertEqual(rec, {u'001': [([], " ", " ", '33', 1)], 'FMT': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)], '041': [([('a', 'eng')], " ", " ", "", 2)]}) self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"), ['eng']) self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "f"), ['HB']) self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "g"), ['Let us see if this gets inserted well.']) def test_parsing_file_containing_fmt_special_tag_without_correcting(self): """bibrecord - parsing special FMT tag, correcting off""" rec = bibrecord.create_record(self.xml_example_record_with_fmt, 1, 0)[0] self.assertEqual(rec, {u'001': [([], " ", " ", '33', 1)], 'FMT': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)], '041': [([('a', 'eng')], " ", " ", "", 2)]}) self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"), ['eng']) self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "f"), ['HB']) self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "g"), ['Let us see if this gets inserted well.']) def test_parsing_file_containing_fft_special_tag_with_correcting(self): """bibrecord - parsing special FFT tag, correcting on""" rec = bibrecord.create_record(self.xml_example_record_with_fft, 1, 1)[0] self.assertEqual(rec, {u'001': [([], " ", " ", '33', 1)], 'FFT': [([('a', 'file:///foo.pdf'), ('a', 'http://bar.com/baz.ps.gz')], " ", " ", "", 3)], '041': [([('a', 'eng')], " ", " ", "", 2)]}) self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"), ['eng']) self.assertEqual(bibrecord.record_get_field_values(rec, "FFT", " ", " ", "a"), ['file:///foo.pdf', 'http://bar.com/baz.ps.gz']) def test_parsing_file_containing_fft_special_tag_without_correcting(self): """bibrecord - parsing special FFT tag, correcting off""" rec = bibrecord.create_record(self.xml_example_record_with_fft, 1, 0)[0] self.assertEqual(rec, {u'001': [([], " ", " ", '33', 1)], 'FFT': [([('a', 'file:///foo.pdf'), ('a', 'http://bar.com/baz.ps.gz')], " ", " ", "", 3)], '041': [([('a', 'eng')], " ", " ", "", 2)]}) self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"), ['eng']) self.assertEqual(bibrecord.record_get_field_values(rec, "FFT", " ", " ", "a"), ['file:///foo.pdf', 'http://bar.com/baz.ps.gz']) def test_parsing_file_containing_xyz_special_tag_with_correcting(self): """bibrecord - parsing unrecognized special XYZ tag, correcting on""" # XYZ should not get accepted when correcting is on; should get changed to 000 rec = bibrecord.create_record(self.xml_example_record_with_xyz, 1, 1)[0] self.assertEqual(rec, {u'001': [([], " ", " ", '33', 1)], '000': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)], '041': [([('a', 'eng')], " ", " ", "", 2)]}) self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"), ['eng']) self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "f"), []) self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "g"), []) self.assertEqual(bibrecord.record_get_field_values(rec, "000", " ", " ", "f"), ['HB']) self.assertEqual(bibrecord.record_get_field_values(rec, "000", " ", " ", "g"), ['Let us see if 
this gets inserted well.']) def test_parsing_file_containing_xyz_special_tag_without_correcting(self): """bibrecord - parsing unrecognized special XYZ tag, correcting off""" # XYZ should get accepted without correcting rec = bibrecord.create_record(self.xml_example_record_with_xyz, 1, 0)[0] self.assertEqual(rec, {u'001': [([], " ", " ", '33', 1)], 'XYZ': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)], '041': [([('a', 'eng')], " ", " ", "", 2)]}) self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"), ['eng']) self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "f"), ['HB']) self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "g"), ['Let us see if this gets inserted well.']) class BibRecordPrintingTest(unittest.TestCase): """ bibrecord - testing for printing record """ def setUp(self): """Initialize stuff""" self.xml_example_record = """ 81 TEST-ARTICLE-2006-001 ARTICLE-2006-001 Test ti """ self.xml_example_record_short = """ 81 TEST-ARTICLE-2006-001 ARTICLE-2006-001 """ self.xml_example_multi_records = """ 81 TEST-ARTICLE-2006-001 ARTICLE-2006-001 Test ti 82 Author, t """ self.xml_example_multi_records_short = """ 81 TEST-ARTICLE-2006-001 ARTICLE-2006-001 82 """ def test_record_xml_output(self): """bibrecord - xml output""" rec = bibrecord.create_record(self.xml_example_record, 1, 1)[0] rec_short = bibrecord.create_record(self.xml_example_record_short, 1, 1)[0] self.assertEqual(bibrecord.create_record(bibrecord.record_xml_output(rec, tags=[]), 1, 1)[0], rec) self.assertEqual(bibrecord.create_record(bibrecord.record_xml_output(rec, tags=["001", "037"]), 1, 1)[0], rec_short) self.assertEqual(bibrecord.create_record(bibrecord.record_xml_output(rec, tags=["037"]), 1, 1)[0], rec_short) class BibRecordCreateFieldTest(unittest.TestCase): """ bibrecord - testing for creating field """ def test_create_valid_field(self): """bibrecord - create and check a valid field""" bibrecord.create_field() bibrecord.create_field([('a', 'testa'), ('b', 'testb')], '2', 'n', 'controlfield', 15) def test_invalid_field_raises_exception(self): """bibrecord - exception raised when creating an invalid field""" # Invalid subfields. self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, 'subfields', '1', '2', 'controlfield', 10) self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, ('1', 'value'), '1', '2', 'controlfield', 10) self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, [('value')], '1', '2', 'controlfield', 10) self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, [('1', 'value', '2')], '1', '2', 'controlfield', 10) # Invalid indicators. 
self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, [], 1, '2', 'controlfield', 10) self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, [], '1', 2, 'controlfield', 10) # Invalid controlfield value self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, [], '1', '2', 13, 10) # Invalid global position self.assertRaises(bibrecord_config.InvenioBibRecordFieldError, bibrecord.create_field, [], '1', '2', 'controlfield', 'position') def test_compare_fields(self): """bibrecord - compare fields""" # Identical field0 = ([('a', 'test')], '1', '2', '', 0) field1 = ([('a', 'test')], '1', '2', '', 3) self.assertEqual(True, bibrecord._compare_fields(field0, field1, strict=True)) self.assertEqual(True, bibrecord._compare_fields(field0, field1, strict=False)) # Order of the subfields changed. field0 = ([('a', 'testa'), ('b', 'testb')], '1', '2', '', 0) field1 = ([('b', 'testb'), ('a', 'testa')], '1', '2', '', 3) self.assertEqual(False, bibrecord._compare_fields(field0, field1, strict=True)) self.assertEqual(True, bibrecord._compare_fields(field0, field1, strict=False)) # Different field0 = ([], '3', '2', '', 0) field1 = ([], '1', '2', '', 3) self.assertEqual(False, bibrecord._compare_fields(field0, field1, strict=True)) self.assertEqual(False, bibrecord._compare_fields(field0, field1, strict=False)) class BibRecordFindFieldTest(unittest.TestCase): """ bibrecord - testing for finding field """ def setUp(self): """Initialize stuff""" xml = """ 81 TEST-ARTICLE-2006-001 ARTICLE-2007-001 """ self.rec = bibrecord.create_record(xml)[0] self.field0 = self.rec['001'][0] self.field1 = self.rec['037'][0] self.field2 = ( [self.field1[0][1], self.field1[0][0]], self.field1[1], self.field1[2], self.field1[3], self.field1[4], ) def test_finding_field_strict(self): """bibrecord - test finding field strict""" self.assertEqual((1, 0), bibrecord.record_find_field(self.rec, '001', self.field0, strict=True)) self.assertEqual((2, 0), bibrecord.record_find_field(self.rec, '037', self.field1, strict=True)) self.assertEqual((None, None), bibrecord.record_find_field(self.rec, '037', self.field2, strict=True)) def test_finding_field_loose(self): """bibrecord - test finding field loose""" self.assertEqual((1, 0), bibrecord.record_find_field(self.rec, '001', self.field0, strict=False)) self.assertEqual((2, 0), bibrecord.record_find_field(self.rec, '037', self.field1, strict=False)) self.assertEqual((2, 0), bibrecord.record_find_field(self.rec, '037', self.field2, strict=False)) class BibRecordSingletonTest(unittest.TestCase): """ bibrecord - testing singleton removal """ def setUp(self): """Initialize stuff""" self.xml = """ 33 Some value """ self.rec_expected = { '001': [([], ' ', ' ', '33', 1)], '100': [([('a', 'Some value')], ' ', ' ', '', 2)], } if parser_minidom_available: def test_singleton_removal_minidom(self): """bibrecord - singleton removal with minidom""" rec = bibrecord.create_records(self.xml, verbose=1, correct=1, parser='minidom')[0][0] self.assertEqual(rec, self.rec_expected) if parser_4suite_available: def test_singleton_removal_4suite(self): """bibrecord - singleton removal with 4suite""" rec = bibrecord.create_records(self.xml, verbose=1, correct=1, parser='4suite')[0][0] self.assertEqual(rec, self.rec_expected) if parser_pyrxp_available: def test_singleton_removal_pyrxp(self): """bibrecord - singleton removal with pyrxp""" rec = bibrecord.create_records(self.xml, verbose=1, correct=1, parser='pyrxp')[0][0] 
self.assertEqual(rec, self.rec_expected) class BibRecordNumCharRefTest(unittest.TestCase): """ bibrecord - testing numerical character reference expansion""" def setUp(self): """Initialize stuff""" self.xml = """ 33 Σ & Σ use &amp; in XML """ self.rec_expected = { '001': [([], ' ', ' ', '33', 1)], '123': [([('a', '\xce\xa3 & \xce\xa3'), ('a', 'use & in XML'),], ' ', ' ', '', 2)], } if parser_minidom_available: def test_numcharref_expansion_minidom(self): """bibrecord - numcharref expansion with minidom""" rec = bibrecord.create_records(self.xml, verbose=1, correct=1, parser='minidom')[0][0] self.assertEqual(rec, self.rec_expected) if parser_4suite_available: def test_numcharref_expansion_4suite(self): """bibrecord - numcharref expansion with 4suite""" rec = bibrecord.create_records(self.xml, verbose=1, correct=1, parser='4suite')[0][0] self.assertEqual(rec, self.rec_expected) if parser_pyrxp_available: def test_numcharref_expansion_pyrxp(self): """bibrecord - but *no* numcharref expansion with pyrxp (see notes) FIXME: pyRXP does not seem to like num char ref entities, so this test is mostly left here in a TDD style in order to remind us of this fact. If we want to fix this situation, then we should probably use pyRXPU that uses Unicode strings internally, hence it is num char ref friendly. Maybe we should use pyRXPU by default, if performance is acceptable, or maybe we should introduce a flag to govern this behaviour. """ rec = bibrecord.create_records(self.xml, verbose=1, correct=1, parser='pyrxp')[0][0] #self.assertEqual(rec, self.rec_expected) self.assertEqual(rec, None) TEST_SUITE = make_test_suite( BibRecordSuccessTest, BibRecordParsersTest, BibRecordBadInputTreatmentTest, BibRecordGettingFieldValuesTest, BibRecordGettingFieldValuesViaWildcardsTest, BibRecordAddFieldTest, BibRecordDeleteFieldTest, BibRecordManageMultipleFieldsTest, BibRecordDeleteFieldFromTest, BibRecordAddSubfieldIntoTest, BibRecordModifyControlfieldTest, BibRecordModifySubfieldTest, BibRecordDeleteSubfieldFromTest, BibRecordMoveSubfieldTest, BibRecordAccentedUnicodeLettersTest, BibRecordSpecialTagParsingTest, BibRecordPrintingTest, BibRecordCreateFieldTest, BibRecordFindFieldTest, BibRecordDeleteSubfieldTest, BibRecordSingletonTest, BibRecordNumCharRefTest ) if __name__ == '__main__': run_test_suite(TEST_SUITE) diff --git a/modules/bibformat/etc/format_templates/Picture_HTML_detailed.bft b/modules/bibformat/etc/format_templates/Picture_HTML_detailed.bft index 8490a45e1..1e852b4ce 100644 --- a/modules/bibformat/etc/format_templates/Picture_HTML_detailed.bft +++ b/modules/bibformat/etc/format_templates/Picture_HTML_detailed.bft @@ -1,62 +1,62 @@ Picture HTML detailed The detailed HTML format suitable for displaying pictures.
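A note on the data model that the assertions in the tests above keep spelling out: bibrecord holds a record as a plain dict from MARC tag to a list of field tuples of the form (subfields, ind1, ind2, controlfield value, global field position). A minimal reading sketch follows; demo_get_values is a made-up helper for illustration only, not part of the bibrecord API:

    # Record structure asserted throughout the tests:
    # {tag: [(subfields, ind1, ind2, controlfield_value, global_position), ...]}
    record = {
        '001': [([], ' ', ' ', '33', 1)],
        '041': [([('a', 'eng')], ' ', ' ', '', 2)],
    }

    def demo_get_values(rec, tag, code):
        """Collect subfield values for (tag, code), ignoring indicators."""
        out = []
        for subfields, ind1, ind2, ctrl, pos in rec.get(tag, []):
            for sfcode, value in subfields:
                if sfcode == code:
                    out.append(value)
        return out

    assert demo_get_values(record, '041', 'a') == ['eng']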


- +
© CERN Geneva: The use of photos requires prior authorization (from CERN copyright). The words CERN Photo must be quoted for each use.
-
\ No newline at end of file + diff --git a/modules/bibformat/lib/bibformat_utils.py b/modules/bibformat/lib/bibformat_utils.py index 54444b827..be990b550 100644 --- a/modules/bibformat/lib/bibformat_utils.py +++ b/modules/bibformat/lib/bibformat_utils.py @@ -1,701 +1,692 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Utilities for special formatting of records. API functions: highlight, get_contextual_content, encode_for_xml Used mainly by BibFormat elements. Depends on search_engine.py for record_exists() FIXME: currently copies record_exists() code from search engine. Refactor later. """ __revision__ = "$Id$" import re import zlib import os from invenio.config import \ CFG_OAI_ID_FIELD, \ CFG_WEBSEARCH_FULLTEXT_SNIPPETS, \ CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS, \ CFG_PATH_PDFTOTEXT from invenio.dbquery import run_sql from invenio.urlutils import string_to_numeric_char_reference from invenio.textutils import encode_for_xml from invenio.shellutils import run_shell_command def highlight(text, keywords=None, prefix_tag='', suffix_tag=""): """ Returns text with all words highlighted with given tags (this function places 'prefix_tag' and 'suffix_tag' before and after words from 'keywords' in 'text'). for example set prefix_tag='' and suffix_tag="" @param text: the text to modify @param keywords: a list of string @return: highlighted text """ if not keywords: return text #FIXME decide if non english accentuated char should be desaccentuaded def replace_highlight(match): """ replace match.group() by prefix_tag + match.group() + suffix_tag""" return prefix_tag + match.group() + suffix_tag #Build a pattern of the kind keyword1 | keyword2 | keyword3 pattern = '|'.join(keywords) compiled_pattern = re.compile(pattern, re.IGNORECASE) #Replace and return keywords with prefix+keyword+suffix return compiled_pattern.sub(replace_highlight, text) def get_contextual_content(text, keywords, max_lines=2): """ Returns some lines from a text contextually to the keywords in 'keywords_string' @param text: the text from which we want to get contextual content @param keywords: a list of keyword strings ("the context") @param max_lines: the maximum number of line to return from the record @return: a string """ def grade_line(text_line, keywords): """ Grades a line according to keywords. grade = number of keywords in the line """ grade = 0 for keyword in keywords: grade += text_line.upper().count(keyword.upper()) return grade #Grade each line according to the keywords lines = text.split('.') #print 'lines: ',lines weights = [grade_line(line, keywords) for line in lines] #print 'line weights: ', weights def grade_region(lines_weight): """ Grades a region. A region is a set of consecutive lines. 
grade = sum of weights of the line composing the region """ grade = 0 for weight in lines_weight: grade += weight return grade if max_lines > 1: region_weights = [] for index_weight in range(len(weights)- max_lines + 1): region_weights.append(grade_region(weights[index_weight:(index_weight+max_lines)])) weights = region_weights #print 'region weights: ',weights #Returns line with maximal weight, and (max_lines - 1) following lines. index_with_highest_weight = 0 highest_weight = 0 i = 0 for weight in weights: if weight > highest_weight: index_with_highest_weight = i highest_weight = weight i += 1 #print 'highest weight', highest_weight if index_with_highest_weight+max_lines > len(lines): return lines[index_with_highest_weight:] else: return lines[index_with_highest_weight:index_with_highest_weight+max_lines] def record_get_xml(recID, format='xm', decompress=zlib.decompress, on_the_fly=False): """ Returns an XML string of the record given by recID. The function builds the XML directly from the database, without using the standard formatting process. 'format' allows to define the flavour of XML: - 'xm' for standard XML - 'marcxml' for MARC XML - 'oai_dc' for OAI Dublin Core - 'xd' for XML Dublin Core If record does not exist, returns empty string. @param recID: the id of the record to retrieve @param on_the_fly: if False, try to fetch precreated one in database @return: the xml string of the record """ from invenio.search_engine import record_exists def get_fieldvalues(recID, tag): """Return list of field values for field TAG inside record RECID.""" out = [] if tag == "001___": # we have asked for recID that is not stored in bibXXx tables out.append(str(recID)) else: # we are going to look inside bibXXx tables digit = tag[0:2] bx = "bib%sx" % digit bibx = "bibrec_bib%sx" % digit query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag LIKE '%s'" \ "ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx, recID, tag) res = run_sql(query) for row in res: out.append(row[0]) return out def get_creation_date(recID, fmt="%Y-%m-%d"): "Returns the creation date of the record 'recID'." out = "" res = run_sql("SELECT DATE_FORMAT(creation_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1) if res: out = res[0][0] return out def get_modification_date(recID, fmt="%Y-%m-%d"): "Returns the date of last modification for the record 'recID'." out = "" res = run_sql("SELECT DATE_FORMAT(modification_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1) if res: out = res[0][0] return out #_ = gettext_set_language(ln) out = "" # sanity check: record_exist_p = record_exists(recID) if record_exist_p == 0: # doesn't exist return out # print record opening tags, if needed: if format == "marcxml" or format == "oai_dc": out += " \n" out += "
\n" for identifier in get_fieldvalues(recID, CFG_OAI_ID_FIELD): out += " %s\n" % identifier out += " %s\n" % get_modification_date(recID) out += "
\n" out += " \n" if format.startswith("xm") or format == "marcxml": res = None if on_the_fly == False: # look for cached format existence: query = """SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'""" % (recID, format) res = run_sql(query, None, 1) if res and record_exist_p == 1: # record 'recID' is formatted in 'format', so print it out += "%s" % decompress(res[0][0]) else: # record 'recID' is not formatted in 'format' -- they are # not in "bibfmt" table; so fetch all the data from # "bibXXx" tables: if format == "marcxml": out += """ \n""" out += " %d\n" % int(recID) elif format.startswith("xm"): out += """ \n""" out += " %d\n" % int(recID) if record_exist_p == -1: # deleted record, so display only OAI ID and 980: oai_ids = get_fieldvalues(recID, CFG_OAI_ID_FIELD) if oai_ids: out += "%s\n" % \ (CFG_OAI_ID_FIELD[0:3], CFG_OAI_ID_FIELD[3:4], CFG_OAI_ID_FIELD[4:5], CFG_OAI_ID_FIELD[5:6], oai_ids[0]) out += "DELETED\n" else: # controlfields query = "SELECT b.tag,b.value,bb.field_number FROM bib00x AS b, bibrec_bib00x AS bb "\ "WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '00%%' "\ "ORDER BY bb.field_number, b.tag ASC" % recID res = run_sql(query) for row in res: field, value = row[0], row[1] value = encode_for_xml(value) out += """ %s\n""" % \ (encode_for_xml(field[0:3]), value) # datafields i = 1 # Do not process bib00x and bibrec_bib00x, as # they are controlfields. So start at bib01x and # bibrec_bib00x (and set i = 0 at the end of # first loop) for digit1 in range(0, 10): for digit2 in range(i, 10): bx = "bib%d%dx" % (digit1, digit2) bibx = "bibrec_bib%d%dx" % (digit1, digit2) query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\ "WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '%s%%' "\ "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx, recID, str(digit1)+str(digit2)) res = run_sql(query) field_number_old = -999 field_old = "" for row in res: field, value, field_number = row[0], row[1], row[2] ind1, ind2 = field[3], field[4] if ind1 == "_" or ind1 == "": ind1 = " " if ind2 == "_" or ind2 == "": ind2 = " " # print field tag if field_number != field_number_old or \ field[:-1] != field_old[:-1]: if field_number_old != -999: out += """ \n""" out += """ \n""" % \ (encode_for_xml(field[0:3]), encode_for_xml(ind1), encode_for_xml(ind2)) field_number_old = field_number field_old = field # print subfield value value = encode_for_xml(value) out += """ %s\n""" % \ (encode_for_xml(field[-1:]), value) # all fields/subfields printed in this run, so close the tag: if field_number_old != -999: out += """ \n""" i = 0 # Next loop should start looking at bib%0 and bibrec_bib00x # we are at the end of printing the record: out += " \n" elif format == "xd" or format == "oai_dc": # XML Dublin Core format, possibly OAI -- select only some bibXXx fields: out += """ \n""" if record_exist_p == -1: out += "" else: for f in get_fieldvalues(recID, "041__a"): out += " %s\n" % f for f in get_fieldvalues(recID, "100__a"): out += " %s\n" % encode_for_xml(f) for f in get_fieldvalues(recID, "700__a"): out += " %s\n" % encode_for_xml(f) for f in get_fieldvalues(recID, "245__a"): out += " %s\n" % encode_for_xml(f) for f in get_fieldvalues(recID, "65017a"): out += " %s\n" % encode_for_xml(f) for f in get_fieldvalues(recID, "8564_u"): out += " %s\n" % encode_for_xml(f) for f in get_fieldvalues(recID, "520__a"): out += " %s\n" % encode_for_xml(f) out += " %s\n" % get_creation_date(recID) out += " \n" # print record closing tags, if needed: if format == "marcxml" or 
format == "oai_dc": out += " \n" out += "
\n" return out def parse_tag(tag): """ Parse a marc code and decompose it in a table with: 0-tag 1-indicator1 2-indicator2 3-subfield The first 3 chars always correspond to tag. The indicators are optional. However they must both be indicated, or both ommitted. If indicators are ommitted or indicated with underscore '_', they mean "No indicator". "No indicator" is also equivalent indicator marked as whitespace. The subfield is optional. It can optionally be preceded by a dot '.' or '$$' or '$' Any of the chars can be replaced by wildcard % THE FUNCTION DOES NOT CHECK WELLFORMNESS OF 'tag' Any empty chars is not considered For example: >> parse_tag('245COc') = ['245', 'C', 'O', 'c'] >> parse_tag('245C_c') = ['245', 'C', '', 'c'] >> parse_tag('245__c') = ['245', '', '', 'c'] >> parse_tag('245__$$c') = ['245', '', '', 'c'] >> parse_tag('245__$c') = ['245', '', '', 'c'] >> parse_tag('245 $c') = ['245', '', '', 'c'] >> parse_tag('245 $$c') = ['245', '', '', 'c'] >> parse_tag('245__.c') = ['245', '', '', 'c'] >> parse_tag('245 .c') = ['245', '', '', 'c'] >> parse_tag('245C_$c') = ['245', 'C', '', 'c'] >> parse_tag('245CO$$c') = ['245', 'C', 'O', 'c'] >> parse_tag('245C_.c') = ['245', 'C', '', 'c'] >> parse_tag('245$c') = ['245', '', '', 'c'] >> parse_tag('245.c') = ['245', '', '', 'c'] >> parse_tag('245$$c') = ['245', '', '', 'c'] >> parse_tag('245__%') = ['245', '', '', ''] >> parse_tag('245__$$%') = ['245', '', '', ''] >> parse_tag('245__$%') = ['245', '', '', ''] >> parse_tag('245 $%') = ['245', '', '', ''] >> parse_tag('245 $$%') = ['245', '', '', ''] >> parse_tag('245$%') = ['245', '', '', ''] >> parse_tag('245.%') = ['245', '', '', ''] >> parse_tag('245$$%') = ['245', '', '', ''] >> parse_tag('2%5$$a') = ['2%5', '', '', 'a'] """ p_tag = ['', '', '', ''] # tag, ind1, ind2, code tag = tag.replace(" ", "") # Remove empty characters tag = tag.replace("$", "") # Remove $ characters tag = tag.replace(".", "") # Remove . characters #tag = tag.replace("_", "") # Remove _ characters p_tag[0] = tag[0:3] # tag if len(tag) == 4: p_tag[3] = tag[3] # subfield elif len(tag) == 5: ind1 = tag[3] # indicator 1 if ind1 != "_": p_tag[1] = ind1 ind2 = tag[4] # indicator 2 if ind2 != "_": p_tag[2] = ind2 elif len(tag) == 6: p_tag[3] = tag[5] # subfield ind1 = tag[3] # indicator 1 if ind1 != "_": p_tag[1] = ind1 ind2 = tag[4] # indicator 2 if ind2 != "_": p_tag[2] = ind2 return p_tag def get_all_fieldvalues(recID, tags_in): """ Returns list of values that belong to fields in tags_in for record with given recID. Note that when a partial 'tags_in' is specified (eg. '100__'), the subfields of all corresponding datafields are returned all 'mixed' together. Eg. 
with: 123 100__ $a Ellis, J $u CERN 123 100__ $a Smith, K >> get_all_fieldvalues(123, '100__') ['Ellis, J', 'CERN', 'Smith, K'] """ out = [] if type(tags_in) is not list: tags_in = [tags_in, ] dict_of_tags_out = {} if not tags_in: for i in range(0, 10): for j in range(0, 10): dict_of_tags_out["%d%d%%" % (i, j)] = '%' else: for tag in tags_in: if len(tag) == 0: for i in range(0, 10): for j in range(0, 10): dict_of_tags_out["%d%d%%" % (i, j)] = '%' elif len(tag) == 1: for j in range(0, 10): dict_of_tags_out["%s%d%%" % (tag, j)] = '%' elif len(tag) <= 5: dict_of_tags_out["%s%%" % tag] = '%' else: dict_of_tags_out[tag[0:5]] = tag[5:6] tags_out = dict_of_tags_out.keys() tags_out.sort() # search all bibXXx tables as needed: for tag in tags_out: digits = tag[0:2] try: intdigits = int(digits) if intdigits < 0 or intdigits > 99: raise ValueError except ValueError: # invalid tag value asked for continue bx = "bib%sx" % digits bibx = "bibrec_bib%sx" % digits query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\ "WHERE bb.id_bibrec=%%s AND b.id=bb.id_bibxxx AND b.tag LIKE %%s"\ "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx) res = run_sql(query, (recID, str(tag)+dict_of_tags_out[tag])) # go through fields: for row in res: field, value, field_number = row[0], row[1], row[2] out.append(value) return out re_bold_latex = re.compile('\$?\\\\textbf\{(?P.*?)\}\$?') re_emph_latex = re.compile('\$?\\\\emph\{(?P.*?)\}\$?') re_generic_start_latex = re.compile('\$?\\\\begin\{(?P.*?)\}\$?') re_generic_end_latex = re.compile('\$?\\\\end\{(?P.*?)\}\$?') re_verbatim_env_latex = re.compile('\\\\begin\{verbatim.*?\}(?P.*?)\\\\end\{verbatim.*?\}') def latex_to_html(text): """ Do some basic interpretation of LaTeX input. Gives some nice results when used in combination with JSMath. """ # Process verbatim environment first def make_verbatim(match_obj): """Replace all possible special chars by HTML character entities, so that they are not interpreted by further commands""" return '
<pre>' + \
                string_to_numeric_char_reference(match_obj.group('content')) + \
                '</pre>
' text = re_verbatim_env_latex.sub(make_verbatim, text) # Remove trailing "line breaks" text = text.strip('\\\\') # Process special characters text = text.replace("\\%", "%") text = text.replace("\\#", "#") text = text.replace("\\$", "$") text = text.replace("\\&", "&") text = text.replace("\\{", "{") text = text.replace("\\}", "}") text = text.replace("\\_", "_") text = text.replace("\\^{} ", "^") text = text.replace("\\~{} ", "~") text = text.replace("\\textregistered", "®") text = text.replace("\\copyright", "©") text = text.replace("\\texttrademark", "™ ") # Remove commented lines and join lines text = '\\\\'.join([line for line in text.split('\\\\') \ if not line.lstrip().startswith('%')]) # Line breaks text = text.replace('\\\\', '
') # Non-breakable spaces text = text.replace('~', ' ') # Styled text def make_bold(match_obj): "Make the found pattern bold" # FIXME: check if it is valid to have this inside a formula return '' + match_obj.group('content') + '' text = re_bold_latex.sub(make_bold, text) def make_emph(match_obj): "Make the found pattern emphasized" # FIXME: for the moment, remove as it could cause problem in # the case it is used in a formula. To be check if it is valid. return ' ' + match_obj.group('content') + '' text = re_emph_latex.sub(make_emph, text) # Lists text = text.replace('\\begin{enumerate}', '
<ol>')
    text = text.replace('\\end{enumerate}', '</ol>')
    text = text.replace('\\begin{itemize}', '<ul>')
    text = text.replace('\\end{itemize}', '</ul>')
    text = text.replace('\\item', '
  • ') # Remove remaining non-processed tags text = re_generic_start_latex.sub('', text) text = re_generic_end_latex.sub('', text) return text def get_pdf_snippets(recID, patterns, nb_words_around=CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS, max_snippets=CFG_WEBSEARCH_FULLTEXT_SNIPPETS): """ Extract text snippets around 'patterns' from the newest PDF file of 'recID' The search is case-insensitive. The snippets are meant to look like in the results of the popular search engine: using " ... " between snippets. For empty patterns it returns "" """ from invenio.bibdocfile import BibRecDocs - path = pathTxt = "" - # After integration of Sam's branch, we shall use: - # for bd in BibRecDocs(recID).list_bibdocs(): - # text = bd.get_text() - # For the time being: - for bdf in BibRecDocs(recID).list_latest_files(): - if bdf.get_format() == '.pdf': - path = bdf.get_path() # to print filesystem path to PDF - break # stop at the first PDF file - - if path != "": - pathTxt = path.replace(".pdf", ".TMP.txt") - pathTxt = pathTxt.split(';')[0] - if not os.path.exists(pathTxt): - run_shell_command(CFG_PATH_PDFTOTEXT + " %s %s", [path, pathTxt]) - if os.path.exists(pathTxt): - return get_text_snippets(pathTxt, patterns, nb_words_around, max_snippets) - else: - return "" + text_path = "" + for bd in BibRecDocs(recID).list_bibdocs(): + if bd.get_text(): + text_path = bd.get_text_path() + break # stop at the first good PDF textable file + + if text_path: + return get_text_snippets(text_path, patterns, nb_words_around, max_snippets) + else: + return "" def get_text_snippets(textfile_path, patterns, nb_words_around, max_snippets): """ Extract text snippets around 'patterns' from file found at 'textfile_path' The snippets are meant to look like in the results of the popular search engine: using " ... " between snippets. For empty patterns it returns "" The idea is to first produce big snippets with grep and narrow them TODO: - compare stem versions instead of using startswith() - distinguish the beginning of sentences and try to make the snippets start there """ if len(patterns) == 0: return "" # the max number of words that can still be added to the snippet words_left = max_snippets * (nb_words_around * 2 + 1) # Assuming that there will be at least one word per line we can produce the # big snippets like this cmd = "grep -i -A%s -B%s -m%s" cmdargs = [str(nb_words_around), str(nb_words_around), str(max_snippets)] for p in patterns: cmd += " -e %s" cmdargs.append(p) cmd += " %s" cmdargs.append(textfile_path) (dummy1, output, dummy2) = run_shell_command(cmd, cmdargs) result = [] big_snippets = output.split("--") # cut the snippets to match the nb_words_around parameter precisely: for s in big_snippets: small_snippet = cut_out_snippet(s, patterns, nb_words_around, words_left) #count words words_left -= len(small_snippet.split()) #if words_left <= 0: #print "Error: snippet too long" result.append(small_snippet) # combine snippets out = "" for snippet in result: if out != "" and snippet != "": out += " ... " out += highlight(snippet, patterns) return "
    " + out + "
    " def cut_out_snippet(text, patterns, nb_words_around, max_words): # the snippet can include many occurances of the patterns if they are not # further appart than 2 * nb_words_around def starts_with_any(word, patterns): # Check whether the word's beginning matches any of the patterns. # The second argument is an array of patterns to match. ret = False lower_case = word.lower() for p in patterns: if lower_case.startswith(str(p).lower()): ret = True break return ret # make the nb_words_around smaller if required by max_words # to make sure that at least one pattern is included while nb_words_around * 2 + 1 > max_words: nb_words_around -= 1 if nb_words_around < 1: return "" snippet = "" words = text.split() last_written_word = -1 i = 0 while i < len(words): if starts_with_any(words[i], patterns): # add part before first or following occurance of a word j = max(last_written_word + 1, i - nb_words_around) while j < i: snippet += (" " + words[j]) j += 1 # write the pattern snippet += (" " + words[i]) last_written_word = i # write the suffix. If pattern found, break j = 1 while j <= nb_words_around and i + j < len(words): if starts_with_any(words[i+j], patterns): break else: snippet += (" " + words[i+j]) last_written_word = i + j j += 1 i += j else: i += 1 # apply max_words param if needed snippet_words = snippet.split() length = len(snippet_words) if (length > max_words): j = 0 shorter_snippet = "" while j < max_words: shorter_snippet += " " + snippet_words[j] j += 1 return shorter_snippet else: return snippet diff --git a/modules/bibformat/lib/elements/Makefile.am b/modules/bibformat/lib/elements/Makefile.am index daee07c3b..3c2f9ed2a 100644 --- a/modules/bibformat/lib/elements/Makefile.am +++ b/modules/bibformat/lib/elements/Makefile.am @@ -1,40 +1,40 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
pylibdir=$(libdir)/python/invenio/bibformat_elements pylib_DATA = bfe_field.py bfe_title.py bfe_authors.py bfe_abstract.py bfe_affiliation.py \ bfe_imprint.py bfe_fulltext.py bfe_place.py bfe_publisher.py bfe_topbanner.py \ bfe_date_rec.py bfe_keywords.py bfe_notes.py bfe_reprints.py bfe_publi_info.py \ - bfe_cited_by.py bfe_references.py bfe_photo_resources.py bfe_title_brief.py \ + bfe_cited_by.py bfe_references.py bfe_title_brief.py \ bfe_report_numbers.py bfe_additional_report_numbers.py bfe_url.py \ bfe_addresses.py bfe_contact.py bfe_photo_resources_brief.py \ bfe_collection.py bfe_editors.py bfe_bibtex.py bfe_edit_record.py \ bfe_date.py bfe_xml_record.py bfe_external_publications.py __init__.py \ bfe_bfx_engine.py bfe_creation_date.py bfe_server_info.py bfe_issn.py \ bfe_client_info.py bfe_language.py bfe_record_id.py bfe_comments.py \ bfe_pagination.py bfe_fulltext_mini.py bfe_year.py bfe_isbn.py \ bfe_appears_in_collections.py bfe_photos.py bfe_record_stats.py tmpdir = $(prefix)/var/tmp/tests_bibformat_elements tmp_DATA = test_1.py bfe_test_2.py bfe_test_4.py test3.py test_5.py \ test_no_element.test __init__.py EXTRA_DIST = $(pylib_DATA) $(tmp_DATA) CLEANFILES = *~ *.tmp *.pyc diff --git a/modules/bibformat/lib/elements/bfe_photo_resources.py b/modules/bibformat/lib/elements/bfe_photo_resources.py deleted file mode 100644 index dcf6e37a2..000000000 --- a/modules/bibformat/lib/elements/bfe_photo_resources.py +++ /dev/null @@ -1,47 +0,0 @@ -# -*- coding: utf-8 -*- -## -## This file is part of CDS Invenio. -## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. -## -## CDS Invenio is free software; you can redistribute it and/or -## modify it under the terms of the GNU General Public License as -## published by the Free Software Foundation; either version 2 of the -## License, or (at your option) any later version. -## -## CDS Invenio is distributed in the hope that it will be useful, but -## WITHOUT ANY WARRANTY; without even the implied warranty of -## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -## General Public License for more details. -## -## You should have received a copy of the GNU General Public License -## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., -## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -"""BibFormat element - Prints HTML picture and links to resources -""" -__revision__ = "$Id$" - -def format(bfo): - """ - Prints html image and link to photo resources. - """ - - resources = bfo.fields("8564_", escape=1) - out = "" - for resource in resources: - - if resource.get("x", "") == "icon" and resource.get("u", "") == "": - out += '

    ' - - if resource.get("x", "") == "1": - out += '
    High resolution: '+ resource.get("q", "") +"" - - out += '
    © CERN Geneva' - out += '
    '+ bfo.field("8564_z") + "" - return out - -def escape_values(bfo): - """ - Called by BibFormat in order to check if output of this element - should be escaped. - """ - return 0 diff --git a/modules/bibformat/lib/elements/bfe_photo_resources_brief.py b/modules/bibformat/lib/elements/bfe_photo_resources_brief.py index af9bee08e..f21d55ddb 100644 --- a/modules/bibformat/lib/elements/bfe_photo_resources_brief.py +++ b/modules/bibformat/lib/elements/bfe_photo_resources_brief.py @@ -1,45 +1,45 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibFormat element - Prints brief HTML picture and links to resources """ __revision__ = "$Id$" def format(bfo): """ Prints html image and link to photo resources. """ from invenio.config import CFG_SITE_URL resources = bfo.fields("8564_") out = "" for resource in resources: - if resource.get("x", "") == "icon" and resource.get("u", "") == "": + if resource.get("x", "") == "icon": out += '' return out def escape_values(bfo): """ Called by BibFormat in order to check if output of this element should be escaped. """ return 0 diff --git a/modules/bibformat/lib/elements/bfe_photos.py b/modules/bibformat/lib/elements/bfe_photos.py index 6acbd8634..a41112116 100644 --- a/modules/bibformat/lib/elements/bfe_photos.py +++ b/modules/bibformat/lib/elements/bfe_photos.py @@ -1,56 +1,66 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibFormat element - Print photos of the record (if bibdoc file) """ -from invenio.bibdocfile import BibRecDocs + +import cgi +from invenio.bibdocfile import BibRecDocs, get_subformat_from_format def format(bfo, separator=" ", style='', print_links='yes'): """ Lists the photos of a record. Display the icon version, linked to its original version. This element works for photos appended to a record as BibDoc files, for which a preview icon has been generated. If there are several formats for one photo, use the first one found. 
@param separator: separator between each photo @param print_links: if 'yes', print links to the original photo @param style: style attributes of the images. Eg: "width:50px;border:none" """ photos = [] bibarchive = BibRecDocs(bfo.recID) for doc in bibarchive.list_bibdocs(): - if doc.get_icon() is not None: - original_url = doc.list_latest_files()[0].get_url() - icon_url = doc.get_icon().list_latest_files()[0].get_url() + found_url = '' + found_icon = '' + for docfile in doc.list_latest_files(): + if docfile.is_icon(): + if not found_icon: + found_icon = docfile.get_url() + else: + if not found_url: + found_url = docfile.get_url() + + if found_icon: name = doc.get_docname() - img = '%s' % (icon_url, name, style) + img = '%s' % (cgi.escape(found_icon, True), cgi.escape(name, True), cgi.escape(style, True)) if print_links.lower() == 'yes': - img = '%s' % (original_url, img) + img = '%s' % (cgi.escape(found_url, True), img) photos.append(img) return separator.join(photos) def escape_values(bfo): """ Called by BibFormat in order to check if output of this element should be escaped. """ return 0 diff --git a/modules/bibindex/lib/bibindex_engine.py b/modules/bibindex/lib/bibindex_engine.py index 8b0d51adc..9f525eced 100644 --- a/modules/bibindex/lib/bibindex_engine.py +++ b/modules/bibindex/lib/bibindex_engine.py @@ -1,1694 +1,1649 @@ # -*- coding: utf-8 -*- ## BibIndxes bibliographic data, reference and fulltext indexing utility. ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex indexing engine implementation. See bibindex executable for entry point. 
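An aside before the indexing engine details: one aspect of the bfe_photos rewrite above deserves a remark. Every value interpolated into an HTML attribute now passes through cgi.escape(value, True), so a stray double quote in a URL, docname, or style string can no longer terminate the attribute early. A quick illustration of why the quote flag matters (the hostile style value is made up for the demonstration):

    import cgi

    style = 'width:50px" onmouseover="alert(1)'
    # Without quote=True only & < > are escaped; the quote survives and
    # breaks out of the attribute:
    print '<img style="%s"/>' % cgi.escape(style)
    # With quote=True the quote becomes &quot; and stays inert:
    print '<img style="%s"/>' % cgi.escape(style, True)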
""" __revision__ = "$Id$" import os import re import sys import time import urllib2 import tempfile from invenio.config import \ CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS, \ CFG_BIBINDEX_CHARS_PUNCTUATION, \ CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY, \ CFG_BIBINDEX_MIN_WORD_LENGTH, \ CFG_BIBINDEX_REMOVE_HTML_MARKUP, \ CFG_BIBINDEX_REMOVE_LATEX_MARKUP, \ CFG_SITE_URL, CFG_TMPDIR, \ - CFG_CERN_SITE, CFG_INSPIRE_SITE + CFG_CERN_SITE, CFG_INSPIRE_SITE, \ + CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES, \ + CFG_BIBINDEX_SPLASH_PAGES +from invenio.websubmit_config import CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM from invenio.bibindex_engine_config import CFG_MAX_MYSQL_THREADS, \ - CFG_MYSQL_THREAD_TIMEOUT, CONV_PROGRAMS, CONV_PROGRAMS_HELPERS, \ + CFG_MYSQL_THREAD_TIMEOUT, \ CFG_CHECK_MYSQL_THREADS from invenio.bibindex_engine_tokenizer import BibIndexFuzzyNameTokenizer, \ BibIndexExactNameTokenizer from invenio.bibdocfile import bibdocfile_url_to_fullpath, bibdocfile_url_p, \ - decompose_bibdocfile_url + decompose_bibdocfile_url, bibdocfile_url_to_bibdoc, normalize_format, \ + decompose_file, download_url, guess_format_from_url +from invenio.websubmit_file_converter import convert_file from invenio.search_engine import perform_request_search, strip_accents, \ wash_index_term, lower_index_term, get_index_stemming_language from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, \ deserialize_via_marshal from invenio.bibindex_engine_stopwords import is_stopword from invenio.bibindex_engine_stemmer import stem from invenio.bibtask import task_init, write_message, get_datetime, \ task_set_option, task_get_option, task_get_task_param, task_update_status, \ task_update_progress, task_sleep_now_if_required from invenio.intbitset import intbitset from invenio.errorlib import register_exception from invenio.shellutils import escape_shell_arg from invenio.htmlutils import remove_html_markup # FIXME: journal tag and journal pubinfo standard format are defined here: if CFG_CERN_SITE: CFG_JOURNAL_TAG = '773__%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "773__p 773__v (773__y) 773__c" elif CFG_INSPIRE_SITE: CFG_JOURNAL_TAG = '773__%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "773__p,773__v,773__c" else: CFG_JOURNAL_TAG = '909C4%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "909C4p 909C4v (909C4y) 909C4c" ## precompile some often-used regexp for speed reasons: re_subfields = re.compile('\$\$\w') re_block_punctuation_begin = re.compile(r"^"+CFG_BIBINDEX_CHARS_PUNCTUATION+"+") re_block_punctuation_end = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION+"+$") re_punctuation = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION) re_separators = re.compile(CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS) re_datetime_shift = re.compile("([-\+]{0,1})([\d]+)([dhms])") nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by base_process_size = 4500 # process base size _last_word_table = None -## Dictionary merging functions -def intersection(dict1, dict2): - "Returns intersection of the two dictionaries." - int_dict = {} - if len(dict1) > len(dict2): - for e in dict2: - if dict1.has_key(e): - int_dict[e] = 1 - else: - for e in dict1: - if dict2.has_key(e): - int_dict[e] = 1 - return int_dict - -def union(dict1, dict2): - "Returns union of the two dictionaries." - union_dict = {} - for e in dict1.keys(): - union_dict[e] = 1 - for e in dict2.keys(): - union_dict[e] = 1 - return union_dict - -def diff(dict1, dict2): - "Returns dict1 - dict2." 
-    diff_dict = {}
-    for e in dict1.keys():
-        if not dict2.has_key(e):
-            diff_dict[e] = 1
-    return diff_dict
-
def list_union(list1, list2):
    "Returns union of the two lists."
    union_dict = {}
    for e in list1:
        union_dict[e] = 1
    for e in list2:
        union_dict[e] = 1
    return union_dict.keys()

## safety function for killing slow DB threads:
def kill_sleepy_mysql_threads(max_threads=CFG_MAX_MYSQL_THREADS, thread_timeout=CFG_MYSQL_THREAD_TIMEOUT):
    """Check the number of DB threads and if there are more than
       MAX_THREADS of them, kill all threads that are in a sleeping
       state for more than THREAD_TIMEOUT seconds.  (This is useful
       for working around the max_connection problem that appears
       during indexation in some not-yet-understood cases.)  If some
       threads are to be killed, write info into the log file.
    """
    res = run_sql("SHOW FULL PROCESSLIST")
    if len(res) > max_threads:
        for row in res:
            r_id, dummy, dummy, dummy, r_command, r_time, dummy, dummy = row
            if r_command == "Sleep" and int(r_time) > thread_timeout:
                run_sql("KILL %s", (r_id,))
                write_message("WARNING: too many DB threads, killing thread %s" % r_id, verbose=1)
    return

## MARC-21 tag/field access functions
def get_fieldvalues(recID, tag):
    """Returns list of values of the MARC-21 'tag' fields for the
       record 'recID'."""
-    out = []
    bibXXx = "bib" + tag[0] + tag[1] + "x"
    bibrec_bibXXx = "bibrec_" + bibXXx
-    query = "SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'" \
-            % (bibXXx, bibrec_bibXXx, recID, tag)
-    res = run_sql(query)
-    for row in res:
-        out.append(row[0])
-    return out
+    query = "SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s" \
+            % (bibXXx, bibrec_bibXXx)
+    res = run_sql(query, (recID, tag))
+    return [row[0] for row in res]

def get_associated_subfield_value(recID, tag, value, associated_subfield_code):
    """Return list of ASSOCIATED_SUBFIELD_CODE, if exists, for record
       RECID and TAG of value VALUE.  Used by fulltext indexer only.
       Note: TAG must be 6 characters long (tag+ind1+ind2+sfcode),
       otherwise an empty string is returned.
       FIXME: what if many tag values have the same value but different
       associated_subfield_code?  Better use bibrecord library for this.
    """
    out = ""
    if len(tag) != 6:
        return out
    bibXXx = "bib" + tag[0] + tag[1] + "x"
    bibrec_bibXXx = "bibrec_" + bibXXx
    query = """SELECT bb.field_number, b.tag, b.value FROM %s AS b, %s AS bb
-                WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s%%'""" % \
-            (bibXXx, bibrec_bibXXx, recID, tag[:-1])
-    res = run_sql(query)
+                WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE
+                %%s%%""" % (bibXXx, bibrec_bibXXx)
+    res = run_sql(query, (recID, tag[:-1]))
    field_number = -1
    for row in res:
        if row[1] == tag and row[2] == value:
            field_number = row[0]
    if field_number > 0:
        for row in res:
            if row[0] == field_number and row[1] == tag[:-1] + associated_subfield_code:
                out = row[2]
                break
    return out

def get_field_tags(field):
    """Returns a list of MARC tags for the field code 'field'.
       Returns empty list in case of error.
Example: field='author', output=['100__%','700__%'].""" out = [] query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f - WHERE f.code='%s' AND ft.id_field=f.id AND t.id=ft.id_tag - ORDER BY ft.score DESC""" % field - res = run_sql(query) - for row in res: - out.append(row[0]) - return out + WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag + ORDER BY ft.score DESC""" + res = run_sql(query, (field, )) + return [row[0] for row in res] ## Fulltext word extraction functions def get_fulltext_urls_from_html_page(htmlpagebody): """Parses htmlpagebody data (the splash page content) looking for url_directs referring to probable fulltexts. Returns an array of (ext,url_direct) to fulltexts. Note: it looks for file format extensions as defined by global - 'CONV_PROGRAMS' structure, minus the HTML ones, because we don't + 'CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM' structure, minus the HTML ones, because we don't want to index HTML pages that the splash page might point to. """ out = [] - for ext in CONV_PROGRAMS.keys(): + for ext in CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM: expr = re.compile( r"\"(http://[\w]+\.+[\w]+[^\"'><]*\." + \ ext + r")\"") match = expr.search(htmlpagebody) if match and ext not in ['htm', 'html']: out.append([ext, match.group(1)]) #else: # FIXME: workaround for getfile, should use bibdoc tables - #expr_getfile = re.compile(r"\"(http://.*getfile\.py\?.*format=" + ext + r"&version=.*)\"") + #expr_getfile = re.compile(r"\"(http://.*getfile\.py\?.*format=" + ext + "&version=.*)\"") #match = expr_getfile.search(htmlpagebody) #if match and ext not in ['htm', 'html']: #out.append([ext, match.group(1)]) return out def get_words_from_journal_tag(recID, tag): """ Special procedure to extract words from journal tags. Joins title/volume/year/page into a standard form that is also used for citations. """ # get all journal tags/subfields: bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.field_number,b.tag,b.value FROM %s AS b, %s AS bb - WHERE bb.id_bibrec=%d - AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag) - res = run_sql(query) + WHERE bb.id_bibrec=%%s + AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) + res = run_sql(query, (recID, tag)) # construct journal pubinfo: dpubinfos = {} for row in res: nb_instance, subfield, value = row if subfield.endswith("c"): # delete pageend if value is pagestart-pageend # FIXME: pages may not be in 'c' subfield value = value.split('-', 1)[0] if dpubinfos.has_key(nb_instance): dpubinfos[nb_instance][subfield] = value else: dpubinfos[nb_instance] = {subfield: value} # construct standard format: lwords = [] for dpubinfo in dpubinfos.values(): # index all journal subfields separately for tag,val in dpubinfo.items(): lwords.append(val) # index journal standard format: pubinfo = CFG_JOURNAL_PUBINFO_STANDARD_FORM for tag,val in dpubinfo.items(): pubinfo = pubinfo.replace(tag,val) if CFG_JOURNAL_TAG[:-1] in pubinfo: # some subfield was missing, do nothing pass else: lwords.append(pubinfo) # return list of words and pubinfos: return lwords def get_words_from_date_tag(datestring, stemming_language=None): """ Special procedure to index words from tags storing date-like information in format YYYY or YYYY-MM or YYYY-MM-DD. Namely, we are indexing word-terms YYYY, YYYY-MM, YYYY-MM-DD, but never standalone MM or DD. 
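For instance, an illustrative check of the rule just stated (the actual function body follows right below; this sketch only mirrors it):

    def demo_date_terms(datestring):
        # Index YYYY-MM-DD as itself plus its YYYY and YYYY-MM prefixes,
        # never as a standalone MM or DD.
        out = []
        for dateword in datestring.split():
            out.append(dateword)
            parts = dateword.split('-')
            for nb in range(1, len(parts)):
                out.append('-'.join(parts[:nb]))
        return out

    assert demo_date_terms('2009-11-05') == ['2009-11-05', '2009', '2009-11']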
""" out = [] for dateword in datestring.split(): # maybe there are whitespaces, so break these too out.append(dateword) parts = dateword.split('-') for nb in range(1,len(parts)): out.append("-".join(parts[:nb])) return out def get_words_from_fulltext(url_direct_or_indirect, stemming_language=None): """Returns all the words contained in the document specified by URL_DIRECT_OR_INDIRECT with the words being split by various SRE_SEPARATORS regexp set earlier. If FORCE_FILE_EXTENSION is set (e.g. to "pdf", then treat URL_DIRECT_OR_INDIRECT as a PDF file. (This is interesting to index Indico for example.) Note also that URL_DIRECT_OR_INDIRECT may be either a direct URL to the fulltext file or an URL to a setlink-like page body that presents the links to be indexed. In the latter case the URL_DIRECT_OR_INDIRECT is parsed to extract actual direct URLs to fulltext documents, for all knows file extensions as specified by global CONV_PROGRAMS config variable. """ - - if CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY and \ - url_direct_or_indirect.find(CFG_SITE_URL) < 0: - return [] + re_perform_ocr = re.compile(CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES) write_message("... reading fulltext files from %s started" % url_direct_or_indirect, verbose=2) - - fulltext_urls = [] - if bibdocfile_url_p(url_direct_or_indirect): - write_message("... url %s is an internal url" % url_direct_or_indirect, verbose=9) - ext = decompose_bibdocfile_url(url_direct_or_indirect)[2] - if ext.startswith('.'): - ext = ext[1:].lower() - fulltext_urls = [(ext, url_direct_or_indirect)] - else: - # check for direct link in url - url_direct_or_indirect_ext = url_direct_or_indirect.split(".")[-1].lower() - - if url_direct_or_indirect_ext in CONV_PROGRAMS.keys(): - fulltext_urls = [(url_direct_or_indirect_ext, url_direct_or_indirect)] - - # Indirect URL. Try to discover the real fulltext(s) from this splash page URL. - if not fulltext_urls: - # read "setlink" data - try: - htmlpagebody = urllib2.urlopen(url_direct_or_indirect).read() - except Exception, e: - register_exception() - sys.stderr.write("Error: Cannot read %s: %s" % (url_direct_or_indirect, e)) - return [] - fulltext_urls = get_fulltext_urls_from_html_page(htmlpagebody) - write_message("... fulltext_urls = %s" % fulltext_urls, verbose=9) - - write_message('... data to elaborate: %s' % fulltext_urls, verbose=9) - - words = {} - - # process as many urls as they were found: - for (ext, url_direct) in fulltext_urls: - - write_message(".... processing %s from %s started" % (ext, url_direct), verbose=2) - - # sanity check: - if not url_direct: - break - - if bibdocfile_url_p(url_direct): - # Let's manage this with BibRecDocs... - # We got something like http://$(CFG_SITE_URL)/record/xxx/yyy.ext - try: - tmp_name = bibdocfile_url_to_fullpath(url_direct) - write_message("Found internal path %s for url %s" % (tmp_name, url_direct), verbose=2) - no_src_delete = True - except Exception, e: - register_exception() - sys.stderr.write("Error in retrieving fulltext from internal url %s: %s\n" % (url_direct, e)) - break # try other fulltext files... - else: - # read fulltext file: - try: - url = urllib2.urlopen(url_direct) - no_src_delete = False - except Exception, e: - register_exception() - sys.stderr.write("Error: Cannot read %s: %s\n" % (url_direct, e)) - break # try other fulltext files... 
- - tmp_fd, tmp_name = tempfile.mkstemp('invenio.tmp') - data_chunk = url.read(8*1024) - while data_chunk: - os.write(tmp_fd, data_chunk) - data_chunk = url.read(8*1024) - os.close(tmp_fd) - - dummy_fd, tmp_dst_name = tempfile.mkstemp('invenio.tmp.txt', dir=CFG_TMPDIR) - - bingo = 0 - # try all available conversion programs according to their order: - for conv_program in CONV_PROGRAMS.get(ext, []): - if os.path.exists(conv_program): - # intelligence on how to run various conversion programs: - cmd = "" # will keep command to run - bingo = 0 # had we success? - if os.path.basename(conv_program) == "pdftotext": - cmd = "%s -enc UTF-8 %s %s" % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "pstotext": - if ext == "ps.gz": - # is there gzip available? - if os.path.exists(CONV_PROGRAMS_HELPERS["gz"]): - cmd = "%s -cd %s | %s > %s" \ - % (CONV_PROGRAMS_HELPERS["gz"], escape_shell_arg(tmp_name), conv_program, escape_shell_arg(tmp_dst_name)) - else: - cmd = "%s %s > %s" \ - % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "ps2ascii": - if ext == "ps.gz": - # is there gzip available? - if os.path.exists(CONV_PROGRAMS_HELPERS["gz"]): - cmd = "%s -cd %s | %s > %s"\ - % (CONV_PROGRAMS_HELPERS["gz"], escape_shell_arg(tmp_name), - conv_program, escape_shell_arg(tmp_dst_name)) - else: - cmd = "%s %s %s" \ - % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "antiword": - cmd = "%s %s > %s" % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "catdoc": - cmd = "%s %s > %s" % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "wvText": - cmd = "%s %s %s" % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "ppthtml": - # is there html2text available? - if os.path.exists(CONV_PROGRAMS_HELPERS["html"]): - cmd = "%s %s | %s > %s"\ - % (conv_program, escape_shell_arg(tmp_name), - CONV_PROGRAMS_HELPERS["html"], escape_shell_arg(tmp_dst_name)) - else: - cmd = "%s %s > %s" \ - % (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "xlhtml": - # is there html2text available? - if os.path.exists(CONV_PROGRAMS_HELPERS["html"]): - cmd = "%s %s | %s > %s" % \ - (conv_program, escape_shell_arg(tmp_name), - CONV_PROGRAMS_HELPERS["html"], escape_shell_arg(tmp_dst_name)) - else: - cmd = "%s %s > %s" % \ - (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - elif os.path.basename(conv_program) == "html2text": - cmd = "%s %s > %s" % \ - (conv_program, escape_shell_arg(tmp_name), escape_shell_arg(tmp_dst_name)) - else: - write_message("Error: Do not know how to handle %s conversion program." % conv_program, sys.stderr) - # try to run it: - try: - write_message("..... launching %s" % cmd, verbose=9) - # Note we replace ; in order to make happy internal file names - errcode = os.system(cmd) - if errcode == 0 and os.path.exists(tmp_dst_name): - bingo = 1 - break # bingo! - else: - write_message("Error while running %s for %s." % (cmd, url_direct), sys.stderr) - except: - write_message("Error running %s for %s." % (cmd, url_direct), sys.stderr) - # were we successful? 
- if bingo: - tmp_name_txt_file = open(tmp_dst_name) - for phrase in tmp_name_txt_file.xreadlines(): - for word in get_words_from_phrase(phrase, stemming_language): - if not words.has_key(word): - words[word] = 1 - tmp_name_txt_file.close() + try: + if bibdocfile_url_p(url_direct_or_indirect): + write_message("... %s is an internal document" % url_direct_or_indirect, verbose=2) + bibdoc = bibdocfile_url_to_bibdoc(url_direct_or_indirect) + perform_ocr = bool(re_perform_ocr.match(bibdoc.get_docname())) + write_message("... will extract words from %s (docid: %s) %s" % (bibdoc.get_docname(), bibdoc.get_id(), perform_ocr and 'with OCR' or '')) + if not bibdoc.has_text(require_up_to_date=True): + bibdoc.extract_text(perform_ocr=perform_ocr) + return get_words_from_phrase(bibdoc.get_text(), stemming_language) else: - write_message("No conversion for %s." % (url_direct), sys.stderr, verbose=2) - - # delete temp files (they might not exist): - try: - if not no_src_delete: - os.unlink(tmp_name) - os.close(dummy_fd) - os.unlink(tmp_dst_name) - except StandardError: - write_message("Error: Could not delete file. It didn't exist", sys.stderr) - - write_message(".... processing %s from %s ended" % (ext, url_direct), verbose=2) - - - write_message("... reading fulltext files from %s ended" % url_direct_or_indirect, verbose=2) - - return words.keys() + if CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY: + write_message("... %s is external URL but indexing only local files" % url_direct_or_indirect, verbose=2) + return [] + write_message("... %s is an external URL" % url_direct_or_indirect, verbose=2) + best_formats = [normalize_format(format) for format in CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM] + format = guess_format_from_url(url_direct_or_indirect) + if re.match(CFG_BIBINDEX_SPLASH_PAGES, url_direct_or_indirect): + urls = get_fulltext_urls_from_html_page(url_direct_or_indirect) + else: + urls = [url_direct_or_indirect] + write_message("... will extract words from %s" % ', '.join(urls)) + words = {} + for url in urls: + format = guess_format_from_url(url) + tmpdoc = download_url(url, format) + tmptext = convert_file(tmpdoc, format='.txt') + os.remove(tmpdoc) + text = open(tmptext).read() + os.remove(tmptext) + tmpwords = get_words_from_phrase(text, stemming_language) + words.update(dict(map(lambda x: (x, 1), tmpwords))) + return words.keys() + except Exception, e: + register_exception(prefix='ERROR: it\'s impossible to correctly extract words from %s' % url_direct_or_indirect, alert_admin=True) + write_message("ERROR: %s" % e, stream=sys.stderr) + return [] latex_markup_re = re.compile(r"\\begin(\[.+?\])?\{.+?\}|\\end\{.+?}|\\\w+(\[.+?\])?\{(?P.*?)\}|\{\\\w+ (?P.*?)\}") def remove_latex_markup(phrase): ret_phrase = '' index = 0 for match in latex_markup_re.finditer(phrase): ret_phrase += phrase[index:match.start()] ret_phrase += match.group('inside1') or match.group('inside2') or '' index = match.end() ret_phrase += phrase[index:] return ret_phrase def get_nothing_from_phrase(phrase, stemming_language=None): """ A dump implementation of get_words_from_phrase to be used when when a tag should not be indexed (such as when trying to extract phrases from 8564_u).""" return [] def swap_temporary_reindex_tables(index_id, reindex_prefix="tmp_"): """Atomically swap reindexed temporary table with the original one. 
Delete the now-old one.""" write_message("Putting new tmp index tables for id %s into production" % index_id) run_sql( "RENAME TABLE " + "idxWORD%02dR TO old_idxWORD%02dR," % (index_id, index_id) + "%sidxWORD%02dR TO idxWORD%02dR," % (reindex_prefix, index_id, index_id) + "idxWORD%02dF TO old_idxWORD%02dF," % (index_id, index_id) + "%sidxWORD%02dF TO idxWORD%02dF," % (reindex_prefix, index_id, index_id) + + "idxPAIR%02dR TO old_idxPAIR%02dR," % (index_id, index_id) + + "%sidxPAIR%02dR TO idxPAIR%02dR," % (reindex_prefix, index_id, index_id) + + "idxPAIR%02dF TO old_idxPAIR%02dF," % (index_id, index_id) + + "%sidxPAIR%02dF TO idxPAIR%02dF," % (reindex_prefix, index_id, index_id) + "idxPHRASE%02dR TO old_idxPHRASE%02dR," % (index_id, index_id) + "%sidxPHRASE%02dR TO idxPHRASE%02dR," % (reindex_prefix, index_id, index_id) + "idxPHRASE%02dF TO old_idxPHRASE%02dF," % (index_id, index_id) + "%sidxPHRASE%02dF TO idxPHRASE%02dF;" % (reindex_prefix, index_id, index_id) ) write_message("Dropping old index tables for id %s" % index_id) - run_sql("DROP TABLE old_idxWORD%02dR, old_idxWORD%02dF, old_idxPHRASE%02dR, old_idxPHRASE%02dF" % (index_id, index_id, index_id, index_id) + run_sql("DROP TABLE old_idxWORD%02dR, old_idxWORD%02dF, old_idxPAIR%02dR, old_idxPAIR%02dF, old_idxPHRASE%02dR, old_idxPHRASE%02dF" % (index_id, index_id, index_id, index_id, index_id, index_id) ) def init_temporary_reindex_tables(index_id, reindex_prefix="tmp_"): """Create reindexing temporary tables.""" write_message("Creating new tmp index tables for id %s" % index_id) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxWORD%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxWORD%02dR ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) + res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPAIR%02dF ( + id mediumint(9) unsigned NOT NULL auto_increment, + term varchar(100) default NULL, + hitlist longblob, + PRIMARY KEY (id), + UNIQUE KEY term (term) + ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) + + res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPAIR%02dR ( + id_bibrec mediumint(9) unsigned NOT NULL, + termlist longblob, + type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', + PRIMARY KEY (id_bibrec,type) + ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) + res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPHRASE%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPHRASE%02dR ( id_bibrec mediumint(9) unsigned NOT NULL default '0', termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,)) latex_formula_re = re.compile(r'\$.*?\$|\\\[.*?\\\]') def get_words_from_phrase(phrase, stemming_language=None): """Return list of words found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file. 
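
    Hypothetical illustration (assuming the default punctuation
    configuration; the order of the returned list may vary, since
    words are collected in a dict):

      >>> get_words_from_phrase("Higgs boson searches, 2008.")
      ['higgs', 'boson', 'searches', '2008']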
""" words = {} formulas = [] if CFG_BIBINDEX_REMOVE_HTML_MARKUP and phrase.find(" -1: phrase = remove_html_markup(phrase) if CFG_BIBINDEX_REMOVE_LATEX_MARKUP: formulas = latex_formula_re.findall(phrase) phrase = remove_latex_markup(phrase) phrase = latex_formula_re.sub(' ', phrase) try: phrase = lower_index_term(phrase) except UnicodeDecodeError: # too bad the phrase is not UTF-8 friendly, continue... phrase = phrase.lower() # 1st split phrase into blocks according to whitespace for block in strip_accents(phrase).split(): # 2nd remove leading/trailing punctuation and add block: block = re_block_punctuation_begin.sub("", block) block = re_block_punctuation_end.sub("", block) if block: if stemming_language: block = apply_stemming_and_stopwords_and_length_check(block, stemming_language) if block: words[block] = 1 # 3rd break each block into subblocks according to punctuation and add subblocks: for subblock in re_punctuation.split(block): if stemming_language: subblock = apply_stemming_and_stopwords_and_length_check(subblock, stemming_language) if subblock: words[subblock] = 1 # 4th break each subblock into alphanumeric groups and add groups: for alphanumeric_group in re_separators.split(subblock): if stemming_language: alphanumeric_group = apply_stemming_and_stopwords_and_length_check(alphanumeric_group, stemming_language) if alphanumeric_group: words[alphanumeric_group] = 1 for block in formulas: words[block] = 1 return words.keys() +def get_pairs_from_phrase(phrase, stemming_language=None): + """Return list of words found in PHRASE. Note that the phrase is + split into groups depending on the alphanumeric characters and + punctuation characters definition present in the config file. + """ + words = {} + if CFG_BIBINDEX_REMOVE_HTML_MARKUP and phrase.find(" -1: + phrase = remove_html_markup(phrase) + if CFG_BIBINDEX_REMOVE_LATEX_MARKUP: + phrase = remove_latex_markup(phrase) + phrase = latex_formula_re.sub(' ', phrase) + try: + phrase = lower_index_term(phrase) + except UnicodeDecodeError: + # too bad the phrase is not UTF-8 friendly, continue... + phrase = phrase.lower() + # 1st split phrase into blocks according to whitespace + last_word = '' + for block in strip_accents(phrase).split(): + # 2nd remove leading/trailing punctuation and add block: + block = re_block_punctuation_begin.sub("", block) + block = re_block_punctuation_end.sub("", block) + if block: + if stemming_language: + block = apply_stemming_and_stopwords_and_length_check(block, stemming_language) + # 3rd break each block into subblocks according to punctuation and add subblocks: + for subblock in re_punctuation.split(block): + if stemming_language: + subblock = apply_stemming_and_stopwords_and_length_check(subblock, stemming_language) + if subblock: + # 4th break each subblock into alphanumeric groups and add groups: + for alphanumeric_group in re_separators.split(subblock): + if stemming_language: + alphanumeric_group = apply_stemming_and_stopwords_and_length_check(alphanumeric_group, stemming_language) + if alphanumeric_group: + if last_word: + words['%s %s' % (last_word, alphanumeric_group)] = 1 + last_word = alphanumeric_group + return words.keys() + phrase_delimiter_re = re.compile(r'[\.:;\?\!]') space_cleaner_re = re.compile(r'\s+') def get_phrases_from_phrase(phrase, stemming_language=None): """Return list of phrases found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file. 
""" return [phrase] ## Note that we don't break phrases, they are used for exact style ## of searching. words = {} phrase = strip_accents(phrase) # 1st split phrase into blocks according to whitespace for block1 in phrase_delimiter_re.split(strip_accents(phrase)): block1 = block1.strip() if block1 and stemming_language: new_words = [] for block2 in re_punctuation.split(block1): block2 = block2.strip() if block2: for block3 in block2.split(): block3 = block3.strip() if block3: # Note that we don't stem phrases, they # are used for exact style of searching. new_words.append(block3) block1 = ' '.join(new_words) if block1: words[block1] = 1 return words.keys() def get_fuzzy_authors_from_phrase(phrase, stemming_language=None): """ Return list of fuzzy phrase-tokens suitable for storing into author phrase index. """ author_tokenizer = BibIndexFuzzyNameTokenizer() return author_tokenizer.tokenize(phrase) def get_exact_authors_from_phrase(phrase, stemming_language=None): """ Return list of exact phrase-tokens suitable for storing into exact author phrase index. """ author_tokenizer = BibIndexExactNameTokenizer() return author_tokenizer.tokenize(phrase) def apply_stemming_and_stopwords_and_length_check(word, stemming_language): """Return WORD after applying stemming and stopword and length checks. See the config file in order to influence these. """ # now check against stopwords: if is_stopword(word): return "" # finally check the word length: if len(word) < CFG_BIBINDEX_MIN_WORD_LENGTH: return "" # stem word, when configured so: if stemming_language: word = stem(word, stemming_language) return word def remove_subfields(s): "Removes subfields from string, e.g. 'foo $$c bar' becomes 'foo bar'." return re_subfields.sub(' ', s) def get_index_id_from_index_name(index_name): """Returns the words/phrase index id for INDEXNAME. Returns empty string in case there is no words table for this index. Example: field='author', output=4.""" out = 0 query = """SELECT w.id FROM idxINDEX AS w - WHERE w.name='%s' LIMIT 1""" % index_name - res = run_sql(query, None, 1) + WHERE w.name=%s LIMIT 1""" + res = run_sql(query, (index_name, ), 1) if res: out = res[0][0] return out +def get_index_name_from_index_id(index_id): + """Returns the words/phrase index name for INDEXID. + Returns '' in case there is no words table for this indexid. + Example: field=9, output='fulltext'.""" + res = run_sql("SELECT name FROM idxINDEX WHERE id=%s", (index_id, )) + if res: + return res[0][0] + return '' + def get_index_tags(indexname): """Returns the list of tags that are indexed inside INDEXNAME. Returns empty list in case there are no tags indexed in this index. Note: uses get_field_tags() defined before. Example: field='author', output=['100__%', '700__%'].""" out = [] query = """SELECT f.code FROM idxINDEX AS w, idxINDEX_field AS wf, - field AS f WHERE w.name='%s' AND w.id=wf.id_idxINDEX - AND f.id=wf.id_field""" % indexname - res = run_sql(query) + field AS f WHERE w.name=%s AND w.id=wf.id_idxINDEX + AND f.id=wf.id_field""" + res = run_sql(query, (indexname, )) for row in res: out.extend(get_field_tags(row[0])) return out def get_all_indexes(): """Returns the list of the names of all defined words indexes. Returns empty list in case there are no tags indexed in this index. 
       Example: output=['global', 'author']."""
    out = []
    query = """SELECT name FROM idxINDEX"""
    res = run_sql(query)
    for row in res:
        out.append(row[0])
    return out

def split_ranges(parse_string):
    """Parse a string and return the list of ranges."""
    recIDs = []
    ranges = parse_string.split(",")
    for arange in ranges:
        tmp_recIDs = arange.split("-")
        if len(tmp_recIDs) == 1:
            recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])])
        else:
            if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check
                tmp = tmp_recIDs[0]
                tmp_recIDs[0] = tmp_recIDs[1]
                tmp_recIDs[1] = tmp
            recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])])
    return recIDs

def get_word_tables(tables):
    """ Given a list of table names it returns a list of tuples
    (index_id, index_name, index_tags).
    If tables is empty it returns the whole list."""
    wordTables = []
    if tables:
        indexes = tables.split(",")
        for index in indexes:
            index_id = get_index_id_from_index_name(index)
            if index_id:
                wordTables.append((index_id, index, get_index_tags(index)))
            else:
                write_message("Error: There is no %s words table." % index, sys.stderr)
    else:
        for index in get_all_indexes():
            index_id = get_index_id_from_index_name(index)
            wordTables.append((index_id, index, get_index_tags(index)))
    return wordTables

def get_date_range(var):
    "Returns the two dates contained in VAR as a (low, high) tuple."
    limits = var.split(",")
    if len(limits) == 1:
        low = get_datetime(limits[0])
        return low, None
    if len(limits) == 2:
        low = get_datetime(limits[0])
        high = get_datetime(limits[1])
        return low, high
    return None, None

def create_range_list(res):
    """Creates a range list from a recID select query result contained
    in res. The result is expected to have ascending numerical order."""
    if not res:
        return []
    row = res[0]
    if not row:
        return []
    else:
-        range_list = [[row[0], row[0]]]
+        range_list = [[row, row]]
        for row in res[1:]:
-            row_id = row[0]
+            row_id = row
            if row_id == range_list[-1][1] + 1:
                range_list[-1][1] = row_id
            else:
                range_list.append([row_id, row_id])
        return range_list

def beautify_range_list(range_list):
    """Returns a non-overlapping, maximal range list"""
    ret_list = []
    for new in range_list:
        found = 0
        for old in ret_list:
            if new[0] <= old[0] <= new[1] + 1 or new[0] - 1 <= old[1] <= new[1]:
                old[0] = min(old[0], new[0])
                old[1] = max(old[1], new[1])
                found = 1
                break
        if not found:
            ret_list.append(new)
    return ret_list

def truncate_index_table(index_name):
    """Properly truncate the given index."""
    index_id = get_index_id_from_index_name(index_name)
    if index_id:
        write_message('Truncating %s index table in order to reindex.' % index_name, verbose=2)
        run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,))
        run_sql("TRUNCATE idxWORD%02dF" % index_id)
        run_sql("TRUNCATE idxWORD%02dR" % index_id)
+        run_sql("TRUNCATE idxPAIR%02dF" % index_id)
+        run_sql("TRUNCATE idxPAIR%02dR" % index_id)
        run_sql("TRUNCATE idxPHRASE%02dF" % index_id)
        run_sql("TRUNCATE idxPHRASE%02dR" % index_id)

def update_index_last_updated(index_id, starting_time=None):
    """Update last_updated column of the index table in the database.
    Puts starting time there so that if the task was interrupted for
    record download, the records will be reindexed next time."""
    if starting_time is None:
        return None
    write_message("updating last_updated to %s..."
                  % starting_time, verbose=9)
    return run_sql("UPDATE idxINDEX SET last_updated=%s WHERE id=%s",
                   (starting_time, index_id,))

+#def update_text_extraction_date(first_recid, last_recid):
+    #"""For all the bibdocs connected to the specified recids, set
+    #the text_extraction_date to the task_starting_time."""
+    #run_sql("UPDATE bibdoc JOIN bibrec_bibdoc ON id=id_bibdoc SET text_extraction_date=%s WHERE id_bibrec BETWEEN %s AND %s", (task_get_task_param('task_starting_time'), first_recid, last_recid))
+
class WordTable:
    "A class to hold the words table."

-    def __init__(self, index_id, fields_to_index, table_name_pattern, default_get_words_fnc, tag_to_words_fnc_map, wash_index_terms=True):
+    def __init__(self, index_id, fields_to_index, table_name_pattern, default_get_words_fnc, tag_to_words_fnc_map, wash_index_terms=True, is_fulltext_index=False):
        """Creates words table instance.
        @param index_id: the index integer identifier
        @param fields_to_index: a list of fields to index
        @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF
        @param default_get_words_fnc: the default function called to extract words from a metadata
        @param tag_to_words_fnc_map: a mapping to specify a particular function to extract words from particular metadata (such as 8564_u)
        """
        self.index_id = index_id
        self.tablename = table_name_pattern % index_id
        self.recIDs_in_mem = []
        self.fields_to_index = fields_to_index
        self.value = {}
        self.stemming_language = get_index_stemming_language(index_id)
+        self.is_fulltext_index = is_fulltext_index
        self.wash_index_terms = wash_index_terms

        # tagToFunctions mapping. It offers an indirection level necessary for
        # indexing fulltext. The default is get_words_from_phrase
        self.tag_to_words_fnc_map = tag_to_words_fnc_map
        self.default_get_words_fnc = default_get_words_fnc

        if self.stemming_language and self.tablename.startswith('idxWORD'):
            write_message('%s has stemming enabled, language %s' % (self.tablename, self.stemming_language))

    def get_field(self, recID, tag):
        """Returns list of values of the MARC-21 'tag' fields for the
           record 'recID'."""
        out = []
        bibXXx = "bib" + tag[0] + tag[1] + "x"
        bibrec_bibXXx = "bibrec_" + bibXXx
        query = """SELECT value FROM %s AS b, %s AS bb
-                WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id
-                AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag);
-        res = run_sql(query)
+                WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id
+                AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx)
+        res = run_sql(query, (recID, tag))
        for row in res:
            out.append(row[0])
        return out

    def clean(self):
        "Cleans the words table."
        self.value = {}

    def put_into_db(self, mode="normal"):
        """Updates the current words table in the corresponding DB
           idxFOO table.  Mode 'normal' means normal execution,
           mode 'emergency' means words index reverting to old state.
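
           The reverse table's `type` column acts as a small state
           machine during a normal flush (illustrative sequence):

             1. rows of the flushed recID ranges go CURRENT -> TEMPORARY;
             2. the forward tables are updated word by word;
             3. the new FUTURE rows become CURRENT;
             4. the leftover TEMPORARY rows are deleted.

           A crash between these steps leaves states that
           report_on_table_consistency() and repair() can detect.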
""" write_message("%s %s wordtable flush started" % (self.tablename, mode)) write_message('...updating %d words into %s started' % \ (len(self.value), self.tablename)) task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value))) self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec - BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \ - (self.tablename[:-1], group[0], group[1]) - write_message(query, verbose=9) - run_sql(query) + BETWEEN %%s AND %%s AND type='CURRENT'""" % self.tablename[:-1] + write_message(query % (group[0], group[1]), verbose=9) + run_sql(query, (group[0], group[1])) nb_words_total = len(self.value) nb_words_report = int(nb_words_total/10.0) nb_words_done = 0 for word in self.value.keys(): self.put_word_into_db(word) nb_words_done += 1 if nb_words_report != 0 and ((nb_words_done % nb_words_report) == 0): write_message('......processed %d/%d words' % (nb_words_done, nb_words_total)) task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total)) write_message('...updating %d words into %s ended' % \ (nb_words_total, self.tablename)) write_message('...updating reverse table %sR started' % self.tablename[:-1]) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec - BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ - (self.tablename[:-1], group[0], group[1]) - write_message(query, verbose=9) - run_sql(query) + BETWEEN %%s AND %%s AND type='FUTURE'""" % self.tablename[:-1] + write_message(query % (group[0], group[1]), verbose=9) + run_sql(query, (group[0], group[1])) query = """DELETE FROM %sR WHERE id_bibrec - BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \ - (self.tablename[:-1], group[0], group[1]) - write_message(query, verbose=9) - run_sql(query) + BETWEEN %%s AND %%s AND type='TEMPORARY'""" % self.tablename[:-1] + write_message(query % (group[0], group[1]), verbose=9) + run_sql(query, (group[0], group[1])) + #if self.is_fulltext_index: + #update_text_extraction_date(group[0], group[1]) write_message('End of updating wordTable into %s' % self.tablename, verbose=9) elif mode == "emergency": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec - BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \ - (self.tablename[:-1], group[0], group[1]) - write_message(query, verbose=9) - run_sql(query) + BETWEEN %%s AND %%s AND type='TEMPORARY'""" % self.tablename[:-1] + write_message(query % (group[0], group[1]), verbose=9) + run_sql(query, (group[0], group[1])) query = """DELETE FROM %sR WHERE id_bibrec - BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ - (self.tablename[:-1], group[0], group[1]) - write_message(query, verbose=9) - run_sql(query) + BETWEEN %%s AND %%s AND type='FUTURE'""" % self.tablename[:-1] + write_message(query % (group[0], group[1]), verbose=9) + run_sql(query, (group[0], group[1])) write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9) write_message('...updating reverse table %sR ended' % self.tablename[:-1]) self.clean() self.recIDs_in_mem = [] write_message("%s %s wordtable flush ended" % (self.tablename, mode)) task_update_progress("%s flush ended" % (self.tablename)) def load_old_recIDs(self, word): """Load existing hitlist for the word from the database index files.""" query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename res = run_sql(query, (word,)) 
        if res:
            return intbitset(res[0][0])
        else:
            return None

    def merge_with_old_recIDs(self, word, set):
        """Merge the system numbers stored in memory (hash of recIDs with
           value +1 or -1 according to whether to add/delete them) with
           those stored in the database index for the given word, received
           in SET, the universe of recIDs.
           Return False in case no change was done to SET, return True in
           case SET was changed.
        """
        oldset = intbitset(set)
        set.update_with_signs(self.value[word])
        return set != oldset

    def put_word_into_db(self, word):
        """Flush a single word to the database and delete it from memory"""
        set = self.load_old_recIDs(word)
        if set is not None: # merge the word recIDs found in memory:
            if not self.merge_with_old_recIDs(word, set):
                # nothing to update:
                write_message("......... unchanged hitlist for ``%s''" % word, verbose=9)
            else:
                # yes there were some new words:
                write_message("......... updating hitlist for ``%s''" % word, verbose=9)
                run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename, (set.fastdump(), word))
-
        else: # the word is new, will create new set:
            write_message("......... inserting hitlist for ``%s''" % word, verbose=9)
            set = intbitset(self.value[word].keys())
            try:
                run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename, (word, set.fastdump()))
            except Exception, e:
                ## We send this exception to the admin only when we are not
                ## already repairing the problem.
                register_exception(prefix="Error when putting the term '%s' into db (hitlist=%s): %s\n" % (repr(word), set, e), alert_admin=(task_get_option('cmd') != 'repair'))

        if not set: # never store empty words
            run_sql("DELETE from %s WHERE term=%%s" % self.tablename, (word,))

        del self.value[word]

    def display(self):
        "Displays the word table."
        keys = self.value.keys()
        keys.sort()
        for k in keys:
            write_message("%s: %s" % (k, self.value[k]))

    def count(self):
        "Returns the number of words in the table."
        return len(self.value)

    def info(self):
        "Prints some information on the words table."
        write_message("The words table contains %d words." % self.count())

    def lookup_words(self, word=""):
        "Lookup word from the words table."
        if not word:
            done = 0
            while not done:
                try:
                    word = raw_input("Enter word: ")
                    done = 1
                except (EOFError, KeyboardInterrupt):
                    return
        if self.value.has_key(word):
            write_message("The word '%s' is found %d times." \
                % (word, len(self.value[word])))
        else:
            write_message("The word '%s' does not exist in the word file." \
                % word)

    def add_recIDs(self, recIDs, opt_flush):
        """Fetches records whose id is in the recIDs range list and adds
        them to the wordTable.  The recIDs range list is of the form:
        [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
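
        For instance (hypothetical call), recIDs=[[1, 100], [150, 150]]
        indexes records 1-100 plus the single record 150, flushing the
        in-memory table to the database every opt_flush records.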
""" global chunksize, _last_word_table flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+opt_flush-flush_count-1,arange[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.chk_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) - register_exception() + register_exception(alert_admin=True) task_update_status("ERROR") self.put_into_db() sys.exit(1) write_message("%s adding records #%d-#%d started" % \ (self.tablename, i_low, i_high)) if CFG_CHECK_MYSQL_THREADS: kill_sleepy_mysql_threads() task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high)) self.del_recID_range(i_low, i_high) just_processed = self.add_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + just_processed write_message("%s adding records #%d-#%d ended " % \ (self.tablename, i_low, i_high)) if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= opt_flush: self.put_into_db() self.clean() write_message("%s backing up" % (self.tablename)) flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db() self.log_progress(time_started,records_done,records_to_go) def add_recIDs_by_date(self, dates, opt_flush): """Add records that were modified between DATES[0] and DATES[1]. If DATES is not set, then add records that were modified since the last update of the index. 
""" if not dates: table_id = self.tablename[-3:-1] - query = """SELECT last_updated FROM idxINDEX WHERE id='%s' - """ % table_id - res = run_sql(query) + query = """SELECT last_updated FROM idxINDEX WHERE id=%s""" + res = run_sql(query, (table_id, )) if not res: return if not res[0][0]: dates = ("0000-00-00", None) else: dates = (res[0][0], None) if dates[1] is None: - res = run_sql("""SELECT b.id FROM bibrec AS b - WHERE b.modification_date >= %s ORDER BY b.id ASC""", - (dates[0],)) + res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b + WHERE b.modification_date >= %s""", + (dates[0],))) + if self.is_fulltext_index: + res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date >= %s AND status<>'DELETED'""", (dates[0], ))) elif dates[0] is None: - res = run_sql("""SELECT b.id FROM bibrec AS b - WHERE b.modification_date <= %s ORDER BY b.id ASC""", - (dates[1],)) + res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b + WHERE b.modification_date <= %s""", + (dates[1],))) + if self.is_fulltext_index: + res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date <= %s AND status<>'DELETED'""", (dates[1], ))) else: - res = run_sql("""SELECT b.id FROM bibrec AS b + res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s AND - b.modification_date <= %s ORDER BY b.id ASC""", - (dates[0], dates[1])) - alist = create_range_list(res) + b.modification_date <= %s""", + (dates[0], dates[1]))) + if self.is_fulltext_index: + res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date >= %s AND modification_date <= %s AND status<>'DELETED'""", (dates[0], dates[1], ))) + alist = create_range_list(list(res)) if not alist: write_message( "No new records added. %s is up to date" % self.tablename) else: self.add_recIDs(alist, opt_flush) def add_recID_range(self, recID1, recID2): """Add records from RECID1 to RECID2.""" wlist = {} self.recIDs_in_mem.append([recID1,recID2]) # secondly fetch all needed tags: if self.fields_to_index == [CFG_JOURNAL_TAG]: # FIXME: quick hack for the journal index; a special # treatment where we need to associate more than one # subfield into indexed term for recID in range(recID1, recID2 + 1): new_words = get_words_from_journal_tag(recID, self.fields_to_index[0]) if not wlist.has_key(recID): wlist[recID] = [] wlist[recID] = list_union(new_words, wlist[recID]) else: # usual tag-by-tag indexing: for tag in self.fields_to_index: get_words_function = self.tag_to_words_fnc_map.get(tag, self.default_get_words_fnc) bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb - WHERE bb.id_bibrec BETWEEN %d AND %d - AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag) - res = run_sql(query) + WHERE bb.id_bibrec BETWEEN %%s AND %%s + AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) + res = run_sql(query, (recID1, recID2, tag)) for row in res: recID,phrase = row if not wlist.has_key(recID): wlist[recID] = [] new_words = get_words_function(phrase, stemming_language=self.stemming_language) # ,self.separators wlist[recID] = list_union(new_words, wlist[recID]) # were there some words for these recIDs found? 
        if len(wlist) == 0: return 0
        recIDs = wlist.keys()
        for recID in recIDs:
            # was this record marked as deleted?
            if "DELETED" in self.get_field(recID, "980__c"):
                wlist[recID] = []
                write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9)
            write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9)

        # put words into reverse index table with FUTURE status:
        for recID in recIDs:
            run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % self.tablename[:-1],
                    (recID, serialize_via_marshal(wlist[recID])))
            # ... and, for new records, enter the CURRENT status as empty:
            try:
                run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % self.tablename[:-1],
                        (recID, serialize_via_marshal([])))
            except DatabaseError:
                # okay, it's an already existing record, no problem
                pass

        # put words into memory word list:
        put = self.put
        for recID in recIDs:
            for w in wlist[recID]:
                put(recID, w, 1)

        return len(recIDs)

    def log_progress(self, start, done, todo):
        """Calculate progress and store it.
        start: start time,
        done: records processed,
        todo: total number of records"""
        time_elapsed = time.time() - start
        # consistency check
        if time_elapsed == 0 or done > todo:
            return

        time_recs_per_min = done/(time_elapsed/60.0)
        write_message("%d records took %.1f seconds to complete. (%1.f recs/min)" \
            % (done, time_elapsed, time_recs_per_min))

        if time_recs_per_min:
            write_message("Estimated runtime: %.1f minutes" % \
                ((todo-done)/time_recs_per_min))

    def put(self, recID, word, sign):
-        "Adds/deletes a word to the word list."
+        """Add (sign=+1) or delete (sign=-1) a word in the in-memory
+        word list."""
        try:
            if self.wash_index_terms:
                word = wash_index_term(word)
            if self.value.has_key(word):
                # the word 'word' exists already: update sign
                self.value[word][recID] = sign
            else:
                self.value[word] = {recID: sign}
        except:
            write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID))

    def del_recIDs(self, recIDs):
        """Fetches records whose id is in the recIDs range list and
        deletes them from the wordTable.  The recIDs range list is of
        the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
        """
        count = 0
        for arange in recIDs:
            self.del_recID_range(arange[0], arange[1])
            count = count + arange[1] - arange[0]
        self.put_into_db()

    def del_recID_range(self, low, high):
        """Deletes records with 'recID' system number between low
           and high from memory words index table."""
        write_message("%s fetching existing words for records #%d-#%d started" % \
            (self.tablename, low, high), verbose=3)
        self.recIDs_in_mem.append([low, high])
        query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec
-        BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high)
-        recID_rows = run_sql(query)
+        BETWEEN %%s AND %%s""" % (self.tablename[:-1])
+        recID_rows = run_sql(query, (low, high))
        for recID_row in recID_rows:
            recID = recID_row[0]
            wlist = deserialize_via_marshal(recID_row[1])
            for word in wlist:
                self.put(recID, word, -1)
        write_message("%s fetching existing words for records #%d-#%d ended" % \
            (self.tablename, low, high), verbose=3)

    def report_on_table_consistency(self):
        """Check reverse words index tables (e.g. idxWORD01R) for
        interesting states such as 'TEMPORARY' state.
        Prints small report (no of words, no of bad words).
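
        Rows left in a non-CURRENT state signal an interrupted indexing
        run; see repair() below for how they are cleaned up.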
""" # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d index records" % \ (self.tablename, nb_bad_records, nb_records)) else: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self, opt_flush): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 if nb_bad_records == 0: return - query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \ + query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT'""" \ % (self.tablename[:-1]) - res = run_sql(query) - recIDs = create_range_list(res) + res = intbitset(run_sql(query)) + recIDs = create_range_list(list(res)) flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+opt_flush-flush_count-1,arange[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.fix_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) - register_exception() + register_exception(alert_admin=True) task_update_status("ERROR") self.put_into_db() sys.exit(1) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= opt_flush: self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started,records_done,records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' - AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) - res = run_sql(query, None, 1) + AND id_bibrec BETWEEN %%s AND %%s""" % self.tablename[:-1] + res = run_sql(query, (low, high), 1) if res[0][0]==0: write_message("%s for %d-%d is in consistent state" % (self.tablename,low,high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) write_message("""EMERGENCY: Errors found. 
You should check consistency of the %s - %sR tables.\nRunning 'bibindex --repair' is recommended.""" \
            % (self.tablename, self.tablename[:-1]))
        raise StandardError

    def fix_recID_range(self, low, high):
        """Try to fix reverse index database consistency (e.g. table idxWORD01R) in the low,high doc-id range.

        Possible states for a recID follow:
        CUR TMP FUT: very bad things have happened: warn!
        CUR TMP    : very bad things have happened: warn!
        CUR     FUT: delete FUT (crash before flushing)
        CUR        : database is ok
            TMP FUT: add TMP to memory and del FUT from memory,
                     flush (revert to old state)
            TMP    : very bad things have happened: warn!
                FUT: very bad things have happened: warn!
        """
        state = {}
-        query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\
-            % (self.tablename[:-1], low, high)
-        res = run_sql(query)
+        query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s"\
+            % self.tablename[:-1]
+        res = run_sql(query, (low, high))
        for row in res:
            if not state.has_key(row[0]):
                state[row[0]] = []
            state[row[0]].append(row[1])

        ok = 1 # will hold info on whether we will be able to repair
        for recID in state.keys():
            if not 'TEMPORARY' in state[recID]:
                if 'FUTURE' in state[recID]:
                    if 'CURRENT' not in state[recID]:
                        write_message("EMERGENCY: Index record %d is in inconsistent state. Can't repair it." % recID)
                        ok = 0
                    else:
                        write_message("EMERGENCY: Inconsistency in index record %d detected" % recID)
                        query = """DELETE FROM %sR
-                        WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID)
-                        run_sql(query)
-                        write_message("EMERGENCY: Inconsistency in index record %d repaired." % recID)
+                        WHERE id_bibrec=%%s""" % self.tablename[:-1]
+                        run_sql(query, (recID, ))
+                        write_message("EMERGENCY: Inconsistency in index record %d repaired." % recID)
            else:
                if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]:
                    self.recIDs_in_mem.append([recID, recID])

                    # Get the words file
                    query = """SELECT type,termlist FROM %sR
-                    WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID)
+                    WHERE id_bibrec=%%s""" % self.tablename[:-1]
                    write_message(query, verbose=9)
-                    res = run_sql(query)
+                    res = run_sql(query, (recID, ))
                    for row in res:
                        wlist = deserialize_via_marshal(row[1])
                        write_message("Words are %s " % wlist, verbose=9)
                        if row[0] == 'TEMPORARY':
                            sign = 1
                        else:
                            sign = -1
                        for word in wlist:
                            self.put(recID, word, sign)
                else:
                    write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID))
                    ok = 0

        if not ok:
            write_message("""EMERGENCY: Unrepairable errors found. You should check consistency
                of the %s - %sR tables. Deleting affected TEMPORARY and FUTURE entries
                from these tables is recommended; see the BibIndex Admin Guide.""" % (self.tablename, self.tablename[:-1]))
            raise StandardError

-def test_fulltext_indexing():
-    """Tests fulltext indexing programs on PDF, PS, DOC, PPT,
-    XLS.  Prints list of words and word table on the screen.  Does not
-    integrate anything into the database.  Useful when debugging
-    problems with fulltext indexing: call this function instead of main().
-    """
-    print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=atlnot&categ=Communication&id=com-indet-2002-012") # protected URL
-    print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a00388&id=a00388s2t7") # XLS
-    print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a02883&id=a02883s1t6/transparencies") # PPT
-    print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a99149&id=a99149s1t10/transparencies") # DOC
-    print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=preprint&categ=cern&id=lhc-project-report-601") # PDF
-    sys.exit(0)
-
def main():
    """Main function that constructs the bibindex bibtask."""
    task_init(authorization_action='runbibindex',
              authorization_msg="BibIndex Task Submission",
              description="""Examples:
\t%s -a -i 234-250,293,300-500 -u admin@localhost
\t%s -a -w author,fulltext -M 8192 -v3
\t%s -d -m +4d -A on --flush=10000\n""" % ((sys.argv[0],) * 3),
              help_specific_usage=""" Indexing options:
 -a, --add\t\tadd or update words for selected records
 -d, --del\t\tdelete words for selected records
 -i, --id=low[-high]\t\tselect according to doc recID
 -m, --modified=from[,to]\tselect according to modification date
 -c, --collection=c1[,c2]\tselect according to collection
 -R, --reindex\treindex the selected indexes from scratch
 Repairing options:
 -k, --check\t\tcheck consistency for all records in the table(s)
 -r, --repair\t\ttry to repair all records in the table(s)
 Specific options:
 -w, --windex=w1[,w2]\tword/phrase indexes to consider (all)
 -M, --maxmem=XXX\tmaximum memory usage in kB (no limit)
 -f, --flush=NNN\t\tfull consistent table flush after NNN records (10000)
""",
              version=__revision__,
              specific_params=("adi:m:c:w:krRM:f:", [
                  "add",
                  "del",
                  "id=",
                  "modified=",
                  "collection=",
                  "windex=",
                  "check",
                  "repair",
                  "reindex",
                  "maxmem=",
                  "flush=",
              ]),
              task_stop_helper_fnc=task_stop_table_close_fnc,
              task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter,
              task_run_fnc=task_run_core,
              task_submit_check_options_fnc=task_submit_check_options)

def task_submit_check_options():
    """Check for options compatibility."""
    if task_get_option("reindex"):
        if task_get_option("cmd") != "add" or task_get_option('id') or task_get_option('collection'):
            print >> sys.stderr, "ERROR: You can use --reindex only when adding modified records."
            return False
    return True

def task_submit_elaborate_specific_parameter(key, value, opts, args):
    """ Given the string key, it checks its meaning and, possibly using
    the value, fills the corresponding key in the options dict.
    It must return True if it has elaborated the key, False if it does
    not know that key.
    eg:
    if key in ['-n', '--number']:
        self.options['number'] = value
        return True
    return False
    """
    if key in ("-a", "--add"):
        task_set_option("cmd", "add")
        if ("-x", "") in opts or ("--del", "") in opts:
            raise StandardError, "Can not have --add and --del at the same time!"
    elif key in ("-k", "--check"):
        task_set_option("cmd", "check")
    elif key in ("-r", "--repair"):
        task_set_option("cmd", "repair")
    elif key in ("-d", "--del"):
        task_set_option("cmd", "del")
    elif key in ("-i", "--id"):
        task_set_option('id', task_get_option('id') + split_ranges(value))
    elif key in ("-m", "--modified"):
        task_set_option("modified", get_date_range(value))
    elif key in ("-c", "--collection"):
        task_set_option("collection", value)
    elif key in ("-R", "--reindex"):
        task_set_option("reindex", True)
    elif key in ("-w", "--windex"):
        task_set_option("windex", value)
    elif key in ("-M", "--maxmem"):
        task_set_option("maxmem", int(value))
        if task_get_option("maxmem") < base_process_size + 1000:
            raise StandardError, "Memory usage should be higher than %d kB" % \
                (base_process_size + 1000)
    elif key in ("-f", "--flush"):
        task_set_option("flush", int(value))
    else:
        return False
    return True

def task_stop_table_close_fnc():
    """Flush the in-memory word tables to the database when the task
    is asked to STOP."""
    global _last_word_table
    if _last_word_table:
        _last_word_table.put_into_db()

def task_run_core():
    """Runs the task by fetching arguments from the BibSched task queue.
    This is what BibSched will be invoking via daemon call.
    Walks through all selected word/pair/phrase index tables and adds,
    deletes, checks or repairs index terms for the selected records.
    Return 1 in case of success and 0 in case of failure."""
    global _last_word_table

    if task_get_option("cmd") == "check":
        wordTables = get_word_tables(task_get_option("windex"))
        for index_id, index_name, index_tags in wordTables:
            if index_name == 'year' and CFG_INSPIRE_SITE:
                fnc_get_words_from_phrase = get_words_from_date_tag
            else:
                fnc_get_words_from_phrase = get_words_from_phrase
            wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', fnc_get_words_from_phrase, {'8564_u': get_words_from_fulltext})
            _last_word_table = wordTable
            wordTable.report_on_table_consistency()
            task_sleep_now_if_required(can_stop_too=True)
-            _last_word_table = None
-            return True
-
-    if task_get_option("cmd") == "check":
-        wordTables = get_word_tables(task_get_option("windex"))
-        for index_id, index_name, index_tags in wordTables:
+            wordTable = WordTable(index_id, index_tags, 'idxPAIR%02dF', get_pairs_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+            _last_word_table = wordTable
+            wordTable.report_on_table_consistency()
+            task_sleep_now_if_required(can_stop_too=True)
+
            if index_name == 'author':
                fnc_get_phrases_from_phrase = get_fuzzy_authors_from_phrase
            elif index_name == 'exactauthor':
                fnc_get_phrases_from_phrase = get_exact_authors_from_phrase
            else:
                fnc_get_phrases_from_phrase = get_phrases_from_phrase
            wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', fnc_get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
            _last_word_table = wordTable
            wordTable.report_on_table_consistency()
            task_sleep_now_if_required(can_stop_too=True)
        _last_word_table = None
        return True

    # Let's work on single words!
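    # (Below, each selected index is processed in three passes:
    # idxWORDxxF for single words, idxPAIRxxF for adjacent word pairs,
    # and idxPHRASExxF for whole phrases.)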
wordTables = get_word_tables(task_get_option("windex")) for index_id, index_name, index_tags in wordTables: + is_fulltext_index = index_name == 'fulltext' reindex_prefix = "" if task_get_option("reindex"): reindex_prefix = "tmp_" init_temporary_reindex_tables(index_id, reindex_prefix) if index_name == 'year' and CFG_INSPIRE_SITE: fnc_get_words_from_phrase = get_words_from_date_tag else: fnc_get_words_from_phrase = get_words_from_phrase wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', - fnc_get_words_from_phrase, {'8564_u': get_words_from_fulltext}) + fnc_get_words_from_phrase, {'8564_u': get_words_from_fulltext}, is_fulltext_index=is_fulltext_index) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: write_message("Missing IDs of records to delete from index %s." % wordTable.tablename, sys.stderr) raise StandardError elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) ## here we used to update last_updated info, if run via automatic mode; ## but do not update here anymore, since idxPHRASE will be acted upon later task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: write_message("Invalid command found processing %s" % \ wordTable.tablename, sys.stderr) raise StandardError + except StandardError, e: + write_message("Exception caught: %s" % e, sys.stderr) + register_exception(alert_admin=True) + task_update_status("ERROR") + if _last_word_table: + _last_word_table.put_into_db() + sys.exit(1) + + wordTable.report_on_table_consistency() + task_sleep_now_if_required(can_stop_too=True) + + # Let's work on pairs now + wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPAIR%02dF', get_pairs_from_phrase, {'8564_u': get_nothing_from_phrase}, False) + _last_word_table = wordTable + wordTable.report_on_table_consistency() + try: + if task_get_option("cmd") == "del": + if task_get_option("id"): + wordTable.del_recIDs(task_get_option("id")) + task_sleep_now_if_required(can_stop_too=True) + elif task_get_option("collection"): + l_of_colls = task_get_option("collection").split(",") + recIDs = perform_request_search(c=l_of_colls) + recIDs_range = [] + for recID in recIDs: + recIDs_range.append([recID,recID]) + wordTable.del_recIDs(recIDs_range) + task_sleep_now_if_required(can_stop_too=True) + else: + write_message("Missing IDs of records to delete from index %s." 
% wordTable.tablename,
+                        sys.stderr)
+                    raise StandardError
+            elif task_get_option("cmd") == "add":
+                if task_get_option("id"):
+                    wordTable.add_recIDs(task_get_option("id"), task_get_option("flush"))
+                    task_sleep_now_if_required(can_stop_too=True)
+                elif task_get_option("collection"):
+                    l_of_colls = task_get_option("collection").split(",")
+                    recIDs = perform_request_search(c=l_of_colls)
+                    recIDs_range = []
+                    for recID in recIDs:
+                        recIDs_range.append([recID,recID])
+                    wordTable.add_recIDs(recIDs_range, task_get_option("flush"))
+                    task_sleep_now_if_required(can_stop_too=True)
+                else:
+                    wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush"))
+                    ## do not update last_updated here; the idxPHRASE tables
+                    ## for this index will be acted upon later:
+                    task_sleep_now_if_required(can_stop_too=True)
+            elif task_get_option("cmd") == "repair":
+                wordTable.repair(task_get_option("flush"))
+                task_sleep_now_if_required(can_stop_too=True)
+            else:
+                write_message("Invalid command found processing %s" % \
+                    wordTable.tablename, sys.stderr)
+                raise StandardError
        except StandardError, e:
            write_message("Exception caught: %s" % e, sys.stderr)
            register_exception()
            task_update_status("ERROR")
            if _last_word_table:
                _last_word_table.put_into_db()
            sys.exit(1)

        wordTable.report_on_table_consistency()
        task_sleep_now_if_required(can_stop_too=True)

        # Let's work on phrases now
        if index_name == 'author':
            fnc_get_phrases_from_phrase = get_fuzzy_authors_from_phrase
        elif index_name == 'exactauthor':
            fnc_get_phrases_from_phrase = get_exact_authors_from_phrase
        else:
            fnc_get_phrases_from_phrase = get_phrases_from_phrase
        wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', fnc_get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
        _last_word_table = wordTable
        wordTable.report_on_table_consistency()
        try:
            if task_get_option("cmd") == "del":
                if task_get_option("id"):
                    wordTable.del_recIDs(task_get_option("id"))
                    task_sleep_now_if_required(can_stop_too=True)
                elif task_get_option("collection"):
                    l_of_colls = task_get_option("collection").split(",")
                    recIDs = perform_request_search(c=l_of_colls)
                    recIDs_range = []
                    for recID in recIDs:
                        recIDs_range.append([recID,recID])
                    wordTable.del_recIDs(recIDs_range)
                    task_sleep_now_if_required(can_stop_too=True)
                else:
                    write_message("Missing IDs of records to delete from index %s."
% wordTable.tablename, sys.stderr) raise StandardError elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) # let us update last_updated timestamp info, if run via automatic mode: update_index_last_updated(index_id, task_get_task_param('task_starting_time')) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: write_message("Invalid command found processing %s" % \ wordTable.tablename, sys.stderr) raise StandardError except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") if _last_word_table: _last_word_table.put_into_db() sys.exit(1) wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if task_get_option("reindex"): swap_temporary_reindex_tables(index_id, reindex_prefix) update_index_last_updated(index_id, task_get_task_param('task_starting_time')) task_sleep_now_if_required(can_stop_too=True) _last_word_table = None return True ## import optional modules: try: import psyco psyco.bind(get_words_from_phrase) psyco.bind(WordTable.merge_with_old_recIDs) except: pass ### okay, here we go: if __name__ == '__main__': main() diff --git a/modules/bibindex/lib/bibindex_engine_config.py b/modules/bibindex/lib/bibindex_engine_config.py index 5ef7aa307..a31c0a103 100644 --- a/modules/bibindex/lib/bibindex_engine_config.py +++ b/modules/bibindex/lib/bibindex_engine_config.py @@ -1,65 +1,38 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex indexing engine configuration parameters. 
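
Note: the fulltext conversion-program tables (CONV_PROGRAMS and its
helpers) that used to be defined below are removed by this change;
text extraction now happens at the document level, see
get_words_from_fulltext() in bibindex_engine.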
""" __revision__ = \ "$Id$" ## configuration parameters read from the general config file: from invenio.config import \ - CFG_VERSION, CFG_SITE_NAME,\ - CFG_PATH_PDFTOTEXT, \ - CFG_PATH_PSTOTEXT, \ - CFG_PATH_PSTOASCII, \ - CFG_PATH_ANTIWORD, \ - CFG_PATH_CATDOC, \ - CFG_PATH_WVTEXT, \ - CFG_PATH_PPTHTML, \ - CFG_PATH_XLHTML, \ - CFG_PATH_HTMLTOTEXT, \ - CFG_PATH_GZIP - + CFG_VERSION, CFG_SITE_NAME ## version number: BIBINDEX_ENGINE_VERSION = "CDS Invenio/%s bibindex/%s" % (CFG_VERSION, CFG_VERSION) -## programs used to convert fulltext files to text: -CONV_PROGRAMS = { ### PS switched off at the moment, since PDF is faster - #"ps": [CFG_PATH_PSTOTEXT, CFG_PATH_PSTOASCII], - #"ps.gz": [CFG_PATH_PSTOTEXT, CFG_PATH_PSTOASCII], - "pdf": [CFG_PATH_PDFTOTEXT, CFG_PATH_PSTOTEXT, CFG_PATH_PSTOASCII], - "doc": [CFG_PATH_ANTIWORD, CFG_PATH_CATDOC, CFG_PATH_WVTEXT], - "ppt": [CFG_PATH_PPTHTML], - "xls": [CFG_PATH_XLHTML], - "htm": [CFG_PATH_HTMLTOTEXT], - "html": [CFG_PATH_HTMLTOTEXT],} - -## helper programs used if the above programs convert only to html or -## other intermediate file formats: -CONV_PROGRAMS_HELPERS = {"html": CFG_PATH_HTMLTOTEXT, - "gz": CFG_PATH_GZIP} - ## safety parameters concerning DB thread-multiplication problem: CFG_CHECK_MYSQL_THREADS = 0 # to check or not to check the problem? CFG_MAX_MYSQL_THREADS = 50 # how many threads (connections) we # consider as still safe CFG_MYSQL_THREAD_TIMEOUT = 20 # we'll kill threads that were sleeping # for more than X seconds diff --git a/modules/bibindex/lib/bibindexadminlib.py b/modules/bibindex/lib/bibindexadminlib.py index 1016e8c06..34fa39b2f 100644 --- a/modules/bibindex/lib/bibindexadminlib.py +++ b/modules/bibindex/lib/bibindexadminlib.py @@ -1,1720 +1,1737 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDS Invenio BibIndex Administrator Interface.""" __revision__ = "$Id$" import cgi import re import os import urllib import time import random from zlib import compress,decompress from invenio.config import \ CFG_SITE_LANG, \ CFG_VERSION, \ CFG_SITE_URL, \ CFG_BINDIR from invenio.bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform from invenio.dbquery import run_sql, get_table_status_info from invenio.webpage import page, pageheaderonly, pagefooteronly, adderrorbox from invenio.webuser import getUid, get_email from invenio.bibindex_engine_stemmer import get_stemming_language_map import invenio.template websearch_templates = invenio.template.load('websearch') def getnavtrail(previous = ''): """Get the navtrail""" navtrail = """Admin Area """ % (CFG_SITE_URL,) navtrail = navtrail + previous return navtrail def perform_index(ln=CFG_SITE_LANG, mtype='', content=''): """start area for modifying indexes mtype - the method that called this method. content - the output from that method.""" fin_output = """
    0. Show all 1. Overview of indexes 2. Edit index 3. Add new index 4. Manage logical fields 5. Guide
    """ % (CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL) if mtype == "perform_showindexoverview" and content: fin_output += content elif mtype == "perform_showindexoverview" or not mtype: fin_output += perform_showindexoverview(ln, callback='') if mtype == "perform_editindexes" and content: fin_output += content elif mtype == "perform_editindexes" or not mtype: fin_output += perform_editindexes(ln, callback='') if mtype == "perform_addindex" and content: fin_output += content elif mtype == "perform_addindex" or not mtype: fin_output += perform_addindex(ln, callback='') return addadminbox("Menu", [fin_output]) def perform_field(ln=CFG_SITE_LANG, mtype='', content=''): """Start area for modifying fields mtype - the method that called this method. content - the output from that method.""" fin_output = """
    0. Show all 1. Overview of logical fields 2. Edit logical field 3. Add new logical field 4. Manage Indexes 5. Guide
    """ % (CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL, ln, CFG_SITE_URL) if mtype == "perform_showfieldoverview" and content: fin_output += content elif mtype == "perform_showfieldoverview" or not mtype: fin_output += perform_showfieldoverview(ln, callback='') if mtype == "perform_editfields" and content: fin_output += content elif mtype == "perform_editfields" or not mtype: fin_output += perform_editfields(ln, callback='') if mtype == "perform_addfield" and content: fin_output += content elif mtype == "perform_addfield" or not mtype: fin_output += perform_addfield(ln, callback='') return addadminbox("Menu", [fin_output]) def perform_editfield(fldID, ln=CFG_SITE_LANG, mtype='', content='', callback='yes', confirm=-1): """form to modify a field. this method is calling other methods which again is calling this and sending back the output of the method. if callback, the method will call perform_editcollection, if not, it will just return its output. fldID - id of the field mtype - the method that called this method. content - the output from that method.""" fld_dict = dict(get_def_name('', "field")) if fldID in [-1, "-1"]: return addadminbox("Edit logical field", ["""Please go back and select a logical field"""]) fin_output = """
    Menu
    0. Show all 1. Modify field code 2. Modify translations 3. Modify MARC tags 4. Delete field
    5. Show field usage
    """ % (CFG_SITE_URL, fldID, ln, CFG_SITE_URL, fldID, ln, CFG_SITE_URL, fldID, ln, CFG_SITE_URL, fldID, ln, CFG_SITE_URL, fldID, ln, CFG_SITE_URL, fldID, ln) if mtype == "perform_modifyfield" and content: fin_output += content elif mtype == "perform_modifyfield" or not mtype: fin_output += perform_modifyfield(fldID, ln, callback='') if mtype == "perform_modifyfieldtranslations" and content: fin_output += content elif mtype == "perform_modifyfieldtranslations" or not mtype: fin_output += perform_modifyfieldtranslations(fldID, ln, callback='') if mtype == "perform_modifyfieldtags" and content: fin_output += content elif mtype == "perform_modifyfieldtags" or not mtype: fin_output += perform_modifyfieldtags(fldID, ln, callback='') if mtype == "perform_deletefield" and content: fin_output += content elif mtype == "perform_deletefield" or not mtype: fin_output += perform_deletefield(fldID, ln, callback='') return addadminbox("Edit logical field '%s'" % fld_dict[int(fldID)], [fin_output]) def perform_editindex(idxID, ln=CFG_SITE_LANG, mtype='', content='', callback='yes', confirm=-1): """form to modify a index. this method is calling other methods which again is calling this and sending back the output of the method. idxID - id of the index mtype - the method that called this method. content - the output from that method.""" if idxID in [-1, "-1"]: return addadminbox("Edit index", ["""Please go back and select a index"""]) fin_output = """
    Menu
    0. Show all 1. Modify index name / descriptor 2. Modify translations 3. Modify index fields 4. Modify index stemming language 5. Delete index
    """ % (CFG_SITE_URL, idxID, ln, CFG_SITE_URL, idxID, ln, CFG_SITE_URL, idxID, ln, CFG_SITE_URL, idxID, ln, CFG_SITE_URL, idxID, ln, CFG_SITE_URL, idxID, ln) if mtype == "perform_modifyindex" and content: fin_output += content elif mtype == "perform_modifyindex" or not mtype: fin_output += perform_modifyindex(idxID, ln, callback='') if mtype == "perform_modifyindextranslations" and content: fin_output += content elif mtype == "perform_modifyindextranslations" or not mtype: fin_output += perform_modifyindextranslations(idxID, ln, callback='') if mtype == "perform_modifyindexfields" and content: fin_output += content elif mtype == "perform_modifyindexfields" or not mtype: fin_output += perform_modifyindexfields(idxID, ln, callback='') if mtype == "perform_modifyindexstemming" and content: fin_output += content elif mtype == "perform_modifyindexstemming" or not mtype: fin_output += perform_modifyindexstemming(idxID, ln, callback='') if mtype == "perform_deleteindex" and content: fin_output += content elif mtype == "perform_deleteindex" or not mtype: fin_output += perform_deleteindex(idxID, ln, callback='') return addadminbox("Edit index", [fin_output]) def perform_showindexoverview(ln=CFG_SITE_LANG, callback='', confirm=0): subtitle = """1. Overview of indexes""" output = """""" output += """""" % ("ID", "Name", "Fwd.Idx Size", "Rev.Idx Size", "Fwd.Idx Words", "Rev.Idx Records", "Last updated", "Fields", "Translations", "Stemming Language") idx = get_idx() idx_dict = dict(get_def_name('', "idxINDEX")) stemming_language_map = get_stemming_language_map() stemming_language_map_reversed = dict([(elem[1], elem[0]) for elem in stemming_language_map.iteritems()]) for idxID, idxNAME, idxDESC, idxUPD, idxSTEM in idx: forward_table_status_info = get_table_status_info('idxWORD%sF' % (idxID < 10 and '0%s' % idxID or idxID)) reverse_table_status_info = get_table_status_info('idxWORD%sR' % (idxID < 10 and '0%s' % idxID or idxID)) if str(idxUPD)[-3:] == ".00": idxUPD = str(idxUPD)[0:-3] lang = get_lang_list("idxINDEXNAME", "id_idxINDEX", idxID) idx_fld = get_idx_fld(idxID) fld = "" for row in idx_fld: fld += row[3] + ", " if fld.endswith(", "): fld = fld[:-2] if len(fld) == 0: fld = """None""" date = (idxUPD and idxUPD or """Not updated""") stemming_lang = stemming_language_map_reversed.get(idxSTEM, None) if not stemming_lang: stemming_lang = """None""" if forward_table_status_info and reverse_table_status_info: output += """""" % \ (idxID, """%s""" % (CFG_SITE_URL, idxID, ln, idxDESC, idx_dict.get(idxID, idxNAME)), "%s MB" % websearch_templates.tmpl_nice_number(forward_table_status_info['Data_length'] / 1048576.0, max_ndigits_after_dot=3), "%s MB" % websearch_templates.tmpl_nice_number(reverse_table_status_info['Data_length'] / 1048576.0, max_ndigits_after_dot=3), websearch_templates.tmpl_nice_number(forward_table_status_info['Rows']), websearch_templates.tmpl_nice_number(reverse_table_status_info['Rows'], max_ndigits_after_dot=3), date, fld, lang, stemming_lang) elif not forward_table_status_info: output += """""" % \ (idxID, """%s""" % (CFG_SITE_URL, idxID, ln, idx_dict.get(idxID, idxNAME)), "Error", "%s MB" % websearch_templates.tmpl_nice_number(reverse_table_status_info['Data_length'] / 1048576.0, max_ndigits_after_dot=3), "Error", websearch_templates.tmpl_nice_number(reverse_table_status_info['Rows'], max_ndigits_after_dot=3), date, "", lang) elif not reverse_table_status_info: output += """""" % \ (idxID, """%s""" % (CFG_SITE_URL, idxID, ln, idx_dict.get(idxID, idxNAME)), "%s MB" % 
websearch_templates.tmpl_nice_number(forward_table_status_info['Data_length'] / 1048576.0, max_ndigits_after_dot=3), "Error", websearch_templates.tmpl_nice_number(forward_table_status_info['Rows'], max_ndigits_after_dot=3), "Error", date, "", lang) output += "
    %s%s%s%s%s%s%s%s%s%s
    %s%s%s%s%s%s%s%s%s%s
    %s%s%s%s%s%s%s%s%s
    %s%s%s%s%s%s%s%s%s
    " body = [output] if callback: return perform_index(ln, "perform_showindexoverview", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_editindexes(ln=CFG_SITE_LANG, callback='yes', content='', confirm=-1): """show a list of indexes that can be edited.""" subtitle = """2. Edit index   [?]""" % (CFG_SITE_URL) fin_output = '' idx = get_idx() output = "" if len(idx) > 0: text = """ Index name """ output += createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/editindex" % CFG_SITE_URL, text=text, button="Edit", ln=ln, confirm=1) else: output += """No indexes exists""" body = [output] if callback: return perform_index(ln, "perform_editindexes", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_editfields(ln=CFG_SITE_LANG, callback='yes', content='', confirm=-1): """show a list of all logical fields that can be edited.""" subtitle = """2. Edit logical field   [?]""" % (CFG_SITE_URL) fin_output = '' res = get_fld() output = "" if len(res) > 0: text = """ Field name """ output += createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/editfield" % CFG_SITE_URL, text=text, button="Edit", ln=ln, confirm=1) else: output += """No logical fields exists""" body = [output] if callback: return perform_field(ln, "perform_editfields", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addindex(ln=CFG_SITE_LANG, idxNAME='', callback="yes", confirm=-1): """form to add a new index. idxNAME - the name of the new index""" output = "" subtitle = """3. Add new index""" text = """ Index name
    """ % idxNAME output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addindex" % CFG_SITE_URL, text=text, ln=ln, button="Add index", confirm=1) if idxNAME and confirm in ["1", 1]: res = add_idx(idxNAME) output += write_outcome(res) + """
    Configure this index.""" % (CFG_SITE_URL, res[1], ln) elif confirm not in ["-1", -1]: output += """Please give the index a name. """ body = [output] if callback: return perform_index(ln, "perform_addindex", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindextranslations(idxID, ln=CFG_SITE_LANG, sel_type='', trans=[], confirm=-1, callback='yes'): """Modify the translations of a index sel_type - the nametype to modify trans - the translations in the same order as the languages from get_languages()""" output = '' subtitle = '' langs = get_languages() if confirm in ["2", 2] and idxID: finresult = modify_translations(idxID, langs, sel_type, trans, "idxINDEX") idx_dict = dict(get_def_name('', "idxINDEX")) if idxID and idx_dict.has_key(int(idxID)): idxID = int(idxID) subtitle = """2. Modify translations for index.   [?]""" % CFG_SITE_URL if type(trans) is str: trans = [trans] if sel_type == '': sel_type = get_idx_nametypes()[0][0] header = ['Language', 'Translation'] actions = [] types = get_idx_nametypes() if len(types) > 1: text = """ Name type """ output += createhiddenform(action="modifyindextranslations#2", text=text, button="Select", idxID=idxID, ln=ln, confirm=0) if confirm in [-1, "-1", 0, "0"]: trans = [] for (key, value) in langs: try: trans_names = get_name(idxID, key, sel_type, "idxINDEX") trans.append(trans_names[0][0]) except StandardError, e: trans.append('') for nr in range(0,len(langs)): actions.append(["%s %s" % (langs[nr][1], (langs[nr][0]==CFG_SITE_LANG and '(def)' or ''))]) actions[-1].append('' % trans[nr]) text = tupletotable(header=header, tuple=actions) output += createhiddenform(action="modifyindextranslations#2", text=text, button="Modify", idxID=idxID, sel_type=sel_type, ln=ln, confirm=2) if sel_type and len(trans): if confirm in ["2", 2]: output += write_outcome(finresult) body = [output] if callback: return perform_editindex(idxID, ln, "perform_modifyindextranslations", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyfieldtranslations(fldID, ln=CFG_SITE_LANG, sel_type='', trans=[], confirm=-1, callback='yes'): """Modify the translations of a field sel_type - the nametype to modify trans - the translations in the same order as the languages from get_languages()""" output = '' subtitle = '' langs = get_languages() if confirm in ["2", 2] and fldID: finresult = modify_translations(fldID, langs, sel_type, trans, "field") fld_dict = dict(get_def_name('', "field")) if fldID and fld_dict.has_key(int(fldID)): fldID = int(fldID) subtitle = """3. 
Modify translations for logical field '%s'   [?]""" % (fld_dict[fldID], CFG_SITE_URL) if type(trans) is str: trans = [trans] if sel_type == '': sel_type = get_fld_nametypes()[0][0] header = ['Language', 'Translation'] actions = [] types = get_fld_nametypes() if len(types) > 1: text = """ Name type """ output += createhiddenform(action="modifyfieldtranslations#3", text=text, button="Select", fldID=fldID, ln=ln, confirm=0) if confirm in [-1, "-1", 0, "0"]: trans = [] for (key, value) in langs: try: trans_names = get_name(fldID, key, sel_type, "field") trans.append(trans_names[0][0]) except StandardError, e: trans.append('') for nr in range(0,len(langs)): actions.append(["%s %s" % (langs[nr][1], (langs[nr][0]==CFG_SITE_LANG and '(def)' or ''))]) actions[-1].append('' % trans[nr]) text = tupletotable(header=header, tuple=actions) output += createhiddenform(action="modifyfieldtranslations#3", text=text, button="Modify", fldID=fldID, sel_type=sel_type, ln=ln, confirm=2) if sel_type and len(trans): if confirm in ["2", 2]: output += write_outcome(finresult) body = [output] if callback: return perform_editfield(fldID, ln, "perform_modifyfieldtranslations", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showdetailsfieldtag(fldID, tagID, ln=CFG_SITE_LANG, callback="yes", confirm=-1): """show details for a MARC tag. fldID - the current logical field tagID - the id of the tag to show details for""" fld_dict = dict(get_def_name('', "field")) fldID = int(fldID) tagname = run_sql("SELECT name from tag where id=%s", (tagID, ))[0][0] output = "" subtitle = """Showing details for MARC tag '%s'""" % tagname output += "
    This MARC tag is used directly in these logical fields: " fld_tag = get_fld_tags('', tagID) exist = {} for (id_field,id_tag, tname, tvalue, score) in fld_tag: output += "%s, " % fld_dict[int(id_field)] exist[id_field] = 1 output += "
    This MARC tag is used indirectly in these logical fields: " tag = run_sql("SELECT value from tag where id=%s", (id_tag, )) tag = tag[0][0] for i in range(0, len(tag) - 1): res = run_sql("SELECT id_field,id_tag FROM field_tag,tag WHERE tag.id=field_tag.id_tag AND tag.value=%s", ('%s%%' % tag[0:i], )) for (id_field, id_tag) in res: output += "%s, " % fld_dict[int(id_field)] exist[id_field] = 1 res = run_sql("SELECT id_field,id_tag FROM field_tag,tag WHERE tag.id=field_tag.id_tag AND tag.value like %s", (tag, )) for (id_field, id_tag) in res: if not exist.has_key(id_field): output += "%s, " % fld_dict[int(id_field)] body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_showdetailsfieldtag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showdetailsfield(fldID, ln=CFG_SITE_LANG, callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" fld_dict = dict(get_def_name('', "field")) col_dict = dict(get_def_name('', "collection")) fldID = int(fldID) col_fld = get_col_fld('', '', fldID) sort_types = dict(get_sort_nametypes()) fin_output = "" subtitle = """5. Show usage for logical field '%s'""" % fld_dict[fldID] output = "This logical field is used in these collections:
    " ltype = '' exist = {} for (id_collection, id_field, id_fieldvalue, ftype, score, score_fieldvalue) in col_fld: if ltype != ftype: output += "
    %s: " % sort_types[ftype] ltype = ftype exist = {} if not exist.has_key(id_collection): output += "%s, " % col_dict[int(id_collection)] exist[id_collection] = 1 if not col_fld: output = "This field is not used by any collections." fin_output = addadminbox('Collections', [output]) body = [fin_output] if callback: return perform_editfield(ln, "perform_showdetailsfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addfield(ln=CFG_SITE_LANG, fldNAME='', code='', callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" output = "" subtitle = """3. Add new logical field""" code = str.replace(code,' ', '') text = """ Field name
    Field code
    """ % (fldNAME, code) output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addfield" % CFG_SITE_URL, text=text, ln=ln, button="Add field", confirm=1) if fldNAME and code and confirm in ["1", 1]: res = add_fld(fldNAME, code) output += write_outcome(res) elif confirm not in ["-1", -1]: output += """Please give the logical field a name and code. """ body = [output] if callback: return perform_field(ln, "perform_addfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_deletefield(fldID, ln=CFG_SITE_LANG, callback='yes', confirm=0): """form to remove a field. fldID - the field id from table field. """ fld_dict = dict(get_def_name('', "field")) if not fld_dict.has_key(int(fldID)): return """Field does not exist""" subtitle = """4. Delete the logical field '%s'   [?]""" % (fld_dict[int(fldID)], CFG_SITE_URL) output = "" if fldID: fldID = int(fldID) if confirm in ["0", 0]: check = run_sql("SELECT id_field from idxINDEX_field where id_field=%s", (fldID, )) text = "" if check: text += """This field is used in an index, deletion may cause problems.
    """ text += """Do you want to delete the logical field '%s' and all its relations and definitions.""" % (fld_dict[fldID]) output += createhiddenform(action="deletefield#4", text=text, button="Confirm", fldID=fldID, confirm=1) elif confirm in ["1", 1]: res = delete_fld(fldID) if res[0] == 1: return """
    Field deleted.""" + write_outcome(res) else: output += write_outcome(res) body = [output] if callback: return perform_editfield(fldID, ln, "perform_deletefield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_deleteindex(idxID, ln=CFG_SITE_LANG, callback='yes', confirm=0): """form to delete an index. idxID - the index id from table idxINDEX. """ if idxID: subtitle = """5. Delete the index.   [?]""" % CFG_SITE_URL output = "" if confirm in ["0", 0]: idx = get_idx(idxID) if idx: text = "" text += """By deleting an index, you may also loose any indexed data in the forward and reverse table for this index.
    """ text += """Do you want to delete the index '%s' and all its relations and definitions.""" % (idx[0][1]) output += createhiddenform(action="deleteindex#5", text=text, button="Confirm", idxID=idxID, confirm=1) else: return """
    Index specified does not exist.""" elif confirm in ["1", 1]: res = delete_idx(idxID) if res[0] == 1: return """
    Index deleted.""" + write_outcome(res) else: output += write_outcome(res) body = [output] if callback: return perform_editindex(idxID, ln, "perform_deleteindex", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showfieldoverview(ln=CFG_SITE_LANG, callback='', confirm=0): subtitle = """1. Logical fields overview""" output = """""" output += """""" % ("Field", "MARC Tags", "Translations") query = "SELECT id,name FROM field" res = run_sql(query) col_dict = dict(get_def_name('', "collection")) fld_dict = dict(get_def_name('', "field")) for field_id,field_name in res: query = "SELECT tag.value FROM tag, field_tag WHERE tag.id=field_tag.id_tag AND field_tag.id_field=%s ORDER BY field_tag.score DESC,tag.value ASC" res = run_sql(query, (field_id, )) field_tags = "" for row in res: field_tags = field_tags + row[0] + ", " if field_tags.endswith(", "): field_tags = field_tags[:-2] if not field_tags: field_tags = """None""" lang = get_lang_list("fieldname", "id_field", field_id) output += """""" % ("""%s""" % (CFG_SITE_URL, field_id, ln, fld_dict[field_id]), field_tags, lang) output += "
    %s%s%s
    %s%s%s
    " body = [output] if callback: return perform_field(ln, "perform_showfieldoverview", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindex(idxID, ln=CFG_SITE_LANG, idxNAME='', idxDESC='', callback='yes', confirm=-1): """form to modify an index name. idxID - the index name to change. idxNAME - new name of index idxDESC - description of index content""" subtitle = "" output = "" idx = get_idx(idxID) if not idx: idxID = -1 if idxID not in [-1, "-1"]: subtitle = """1. Modify index name.   [?]""" % CFG_SITE_URL if confirm in [-1, "-1"]: idxNAME = idx[0][1] idxDESC = idx[0][2] text = """ Index name
    Index description
    """ % (idxNAME, idxDESC) output += createhiddenform(action="modifyindex#1", text=text, button="Modify", idxID=idxID, ln=ln, confirm=1) if idxID > -1 and idxNAME and confirm in [1, "1"]: res = modify_idx(idxID, idxNAME, idxDESC) output += write_outcome(res) elif confirm in [1, "1"]: output += """
    Please give a name for the index.""" else: output = """No index to modify.""" body = [output] if callback: return perform_editindex(idxID, ln, "perform_modifyindex", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindexstemming(idxID, ln=CFG_SITE_LANG, idxSTEM='', callback='yes', confirm=-1): """form to modify an index name. idxID - the index name to change. idxSTEM - new stemming language code""" subtitle = "" output = "" stemming_language_map = get_stemming_language_map() stemming_language_map['None'] = '' idx = get_idx(idxID) if not idx: idxID = -1 if idxID not in [-1, "-1"]: subtitle = """4. Modify index stemming language.   [?]""" % CFG_SITE_URL if confirm in [-1, "-1"]: idxSTEM = idx[0][4] if not idxSTEM: idxSTEM = '' language_html_element = """""" text = """ Index stemming language """ + language_html_element output += createhiddenform(action="modifyindexstemming#4", text=text, button="Modify", idxID=idxID, ln=ln, confirm=0) if confirm in [0, "0"] and get_idx(idxID)[0][4] == idxSTEM: output += """Stemming language has not been changed""" elif confirm in [0, "0"]: text = """ You are going to change the stemming language for this index. Please note you should not enable stemming for structured-data indexes like "report number", "year", "author" or "collection". On the contrary, it is advisable to enable stemming for indexes like "fulltext", "abstract", "title", etc. since this would improve retrieval quality.
Beware that after changing the stemming language of an index you will have to reindex it. It is a good idea to change the stemming language and to reindex during low-usage hours of your service, since searching will not be functional until the reindexing is completed.
    Are you sure you want to change the stemming language of this index?""" output += createhiddenform(action="modifyindexstemming#4", text=text, button="Modify", idxID=idxID, idxSTEM=idxSTEM, ln=ln, confirm=1) elif idxID > -1 and confirm in [1, "1"]: res = modify_idx_stemming(idxID, idxSTEM) output += write_outcome(res) output += """
Please note that you must run the following command as soon as possible:
    $> %s/bibindex --reindex -w %s
    """ % (CFG_BINDIR, get_idx(idxID)[0][1]) elif confirm in [1, "1"]: output += """
    Please give a name for the index.""" else: output = """No index to modify.""" body = [output] if callback: return perform_editindex(idxID, ln, "perform_modifyindexstemming", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyfield(fldID, ln=CFG_SITE_LANG, code='', callback='yes', confirm=-1): """form to modify a field. fldID - the field to change.""" subtitle = "" output = "" fld_dict = dict(get_def_name('', "field")) if fldID not in [-1, "-1"]: if confirm in [-1, "-1"]: res = get_fld(fldID) code = res[0][2] else: code = str.replace("%s" % code, " ", "") fldID = int(fldID) subtitle = """1. Modify field code for logical field '%s'   [?]""" % (fld_dict[int(fldID)], CFG_SITE_URL) text = """ Field code
    """ % code output += createhiddenform(action="modifyfield#2", text=text, button="Modify", fldID=fldID, ln=ln, confirm=1) if fldID > -1 and confirm in [1, "1"]: fldID = int(fldID) res = modify_fld(fldID, code) output += write_outcome(res) else: output = """No field to modify. """ body = [output] if callback: return perform_editfield(fldID, ln, "perform_modifyfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindexfields(idxID, ln=CFG_SITE_LANG, callback='yes', content='', confirm=-1): """Modify which logical fields to use in this index..""" output = '' subtitle = """3. Modify index fields.   [?]""" % CFG_SITE_URL output = """
    Menu
    Add field to index
    Manage fields
    """ % (CFG_SITE_URL, idxID, ln, CFG_SITE_URL, ln) header = ['Field', ''] actions = [] idx_fld = get_idx_fld(idxID) if len(idx_fld) > 0: for (idxID, idxNAME,fldID, fldNAME, regexp_punct, regexp_alpha_sep) in idx_fld: actions.append([fldNAME]) for col in [(('Remove','removeindexfield'),)]: actions[-1].append('%s' % (CFG_SITE_URL, col[0][1], idxID, fldID, ln, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (CFG_SITE_URL, function, idxID, fldID, ln, str) output += tupletotable(header=header, tuple=actions) else: output += """No index fields exists""" output += content body = [output] if callback: return perform_editindex(idxID, ln, "perform_modifyindexfields", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyfieldtags(fldID, ln=CFG_SITE_LANG, callback='yes', content='', confirm=-1): """show the sort fields of this collection..""" output = '' fld_dict = dict(get_def_name('', "field")) fld_type = get_fld_nametypes() fldID = int(fldID) subtitle = """3. Modify MARC tags for the logical field '%s'   [?]""" % (fld_dict[int(fldID)], CFG_SITE_URL) output = """
    Menu
    Add MARC tag
    Delete unused MARC tags
    """ % (CFG_SITE_URL, fldID, ln, CFG_SITE_URL, fldID, ln) header = ['', 'Value', 'Comment', 'Actions'] actions = [] res = get_fld_tags(fldID) if len(res) > 0: i = 0 for (fldID, tagID, tname, tvalue, score) in res: move = "" if i != 0: move += """""" % (CFG_SITE_URL, fldID, tagID, res[i - 1][1], ln, random.randint(0, 1000), CFG_SITE_URL) else: move += "   " i += 1 if i != len(res): move += '' % (CFG_SITE_URL, fldID, tagID, res[i][1], ln, random.randint(0, 1000), CFG_SITE_URL) actions.append([move, tvalue, tname]) for col in [(('Details','showdetailsfieldtag'), ('Modify','modifytag'),('Remove','removefieldtag'),)]: actions[-1].append('%s' % (CFG_SITE_URL, col[0][1], fldID, tagID, ln, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (CFG_SITE_URL, function, fldID, tagID, ln, str) output += tupletotable(header=header, tuple=actions) else: output += """No fields exists""" output += content body = [output] if callback: return perform_editfield(fldID, ln, "perform_modifyfieldtags", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addtag(fldID, ln=CFG_SITE_LANG, value=['',-1], name='', callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" output = "" subtitle = """Add MARC tag to logical field""" text = """ Add new tag:
    Tag value
    Tag comment
    """ % ((name=='' and value[0] or name), value[0]) text += """Or existing tag:
    Tag """ output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addtag" % CFG_SITE_URL, text=text, fldID=fldID, ln=ln, button="Add tag", confirm=1) if (value[0] and value[1] in [-1, "-1"]) or (not value[0] and value[1] not in [-1, "-1"]): if confirm in ["1", 1]: res = add_fld_tag(fldID, name, (value[0] !='' and value[0] or value[1])) output += write_outcome(res) elif confirm not in ["-1", -1]: output += """Please choose to add either a new or an existing MARC tag, but not both. """ body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_addtag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifytag(fldID, tagID, ln=CFG_SITE_LANG, name='', value='', callback='yes', confirm=-1): """form to modify a field. fldID - the field to change.""" subtitle = "" output = "" fld_dict = dict(get_def_name('', "field")) fldID = int(fldID) tagID = int(tagID) tag = get_tags(tagID) if confirm in [-1, "-1"] and not value and not name: name = tag[0][1] value = tag[0][2] subtitle = """Modify MARC tag""" text = """ Any modifications will apply to all logical fields using this tag.
    Tag value
    Comment
    """ % (value, name) output += createhiddenform(action="modifytag#4.1", text=text, button="Modify", fldID=fldID, tagID=tagID, ln=ln, confirm=1) if name and value and confirm in [1, "1"]: res = modify_tag(tagID, name, value) output += write_outcome(res) body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_modifytag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_removefieldtag(fldID, tagID, ln=CFG_SITE_LANG, callback='yes', confirm=0): """form to remove a tag from a field. fldID - the current field, remove the tag from this field. tagID - remove the tag with this id""" subtitle = """Remove MARC tag from logical field""" output = "" fld_dict = dict(get_def_name('', "field")) if fldID and tagID: fldID = int(fldID) tagID = int(tagID) tag = get_fld_tags(fldID, tagID) if confirm not in ["1", 1]: text = """Do you want to remove the tag '%s - %s ' from the field '%s'.""" % (tag[0][3], tag[0][2], fld_dict[fldID]) output += createhiddenform(action="removefieldtag#4.1", text=text, button="Confirm", fldID=fldID, tagID=tagID, confirm=1) elif confirm in ["1", 1]: res = remove_fldtag(fldID, tagID) output += write_outcome(res) body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_removefieldtag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addindexfield(idxID, ln=CFG_SITE_LANG, fldID='', callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" output = "" subtitle = """Add logical field to index""" text = """ Field name """ output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addindexfield" % CFG_SITE_URL, text=text, idxID=idxID, ln=ln, button="Add field", confirm=1) if fldID and not fldID in [-1, "-1"] and confirm in ["1", 1]: res = add_idx_fld(idxID, fldID) output += write_outcome(res) elif confirm in ["1", 1]: output += """Please select a field to add.""" body = [output] if callback: return perform_modifyindexfields(idxID, ln, "perform_addindexfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_removeindexfield(idxID, fldID, ln=CFG_SITE_LANG, callback='yes', confirm=0): """form to remove a field from an index. idxID - the current index, remove the field from this index. fldID - remove the field with this id""" subtitle = """Remove field from index""" output = "" if fldID and idxID: fldID = int(fldID) idxID = int(idxID) fld = get_fld(fldID) idx = get_idx(idxID) if fld and idx and confirm not in ["1", 1]: text = """Do you want to remove the field '%s' from the index '%s'.""" % (fld[0][1], idx[0][1]) output += createhiddenform(action="removeindexfield#3.1", text=text, button="Confirm", idxID=idxID, fldID=fldID, confirm=1) elif confirm in ["1", 1]: res = remove_idxfld(idxID, fldID) output += write_outcome(res) body = [output] if callback: return perform_modifyindexfields(idxID, ln, "perform_removeindexfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_switchtagscore(fldID, id_1, id_2, ln=CFG_SITE_LANG): """Switch the score of id_1 and id_2 in the table type. colID - the current collection id_1/id_2 - the id's to change the score for. 
type - like "format" """ output = "" name_1 = run_sql("select name from tag where id=%s", (id_1, ))[0][0] name_2 = run_sql("select name from tag where id=%s", (id_2, ))[0][0] res = switch_score(fldID, id_1, id_2) output += write_outcome(res) return perform_modifyfieldtags(fldID, ln, content=output) def perform_deletetag(fldID, ln=CFG_SITE_LANG, tagID=-1, callback='yes', confirm=-1): """form to delete an MARC tag not in use. fldID - the collection id of the current collection. fmtID - the format id to delete.""" subtitle = """Delete an unused MARC tag""" output = """
Deleting a MARC tag will also delete the associated translations.
    """ fldID = int(fldID) if tagID not in [-1," -1"] and confirm in [1, "1"]: ares = delete_tag(tagID) fld_tag = get_fld_tags() fld_tag = dict(map(lambda x: (x[1], x[0]), fld_tag)) tags = get_tags() text = """ MARC tag
    """ if i == 0: output += """No unused MARC tags
    """ else: output += createhiddenform(action="deletetag#4.1", text=text, button="Delete", fldID=fldID, ln=ln, confirm=0) if tagID not in [-1,"-1"]: tagID = int(tagID) tags = get_tags(tagID) if confirm in [0, "0"]: text = """Do you want to delete the MARC tag '%s'.""" % tags[0][2] output += createhiddenform(action="deletetag#4.1", text=text, button="Confirm", fldID=fldID, tagID=tagID, ln=ln, confirm=1) elif confirm in [1, "1"]: output += write_outcome(ares) elif confirm not in [-1, "-1"]: output += """Choose a MARC tag to delete.""" body = [output] output = "
    " + addadminbox(subtitle, body) return perform_modifyfieldtags(fldID, ln, content=output) def compare_on_val(first, second): """Compare the two values""" return cmp(first[1], second[1]) def get_col_fld(colID=-1, type = '', id_field=''): """Returns either all portalboxes associated with a collection, or based on either colID or language or both. colID - collection id ln - language id""" sql = "SELECT id_collection,id_field,id_fieldvalue,type,score,score_fieldvalue FROM collection_field_fieldvalue, field WHERE id_field=field.id" params = [] try: if id_field: sql += " AND id_field=%s" params.append(id_field) sql += " ORDER BY type, score desc, score_fieldvalue desc" res = run_sql(sql, tuple(params)) return res except StandardError, e: return "" def get_idx(idxID=''): sql = "SELECT id,name,description,last_updated,stemming_language FROM idxINDEX" params = [] try: if idxID: sql += " WHERE id=%s" params.append(idxID) sql += " ORDER BY id asc" res = run_sql(sql, tuple(params)) return res except StandardError, e: return "" def get_fld_tags(fldID='', tagID=''): """Returns tags associated with a field. fldID - field id tagID - tag id""" sql = "SELECT id_field,id_tag, tag.name, tag.value, score FROM field_tag,tag WHERE tag.id=field_tag.id_tag" params = [] try: if fldID: sql += " AND id_field=%s" params.append(fldID) if tagID: sql += " AND id_tag=%s" params.append(tagID) sql += " ORDER BY score desc, tag.value, tag.name" res = run_sql(sql, tuple(params)) return res except StandardError, e: return "" def get_tags(tagID=''): """Returns all or a given tag. tagID - tag id ln - language id""" sql = "SELECT id, name, value FROM tag" params = [] try: if tagID: sql += " WHERE id=%s" params.append(tagID) sql += " ORDER BY name, value" res = run_sql(sql, tuple(params)) return res except StandardError, e: return "" def get_fld(fldID=''): """Returns all fields or only the given field""" try: if not fldID: res = run_sql("SELECT id, name, code FROM field ORDER by name, code") else: res = run_sql("SELECT id, name, code FROM field WHERE id=%s ORDER by name, code", (fldID, )) return res except StandardError, e: return "" def get_fld_value(fldvID = ''): """Returns fieldvalue""" try: sql = "SELECT id, name, value FROM fieldvalue" params = [] if fldvID: sql += " WHERE id=%s" params.append(fldvID) res = run_sql(sql, tuple(params)) return res except StandardError, e: return "" def get_idx_fld(idxID=''): """Return a list of fields associated with one or all indexes""" try: sql = "SELECT id_idxINDEX, idxINDEX.name, id_field, field.name, regexp_punctuation, regexp_alphanumeric_separators FROM idxINDEX, field, idxINDEX_field WHERE idxINDEX.id = idxINDEX_field.id_idxINDEX AND field.id = idxINDEX_field.id_field" params = [] if idxID: sql += " AND id_idxINDEX=%s" params.append(idxID) sql += " ORDER BY id_idxINDEX asc" res = run_sql(sql, tuple(params)) return res except StandardError, e: return "" def get_col_nametypes(): """Return a list of the various translationnames for the fields""" type = [] type.append(('ln', 'Long name')) return type def get_fld_nametypes(): """Return a list of the various translationnames for the fields""" type = [] type.append(('ln', 'Long name')) return type def get_idx_nametypes(): """Return a list of the various translationnames for the index""" type = [] type.append(('ln', 'Long name')) return type def get_sort_nametypes(): """Return a list of the various translationnames for the fields""" type = {} type['soo'] = 'Sort options' type['seo'] = 'Search options' type['sew'] = 'Search within' return 
type def remove_fld(colID,fldID, fldvID=''): """Removes a field from the collection given. colID - the collection the format is connected to fldID - the field which should be removed from the collection.""" try: sql = "DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s" params = [colID, fldID] if fldvID: sql += " AND id_fieldvalue=%s" params.append(fldvID) res = run_sql(sql, tuple(params)) return (1, "") except StandardError, e: return (0, e) def remove_idxfld(idxID, fldID): """Remove a field from a index in table idxINDEX_field idxID - index id from idxINDEX fldID - field id from field table""" try: sql = "DELETE FROM idxINDEX_field WHERE id_field=%s and id_idxINDEX=%s" res = run_sql(sql, (fldID, idxID)) return (1, "") except StandardError, e: return (0, e) def remove_fldtag(fldID,tagID): """Removes a tag from the field given. fldID - the field the tag is connected to tagID - the tag which should be removed from the field.""" try: sql = "DELETE FROM field_tag WHERE id_field=%s AND id_tag=%s" res = run_sql(sql, (fldID, tagID)) return (1, "") except StandardError, e: return (0, e) def delete_tag(tagID): """Deletes all data for the given field fldID - delete all data in the tables associated with field and this id """ try: res = run_sql("DELETE FROM tag where id=%s", (tagID, )) return (1, "") except StandardError, e: return (0, e) def delete_idx(idxID): """Deletes all data for the given index together with the idxWORDXXR and idxWORDXXF tables""" try: idxID = int(idxID) res = run_sql("DELETE FROM idxINDEX WHERE id=%s", (idxID, )) res = run_sql("DELETE FROM idxINDEXNAME WHERE id_idxINDEX=%s", (idxID, )) res = run_sql("DELETE FROM idxINDEX_field WHERE id_idxINDEX=%s", (idxID, )) res = run_sql("DROP TABLE idxWORD%02dF" % idxID) res = run_sql("DROP TABLE idxWORD%02dR" % idxID) + res = run_sql("DROP TABLE idxPAIR%02dF" % idxID) + res = run_sql("DROP TABLE idxPAIR%02dR" % idxID) res = run_sql("DROP TABLE idxPHRASE%02dF" % idxID) res = run_sql("DROP TABLE idxPHRASE%02dR" % idxID) return (1, "") except StandardError, e: return (0, e) def delete_fld(fldID): """Deletes all data for the given field fldID - delete all data in the tables associated with field and this id """ try: res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_field=%s", (fldID, )) res = run_sql("DELETE FROM field_tag WHERE id_field=%s", (fldID, )) res = run_sql("DELETE FROM idxINDEX_field WHERE id_field=%s", (fldID, )) res = run_sql("DELETE FROM field WHERE id=%s", (fldID, )) return (1, "") except StandardError, e: return (0, e) def add_idx(idxNAME): """Add a new index. returns the id of the new index. 
idxID - the id for the index, number idxNAME - the default name for the default language of the format.""" try: idxID = 0 res = run_sql("SELECT id from idxINDEX WHERE name=%s", (idxNAME,)) if res: return (0, (0, "A index with the given name already exists.")) for i in xrange(1, 100): res = run_sql("SELECT id from idxINDEX WHERE id=%s", (i, )) res2 = get_table_status_info("idxWORD%02d%%" % i) if not res and not res2: idxID = i break if idxID == 0: return (0, (0, "Not possible to create new indexes, delete an index and try again.")) res = run_sql("INSERT INTO idxINDEX (id, name) VALUES (%s,%s)", (idxID, idxNAME)) type = get_idx_nametypes()[0][0] res = run_sql("INSERT INTO idxINDEXNAME (id_idxINDEX, ln, type, value) VALUES (%s,%s,%s,%s)", (idxID, CFG_SITE_LANG, type, idxNAME)) res = run_sql("""CREATE TABLE IF NOT EXISTS idxWORD%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) ENGINE=MyISAM""" % idxID) res = run_sql("""CREATE TABLE IF NOT EXISTS idxWORD%02dR ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % idxID) + res = run_sql("""CREATE TABLE IF NOT EXISTS idxPAIR%02dF ( + id mediumint(9) unsigned NOT NULL auto_increment, + term varchar(100) default NULL, + hitlist longblob, + PRIMARY KEY (id), + UNIQUE KEY term (term) + ) ENGINE=MyISAM""" % idxID) + + res = run_sql("""CREATE TABLE IF NOT EXISTS idxPAIR%02dR ( + id_bibrec mediumint(9) unsigned NOT NULL, + termlist longblob, + type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', + PRIMARY KEY (id_bibrec,type) + ) ENGINE=MyISAM""" % idxID) + res = run_sql("""CREATE TABLE IF NOT EXISTS idxPHRASE%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) ENGINE=MyISAM""" % idxID) res = run_sql("""CREATE TABLE IF NOT EXISTS idxPHRASE%02dR ( id_bibrec mediumint(9) unsigned NOT NULL default '0', termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % idxID) res = run_sql("SELECT id from idxINDEX WHERE id=%s", (idxID, )) res2 = get_table_status_info("idxWORD%02dF" % idxID) res3 = get_table_status_info("idxWORD%02dR" % idxID) if res and res2 and res3: return (1, res[0][0]) elif not res: return (0, (0, "Could not add the new index to idxINDEX")) elif not res2: return (0, (0, "Forward table not created for unknown reason.")) elif not res3: return (0, (0, "Reverse table not created for unknown reason.")) except StandardError, e: return (0, e) def add_fld(name, code): """Add a new logical field. Returns the id of the field. code - the code for the field, name - the default name for the default language of the field.""" try: type = get_fld_nametypes()[0][0] res = run_sql("INSERT INTO field (name, code) VALUES (%s,%s)", (name, code)) fldID = run_sql("SELECT id FROM field WHERE code=%s", (code,)) res = run_sql("INSERT INTO fieldname (id_field, type, ln, value) VALUES (%s,%s,%s,%s)", (fldID[0][0], type, CFG_SITE_LANG, name)) if fldID: return (1, fldID[0][0]) else: raise StandardError except StandardError, e: return (0, e) def add_fld_tag(fldID, name, value): """Add a sort/search/field to the collection. colID - the id of the collection involved fmtID - the id of the format. 
score - the score of the format, decides sorting, if not given, place the format on top""" try: res = run_sql("SELECT score FROM field_tag WHERE id_field=%s ORDER BY score desc", (fldID, )) if res: score = int(res[0][0]) + 1 else: score = 0 res = run_sql("SELECT id FROM tag WHERE value=%s", (value,)) if not res: if name == '': name = value res = run_sql("INSERT INTO tag (name, value) VALUES (%s,%s)", (name, value)) res = run_sql("SELECT id FROM tag WHERE value=%s", (value,)) res = run_sql("INSERT INTO field_tag(id_field, id_tag, score) values(%s, %s, %s)", (fldID, res[0][0], score)) return (1, "") except StandardError, e: return (0, e) def add_idx_fld(idxID, fldID): """Add a field to an index""" try: sql = "SELECT id_idxINDEX FROM idxINDEX_field WHERE id_idxINDEX=%s and id_field=%s" res = run_sql(sql, (idxID, fldID)) if res: return (0, (0, "The field selected already exists for this index")) sql = "INSERT INTO idxINDEX_field(id_idxINDEX, id_field) values (%s, %s)" res = run_sql(sql, (idxID, fldID)) return (1, "") except StandardError, e: return (0, e) def modify_idx(idxID, idxNAME, idxDESC): """Modify index name or index description in idxINDEX table""" try: res = run_sql("UPDATE idxINDEX SET name=%s WHERE id=%s", (idxNAME, idxID)) res = run_sql("UPDATE idxINDEX SET description=%s WHERE ID=%s", (idxDESC, idxID)) return (1, "") except StandardError, e: return (0, e) def modify_idx_stemming(idxID, idxSTEM): """Modify the index stemming language in idxINDEX table""" try: res = run_sql("UPDATE idxINDEX SET stemming_language=%s WHERE ID=%s", (idxSTEM, idxID)) return (1, "") except StandardError, e: return (0, e) def modify_fld(fldID, code): """Modify the code of field fldID - the id of the field to modify code - the new code""" try: sql = "UPDATE field SET code=%s" sql += " WHERE id=%s" res = run_sql(sql, (code, fldID)) return (1, "") except StandardError, e: return (0, e) def modify_tag(tagID, name, value): """Modify the name and value of a tag. tagID - the id of the tag to modify name - the new name of the tag value - the new value of the tag""" try: sql = "UPDATE tag SET name=%s WHERE id=%s" res = run_sql(sql, (name, tagID)) sql = "UPDATE tag SET value=%s WHERE id=%s" res = run_sql(sql, (value, tagID)) return (1, "") except StandardError, e: return (0, e) def switch_score(fldID, id_1, id_2): """Switch the scores of id_1 and id_2 in the table given by the argument. colID - collection the id_1 or id_2 is connected to id_1/id_2 - id field from tables like format..portalbox... 
table - name of the table""" try: res1 = run_sql("SELECT score FROM field_tag WHERE id_field=%s and id_tag=%s", (fldID, id_1)) res2 = run_sql("SELECT score FROM field_tag WHERE id_field=%s and id_tag=%s", (fldID, id_2)) res = run_sql("UPDATE field_tag SET score=%s WHERE id_field=%s and id_tag=%s", (res2[0][0], fldID, id_1)) res = run_sql("UPDATE field_tag SET score=%s WHERE id_field=%s and id_tag=%s", (res1[0][0], fldID, id_2)) return (1, "") except StandardError, e: return (0, e) def get_lang_list(table, field, id): langs = run_sql("select ln from %s where %s=%%s" % (table, field), (id, )) exists = {} lang = '' for lng in langs: if not exists.has_key(lng[0]): lang += lng[0] + ", " exists[lng[0]] = 1 if lang.endswith(", "): lang = lang [:-2] if len(exists) == 0: lang = """None""" return lang diff --git a/modules/bibrank/doc/admin/bibrank-admin-guide.webdoc b/modules/bibrank/doc/admin/bibrank-admin-guide.webdoc index d866d7c67..6374ed0e4 100644 --- a/modules/bibrank/doc/admin/bibrank-admin-guide.webdoc +++ b/modules/bibrank/doc/admin/bibrank-admin-guide.webdoc @@ -1,551 +1,551 @@ ## -*- mode: html; coding: utf-8; -*- ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

    Contents

    1. Overview
    2. Configuration Conventions
    3. BibRank Admin Interface
           3.1 Main interface
           3.2 Add rank method
           3.3 Show details of rank method
           3.4 Modify rank method
           3.5 Delete rank method
           3.6 Modify translations
           3.7 Modify visibility toward collections
    4. BibRank Daemon
           4.1 Command Line Interface
           4.2 Using BibRank
    5. BibRank Methods
          5.1 Single tag rank method
          5.2 Word Similarity/Similar Records
          5.3 Combined method
    6. bibrankgkb Tool
          6.1 Command Line Interface
          6.2 Using bibrankgkb
    7. Additional Information

    1. Overview

    The bibrank module currently consists of two tools:

    bibrank - Generates ranking data for ranking search results based on methods like:

     Journal Impact Factor
     Word Similarity/Similar Records
     Combined Method
     

    bibrankgkb - For generating knowledge base files for use with bibrank

    Whether bibrankgkb is necessary depends on which ranking methods you are planning to use and on what data you already have. This guide will take you through the necessary steps in detail, in order to create the different kinds of ranking methods for the search engine to use.

    2. Configuration Conventions

     - a comment line starts with a '#' sign in the first column
     - each section in a configuration file is declared inside '[' ']' signs
     - values in knowledge base files are separated by '---'
     
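    For illustration, here is a minimal sketch following these conventions (the
    section and option names are invented for this example, not taken from a real
    template):

     # a comment line
     [example_section]
     example_option = example_value

    and a knowledge base file entry of the kind shown later in this guide:

     Phys. Rev., D---3.838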

    3. BibRank Admin Interface

    The BibRank web interface enables you to modify the configuration of most aspects of BibRank. For full functionality, it is advised to give the http daemon read/write access to your invenio/etc/bibrank directory. If this is not wanted, you will have to edit the configuration files from the console using your favourite text editor.
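    As a sketch, assuming the http daemon runs as the hypothetical user 'www-data'
    and Invenio is installed under /home/invenio (the prefix used in the examples
    later in this guide), the access rights could be granted like this:

     $ sudo chown -R www-data /home/invenio/etc/bibrank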

    3.1 Main interface

    In the main interface screen, you see a list of all rank methods currently added. Each rank method is identified by the rank method code. To find out about the functionality available, check out the topics below.

    Explanation of concepts
     Rank method:
     A method responsible for creating the necessary data to rank a result.
     Translations:
     Each rank method may have many names in many languages.
     Collections:
     Which collections the rank method should be visible in.
     

    3.2 Add rank method

    When pressing the link in the upper right corner of the main interface, you will see the interface for adding a new rank method. Two options need to be decided upon: the bibrank code and the template to use; both values can be changed later. The bibrank code is used by the bibrank daemon to run the method, and should be fairly short, without spaces. The template you choose decides how the ranking will be done, and must be adapted to your CDS Invenio configuration before use. When you confirm the addition of a new rank method, it will be added to the list of available rank methods, and a configuration file will be created if the httpd user has the proper rights to the 'invenio/etc/bibrank' directory. If not, the file has to be created manually with the name 'bibrankcode.cfg', where bibrankcode is the code given in the interface.
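    A sketch of creating such a file manually, assuming the rank method code 'jif'
    used in the examples below, an installation under /home/invenio, and the
    hypothetical httpd user 'www-data' from above:

     $ touch /home/invenio/etc/bibrank/jif.cfg
     $ chown www-data /home/invenio/etc/bibrank/jif.cfg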

    3.3 Show details of rank method

    This interface gives you an overview of the current status of the rank method, and gives direct access to the various interfaces for changing the configuration. In the overview section, you see the bibrank code, for use with the bibrank daemon, and the date of the last run of the rank method. In the statistics section, you see how many records have been added to the rank method, and other statistical data. In the collections part, the collections to which the rank method is visible are shown. The translations part shows the various translations in the languages available in CDS Invenio. At the bottom, the configuration file is shown, if accessible.

    3.4 Modify rank method

    This interface gives access to modifying the bibrank code given when the rank method was created, and the configuration file of the rank method, if the file can be accessed. If it cannot, the file may not exist, or the httpd user may not have enough rights to read it. At the bottom of the interface, it is possible to choose a template, view it, and copy it over the old rank method configuration if wanted. Remember that the values present in the template are examples, and must be changed where necessary. See this documentation for information about this, and the 'BibRank Internals' link below for additional information.

    3.5 Delete rank method

    If it is necessary to delete a rank method, some precautions must be taken, since the configuration of the method will be lost. When deleting a rank method, the configuration file will also be deleted ('invenio/etc/bibrank/bibrankcode.cfg', where bibrankcode is the code of the rank method) if it is accessible to the httpd user. If not, the file can be deleted manually from the console. Any bibrank tasks scheduled to run the deleted rank method must be modified or deleted manually.
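    For example, reusing the hypothetical 'jif' code and installation prefix from
    section 3.2 above:

     $ rm /home/invenio/etc/bibrank/jif.cfg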

    3.6 Modify translations

    If you want to use internationalisation of the rank method names, you have to add them using the 'Modify translations' interface. A list of all the languages used in the CDS Invenio installation will be shown, with the possibility to add a translation for each language.

    3.7 Modify visibility toward collections

    If a rank method should be visible to the users of the CDS Invenio search interface, it must be enabled for one or several collections. A rank method can be visible in the search interface of the whole site, or of just one collection. The collections in the upper list box do not show the rank method in their search interface; to change this, select the wanted collection and press 'Enable' to enable the rank method for that collection. The collections that the method has been activated for are shown in the lower list box. To remove a collection, select it and press the 'Disable' button; this removes it from the list of collections for which the rank method is enabled.

    4. BibRank Daemon

    The bibrank daemon reads the necessary metadata from the CDS Invenio database and combines it in different ways to create the ranking data needed to rank search results quickly at search time.

    4.1 Command Line Interface

     Usage bibrank:
            bibrank -wjif -a --id=0-30000,30001-860000 --verbose=9
            bibrank -wjif -d --modified='2002-10-27 13:57:26'
            bibrank -wwrd --recalculate --collection=Articles
            bibrank -wwrd -a -i 234-250,293,300-500 -u admin@localhost
     
      Ranking options:
      -w, --run=r1[,r2]         runs each rank method in the order given
     
      -c, --collection=c1[,c2]  select according to collection
      -i, --id=low[-high]       select according to doc recID
      -m, --modified=from[,to]  select according to modification date
      -l, --lastupdate          select according to last update
     
      -a, --add                 add or update words for selected records
      -d, --del                 delete words for selected records
      -S, --stat                show statistics for a method
     
  -R, --recalculate         recalculate weight data, used by the word frequency
                            method; should be used if ca 1% of the documents have
                            been changed since the last time -R was used
      Repairing options:
      -k,  --check              check consistency for all records in the table(s)
                                check if update of ranking data is necessary
      -r, --repair              try to repair all records in the table(s)
      Scheduling options:
      -u, --user=USER           user name to store task, password needed
      -s, --sleeptime=SLEEP     time after which to repeat tasks (no)
                                 e.g.: 1s, 30m, 24h, 7d
      -t, --time=TIME           moment for the task to be active (now)
                                 e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26
      General options:
      -h, --help                print this help and exit
      -V, --version             print version and exit
      -v, --verbose=LEVEL       verbose level (from 0 to 9, default 1)
     
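    For instance, to schedule a run of the word similarity method that repeats once
    a day (a sketch combining the ranking and scheduling options above; adjust the
    user name and method code to your installation):

     $ bibrank -wwrd -u admin -s 24h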

    4.2 Using BibRank

    Step 1 - Adding the rank option to the search interface

    To be able to add the needed ranking data to the database, you first have to add the rank method to the database, together with the code you wish to use for it. The configuration file described in the next section needs to have the same name as the code stored in the database.

    Step 2 - Get the necessary external data (e.g. jif values)

    Find out what data is necessary for each method. The bibrankgkb documentation below may be of assistance.

    Example of necessary data (jif.kb - journal impact factor knowledge base)

     Phys. Rev., D---3.838
     Phys. Rev. Lett.---6.462
     Phys. Lett., B---4.213
     Nucl. Instrum. Methods Phys. Res., A---0.964
     J. High Energy Phys.---8.664
     

    Step 3 - Modify the configuration file

    The configuration files for the different rank methods have different options, so verify that you are using the correct configuration file and rank method. A template exists for each rank method as an example, but it may not work on all configurations of CDS Invenio. For a description of each rank method and the necessary configuration, check section 5 below.

    Step 4 - Add the ranking method as a scheduled task

    When the configuration is okay, you can add the bibrank daemon to the task scheduler using the scheduling options. The daemon can then update the rank method automatically, for example once each day.

    Example
     $ bibrank -wjif -r
     Task #53 was successfully scheduled for execution.
     

    It is advised to run the BibRank daemon with no parameters, since the default settings will then be used.

    Example
     $ bibrank
     Task #2505 was successfully scheduled for execution.
     

    Step 5 - Running bibrank manually

    If BibRank is scheduled without any parameters and no records have been modified, you may get output like that shown below.

    Example
     $ bibrank 2505
     2004-09-07 17:51:46 --> Task #2505 started.
     2004-09-07 17:51:46 -->
     2004-09-07 17:51:46 --> Running rank method: Number of downloads.
     2004-09-07 17:51:47 --> No new records added since last time method was run
     2004-09-07 17:52:10 -->
     2004-09-07 17:52:10 --> Running rank method: Journal Impact Factor.
     2004-09-07 17:52:10 --> No new records added since last time method was run
     2004-09-07 17:52:11 --> Reading knowledgebase file: /home/invenio/etc/bibrank/cern_jif.kb
     2004-09-07 17:52:11 --> Number of lines read from knowledgebase file: 420
     2004-09-07 17:52:11 --> Number of records available in rank method: 0
     2004-09-07 17:52:12 -->
     2004-09-07 17:52:12 --> Running rank method: Word frequency
     2004-09-07 17:52:13 --> rnkWORD01F contains 256842 words from 677912 records
     2004-09-07 17:52:14 --> rnkWORD01F is in consistent state
     2004-09-07 17:52:14 --> Using the last update time for the rank method
     2004-09-07 17:52:14 --> No new records added. rnkWORD01F is up to date
     2004-09-07 17:52:14 --> rnkWORD01F contains 256842 words from 677912 records
     2004-09-07 17:52:14 --> rnkWORD01F is in consistent state
     2004-09-07 17:52:14 --> Task #2505 finished.
     

    Step 6 - Fast update of modified records

    If you just want to update the latest additions or modified records, you may want to do a faster update by running the daemon without the recalculate option (the recalculate option is off by default). This may cause lower accuracy when ranking.
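
    Example (a fast update of the word frequency method, without recalculating):

     $ bibrank -wwrd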

    5. BibRank Methods

    Each BibRank method has a configuration file which contains different parameters and sections necessary to do the ranking.

    5.1 Single tag rank method

    This method uses one MARC tag together with a file containing the possible values for this MARC tag, each associated with a ranking value. This data is used to create a structure mapping each record id to the ranking value derived from the content of the tag. The method can be used for various kinds of ranking, like ranking by Journal Impact Factor, or to let certain authors always appear at the top of a search. The parameters that need to be configured for this method are 'tag', 'kb_src' and 'check_mandatory_tags'.



    Example
     
     [rank_method]
     function = single_tag_rank_method
     
     [single_tag_rank_method]
     tag = 909C4p
     kb_src = /home/invenio/etc/bibrank/jif.kb
     check_mandatory_tags = 909C4c,909C4v,909C4y
     
     
    Explanation:
     [rank_method]
       ##The function which is responsible for doing the work. Should not be changed
       function = single_tag_rank_method
      
       ##This section must be present if the single_tag_rank_method is going to be used
       [single_tag_rank_method]
      
       ##The tag whose value should be matched against the left side of the kb file (like the journal name)
       tag = 909C4p
      
       ##The path to the kb file, which has the content of the tag above on the left side and the ranking value on the right side
       kb_src = /home/invenio/etc/bibrank/jif.kb
      
       ##Tags that must be present for a record to be added to the ranking data; to disable this check, remove the tags
       check_mandatory_tags = 909C4c,909C4v,909C4y
      
     

    The kb_src file must contain data in the form:

     Phys. Rev., D---3.838
     Phys. Rev. Lett.---6.462
     Phys. Lett., B---4.213
     Nucl. Instrum. Methods Phys. Res., A---0.964
     J. High Energy Phys.---8.664
     

    The left side must match the content of the tag mentioned in the tag variable.
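
    To make the mechanics concrete, here is a simplified sketch of what the method does with this data; the helper names are illustrative, not the actual Invenio functions (the real implementation is single_tag_rank in bibrank_tag_based_indexer.py):

     def load_kb(path):
         """Parse a kb file with 'key---value' lines into a dictionary."""
         kb = {}
         for line in open(path):
             line = line.strip()
             if line and not line.startswith("#"):
                 key, value = line.split("---", 1)
                 kb[key] = float(value)
         return kb

     def rank_records(tag_values, kb):
         """Map each record id to the ranking value found for the content
         of its tag; values missing from the kb rank as 0."""
         return dict((recid, kb.get(value, 0.0))
                     for recid, value in tag_values.items())

     kb = load_kb("/home/invenio/etc/bibrank/jif.kb")
     print rank_records({77: "Phys. Rev., D", 78: "Some Other Journal"}, kb)
     # -> {77: 3.838, 78: 0.0}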

    5.2 Word Similarity/Similar Records

    The Word Similarity/Similar Records method uses the content of the selected tags to determine which records are most relevant to a query, or most similar to a selected record. This method has many parameters to configure and may need some tweaking to give the best results. The BibRank code for this method has to be 'wrd' for it to work. For best results, it is advised to install the stemming module mentioned in INSTALL and to use a stopword list containing stopwords in the languages the records exist in. The stemmer and the stopword list are used to get better results and to limit the size of the index, thus making ranking faster and more accurate. For best results with the stemmer, it is important to mark each tag with the most common language its values may be in. It is advised not to change 'function', 'table' or the parameters under [find_similar]. If the stemmer is not installed, the 'stem_if_avail' parameter should be set to 'no' to ensure that no problems occur. Each tag to be used by the method has to be given a number of points; the number of points describes how important one word in this tag is.

    When running BibRank to update the index for this rank method, it is not necessary to recalculate each time; but when a large number of records has been updated or added, it can be wise to recalculate using the recalculate parameter of BibRank.
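
    As a simplified illustration of what the tag points mean: during indexing, every word found in a tag contributes that tag's points to the word's weight for the record, so a title word (10 points in the example below) counts ten times as much as a keyword (1 point). A minimal sketch, ignoring stemming, stopwords and punctuation handling:

     def index_record(tag_texts):
         """tag_texts: list of (text, points) pairs taken from the
         configured tags of one record; returns {word: weight}."""
         weights = {}
         for text, points in tag_texts:
             for word in text.lower().split():
                 weights[word] = weights.get(word, 0) + points
         return weights

     print index_record([("Quantum field theory", 10), ("quantum", 1)])
     # -> {'quantum': 11, 'field': 10, 'theory': 10}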

    Example
     [rank_method]
     function = word_similarity
     
     [word_similarity]
     stemming = en
     table = rnkWORD01F
     stopword = True
     relevance_number_output_prologue = (
     relevance_number_output_epilogue = )
      #MARC tag,tag points, tag language
     tag1 = 6531_a, 2, en
     tag2 = 695__a, 1, en
     tag3 = 6532_a, 1, en
     tag4 = 245__%, 10, en
     tag5 = 246_%, 1, fr
     tag6 = 250__a, 1, en
     tag7 = 711__a, 1, en
     tag8 = 210__a, 1, en
     tag9 = 222__a, 1, en
     tag10 = 520__%, 1, en
     tag11 = 590__%, 1, fr
     tag12 = 111__a, 1, en
     tag13 = 100__%, 2, none
     tag14 = 700__%, 1, none
     tag15 = 721__a, 1, none
     
     
     [find_similar]
     max_word_occurence = 0.05
     min_word_occurence = 0.00
     min_word_length = 3
     min_nr_words_docs = 3
     max_nr_words_upper = 20
     max_nr_words_lower = 10
     default_min_relevance = 75
     
    Explanation:
     [rank_method]
      #internal name for the bibrank program, do not modify
     function = word_similarity
     
     [word_similarity]
      #if the stemmer is available, the default stemming language should be given here. Advised to turn it off if not installed
     stemming = en
      #the internal table to load the index tables from.
     table = rnkWORD01F
      #remove stopwords?
     stopword = True
      #text to show before the rank value when the search result is presented (use <!-- to hide the result)
     relevance_number_output_prologue = (
      #text to show after the rank value when the search result is presented (use --> to hide the result)
     relevance_number_output_epilogue = )
     
      #MARC tag, tag points, tag language
      #a list of the tags to be used, together with a number describing the importance of the tag, and the
      #most common language for its content. Not all languages are supported. Among the supported ones are:
      #fr/french, en/english, no/norwegian, se/swedish, de/german, it/italian, pt/portuguese
     
     tag1 = 6531_a, 1, en #keyword
     tag2 = 695__a, 1, en #keyword
     tag3 = 6532_a, 1, en #keyword
     tag4 = 245__%, 10, en #title; the words in the title usually describe a record very well
     tag5 = 246_% , 1, fr #french title
     tag6 = 250__a, 1, en #title
     tag7 = 711__a, 1, en #title
     tag8 = 210__a, 1, en #abbreviated title
     tag9 = 222__a, 1, en #key title
     
     [find_similar]
      #a term should occur in at most this fraction of the documents (0.05 = 5%)
     max_word_occurence = 0.05
      #a term should occur in at least this fraction of the documents
     min_word_occurence = 0.00
      #a term should be at least 3 characters long
     min_word_length = 3
      #a term should appear in at least 3 documents
     min_nr_words_docs = 3
      #do not use more than 20 terms for "find similar"
     max_nr_words_upper = 20
      #if a document contains fewer than 10 terms, also use frequently occurring terms; otherwise ignore them
     max_nr_words_lower = 10
      #default minimum relevance value to use for "find similar"
     default_min_relevance = 75
     

    Tip: When executing a search using a ranking method, you can add "verbose=1" to the list of parameters in the URL to see which terms have been used in the ranking.
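
    For example (substitute your own site URL):

     http://your.site/search?p=ellis&rm=wrd&verbose=1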

    5.3 Combine method

    The 'Combine method' runs each method mentioned in the config file and adds the scores together, weighted by the importance of each method as given by its percentage.

    Example
     [rank_method]
     function = combine_method
     [combine_method]
     method1 = cern_jif,33
     method2 = cern_acc,33
     method3 = wrd,33
     relevance_number_output_prologue = (
     relevance_number_output_epilogue = )
     
    Explanation:
     [rank_method]
      #tells which method to use, do not change
     function = combine_method
     [combine_method]
      #each line tells which method to use; the code is the same as in the BibRank interface, and the number
      #gives the method's percentage of the total score.
     method1 = jif,50
     method2 = wrd,50
      #text to be shown before the rank value on the search result screen.
     relevance_number_output_prologue = (
      #text to be shown after the rank value on the search result screen.
     relevance_number_output_epilogue = )
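
    The combination itself is straightforward; a minimal sketch of the idea (illustrative code, not the actual Invenio implementation):

     def combine_methods(method_scores, percentages):
         """method_scores: {method code: {recid: score}};
         percentages: {method code: share of the total score}."""
         combined = {}
         for code, scores in method_scores.items():
             for recid, score in scores.items():
                 combined[recid] = combined.get(recid, 0.0) \
                                   + score * percentages[code] / 100.0
         return combined

     print combine_methods({'jif': {77: 80.0}, 'wrd': {77: 60.0, 78: 90.0}},
                           {'jif': 50, 'wrd': 50})
     # -> {77: 70.0, 78: 45.0}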
     

    6. bibrankgkb Tool

    For some ranking methods, like the single_tag_rank method, a knowledge base file (kb) with the needed data in the correct format is necessary. This file can be created using the bibrankgkb tool, which can read the data either from the CDS Invenio database, from several web pages using regular expressions, or from another file. In case one source uses another naming convention, bibrankgkb can convert between them using a conversion file.

    6.1 Command Line Interface

     Usage: bibrankgkb [options]
          Examples:
            bibrankgkb --input=bibrankgkb.cfg --output=test.kb
            bibrankgkb -otest.cfg -v9
            bibrankgkb
     
      Generate options:
      -i,  --input=file          input file, default from /etc/bibrank/bibrankgkb.cfg
      -o,  --output=file         output file, will be placed in current folder
      General options:
      -h,  --help                print this help and exit
      -V,  --version             print version and exit
      -v,  --verbose=LEVEL       verbose level (from 0 to 9, default 1)
     

    6.2 Using bibrankgkb

    Step 1 - Find sources

    Since some of the data used for ranking purposes is not freely available, it cannot be bundled with CDS Invenio. To get hold of the necessary data, you may find it useful to ask your library whether they have a copy of the data that can be used (like the Journal Impact Factors from the Science Citation Index), or to use Google to search the web for any public source.

    Step 2 - Create configuration file

    The default configuration file is shown below.

      ##The main section
     [bibrankgkb]
      ##The url to a web page with the data to be read. The parameter does not need to be named 'url', but if
      ##there are several links, the parameters should share a common prefix and be numbered from _0 and upwards.
     url_0 = http://www.taelinke.land.ru/impact_A.html
     url_1 = http://www.taelinke.land.ru/impact_B.html
     url_2 = http://www.taelinke.land.ru/impact_C.html
     url_3 = http://www.taelinke.land.ru/impact_DE.html
     url_4 = http://www.taelinke.land.ru/impact_FH.html
     url_5 = http://www.taelinke.land.ru/impact_I.html
     url_6 = http://www.taelinke.land.ru/impact_J.html
     url_7 = http://www.taelinke.land.ru/impact_KN.html
     url_8 = http://www.taelinke.land.ru/impact_QQ.html
     url_9 = http://www.taelinke.land.ru/impact_RZ.html
      ##The regular expression for the url mentioned should be given here
     url_regexp =
     
      ##The various sources that can be read in, can either be a file, web page or from the database
     kb_1 = /home/invenio/modules/bibrank/etc/cern_jif.kb
     kb_2 = /home/invenio/modules/bibrank/etc/demo_jif.kb
     kb_2_filter = /home/invenio/modules/bibrank/etc/convert.kb
     kb_3 = SELECT id_bibrec,value FROM bib93x,bibrec_bib93x WHERE tag='938__f' AND id_bibxxx=id
     kb_4 = SELECT id_bibrec,value FROM bib21x,bibrec_bib21x WHERE tag='210__a' AND id_bibxxx=id
      ##This points to the urls above (the common part of the url parameters is 'url_' followed by a number)
     kb_5 = url_%s
     
      ##This is the part that will be read by the bibrankgkb tool to determine what to read.
      ##The first part (separated by ,,) gives where to look for the conversion file (which converts
      ##the names between two formats), and the second part is the data source. A conversion file is not
      ##needed, as shown in create_0. Whether the source is a file, url or the database must be
      ##given as file, www or db. If several create lines exist, each will be read in turn and added
      ##to a common kb file.
      ##So this means that:
      ##create_0: Load from file in variable kb_1 without converting
      ##create_1: Load from file in variable kb_2 using conversion from file kb_2_filter
      ##create_2: Load from www using the urls in variable kb_5 and the regular expression in url_regexp
      ##create_3: Load from database using the sql statements in kb_3 and kb_4
     create_0 = ,, ,,file,,%(kb_1)s
     create_1 = file,,%(kb_2_filter)s,,file,,%(kb_2)s
      #create_2 = ,, ,,www,,%(kb_5)s,,%(url_regexp)s
      #create_3 = ,, ,,db,,%(kb_4)s,,%(kb_4)s
     
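
    For illustration, a tiny sketch of how such a create line decomposes once the %(...)s references have been substituted (illustrative code, not the actual bibrankgkb parser):

     def parse_create_line(line):
         """Split a create_x line on ',,': the first two fields describe
         the (optional) conversion filter, the rest describe the source."""
         parts = line.split(",,")
         return parts[:2], parts[2:]

     print parse_create_line("file,,/etc/convert.kb,,file,,/etc/demo_jif.kb")
     # -> (['file', '/etc/convert.kb'], ['file', '/etc/demo_jif.kb'])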

    When you have found a source for the data and created the configuration file, it may be necessary to create a conversion file, but this depends on the naming conventions used in the available data versus the conventions used in your CDS Invenio installation.

    The available data may look like this:

     COLLOID SURFACE A---1.98
     

    But in CDS Invenio you are using:

     Colloids Surf., A---1.98
     
    By using a conversion file like:

     COLLOID SURFACE A---Colloids Surf., A
     

    You can convert the source to the correct naming convention, producing:

     Colloids Surf., A---1.98
     
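
    A minimal sketch of what this conversion amounts to; the file names here are hypothetical, and the format is the same 'left---right' one used throughout (illustrative code, not the actual bibrankgkb implementation):

     def load_pairs(path):
         """Read 'left---right' lines from a kb-style file into a dictionary."""
         result = {}
         for line in open(path):
             if line.strip():
                 left, right = line.strip().split("---", 1)
                 result[left] = right
         return result

     conv = load_pairs("conv.kb")    # {'COLLOID SURFACE A': 'Colloids Surf., A'}
     src = load_pairs("source.kb")   # {'COLLOID SURFACE A': '1.98'}
     converted = dict((conv.get(name, name), value)
                      for name, value in src.items())
     # converted == {'Colloids Surf., A': '1.98'}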

    Step 3 - Run tool

    When ready to run the tool, you may either use the default configuration file (/etc/bibrank/bibrankgkb.cfg) or point to another one using the '--input' option. If you want to test the configuration, you can use '--verbose=9' to print the output on screen; if you want to save it to a file, use '--output=filename', but remember that the file will be saved in the program directory.

    The output may look like this:

     $ ./bibrankgkb -v9
     2004-03-11 17:30:17 --> Running: Generate Knowledge base.
     2004-03-11 17:30:17 --> Reading data from file: /home/invenio/etc/bibrank/jif.kb
     2004-03-11 17:30:17 --> Reading data from file: /home/invenio/etc/bibrank/conv.kb
     2004-03-11 17:30:17 --> Using last resource for converting values.
     2004-03-11 17:30:17 --> Reading data from file: /home/invenio/etc/bibrank/jif2.kb
     2004-03-11 17:30:17 --> Converting between naming conventions given.
     2004-03-11 17:30:17 --> Colloids Surf., A---1.98
     2004-03-11 17:30:17 --> Phys. Rev. Lett.---6.462
     2004-03-11 17:30:17 --> J. High Energy Phys.---8.664
     2004-03-11 17:30:17 --> Nucl. Instrum. Methods Phys. Res., A---0.964
     2004-03-11 17:30:17 --> Phys. Lett., B---4.213
     2004-03-11 17:30:17 --> Phys. Rev., D---3.838
     2004-03-11 17:30:17 --> Total nr of lines: 6
     2004-03-11 17:30:17 --> Time used: 0 second(s).
     

    7. Additional Information

    BibRank Internals

diff --git a/modules/bibrank/doc/hacking/bibrank-bibrankgkb.webdoc b/modules/bibrank/doc/hacking/bibrank-bibrankgkb.webdoc
index 6ed3ecac6..6741efbad 100644
--- a/modules/bibrank/doc/hacking/bibrank-bibrankgkb.webdoc
+++ b/modules/bibrank/doc/hacking/bibrank-bibrankgkb.webdoc
@@ -1,90 +1,90 @@
## -*- mode: html; coding: utf-8; -*-
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
     
        1. Read default configuration file or the one specified by the user
     
        2. Read each create_ line from the cfg file, for each line, read the
           source(s) from either database, file or www by calling get_from_source().
       Convert between naming conventions if a source for conversion data is given.
     
        3. Merge into one file, repeat 2. until last source is read.
     
        4. Save file if requested with --output
     
        Configuration:
     
    How to spesify a source:
        create_x = filter, source
    Where x is a number from 0 and up. The source and the filter are read,
        and each line in the source is checked against the filter to be converted
        into the correct naming standard. If no filter is given, the source is
        directly translated into a .kb file.
     
        Read filter from:
     
        File:
          [bibrankgkb]
          #give path to file containing lines like: COLLOID SURFACE A---Colloids Surf., A
          kb_1_filter = /bibrank/bibrankgkb_jif_conv.kb
          #replace filter with the line below (switch kb_1_filter with the variable names you used)
          create_0 = file,,%(kb_1_filter)s
     
        Read source from:
     
        Database:
          [bibrankgkb]
          #Specify sql statements
          kb_2 = SELECT id_bibrec,value FROM bib93x,bibrec_bib93x WHERE tag='938__f' AND id_bibxxx=id
          kb_3 = SELECT id_bibrec,value FROM bib21x,bibrec_bib21x WHERE tag='210__a' AND id_bibxxx=id
          #replace source with the line below (switch kb_2 and kb_3 with the variable names you used)
          db,,%(kb_2)s,,%(kb_3)s
     
        File:
          [bibrankgkb]
          #give path to file containing lines like: COLLOID SURFACE A---1.98
          kb_1 = /bibrank/bibrankgkb_jif_example.kb
          #replace source with the line below (switch kb_1 with the variable names you used)
          create_0 = file,,%(kb_1)s
     
        Internet:
          [bibrankgkb]
          #specify the urls to the file containing JIF data
          url_0 = http://www.sciencegateway.org/impact/if03a.htm
          url_1 = http://www.sciencegateway.org/impact/if03bc.htm
          url_2 = http://www.sciencegateway.org/impact/if03df.htm
          url_3 = http://www.sciencegateway.org/impact/if03gi.htm
          url_4 = http://www.sciencegateway.org/impact/if03j.htm
          url_5 = http://www.sciencegateway.org/impact/if03ko.htm
          url_6 = http://www.sciencegateway.org/impact/if03pr.htm
          url_7 = http://www.sciencegateway.org/impact/if03sz.htm
          #give the regular expression necessary to extract the key and value from the file
     
          url_regexp = (<TR bgColor=\#ffffff>\s*?\n\s*?(?P<key>.*?)\s*?\n\s*?.*?\s*?\n\s*?.*?\s*?\n\s*?\s*?\n\s*?(?P<value>[\w|,]+))
     
          #replace source with the line below (switch kb_4 and url_regexp with the variable names you used)
          create_0 = www,,%(kb_4)s,,%(url_regexp)s
     
     
    diff --git a/modules/bibrank/lib/bibrank_regression_tests.py b/modules/bibrank/lib/bibrank_regression_tests.py index 24ae6e709..dab2121de 100644 --- a/modules/bibrank/lib/bibrank_regression_tests.py +++ b/modules/bibrank/lib/bibrank_regression_tests.py @@ -1,173 +1,173 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibRank Regression Test Suite.""" __revision__ = "$Id$" import unittest from invenio.config import CFG_SITE_URL from invenio.dbquery import run_sql from invenio.testutils import make_test_suite, run_test_suite, \ test_web_page_content, merge_error_messages class BibRankWebPagesAvailabilityTest(unittest.TestCase): """Check BibRank web pages whether they are up or not.""" def test_rank_by_word_similarity_pages_availability(self): """bibrank - availability of ranking search results pages""" baseurl = CFG_SITE_URL + '/search' _exports = ['?p=ellis&r=wrd'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_similar_records_pages_availability(self): """bibrank - availability of similar records results pages""" baseurl = CFG_SITE_URL + '/search' _exports = ['?p=recid%3A18&rm=wrd'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return class BibRankIntlMethodNames(unittest.TestCase): """Check BibRank I18N ranking method names.""" def test_i18n_ranking_method_names(self): """bibrank - I18N ranking method names""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/collection/Articles%20%26%20Preprints?as=1', expected_text="times cited")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/collection/Articles%20%26%20Preprints?as=1', expected_text="journal impact factor")) class BibRankWordSimilarityRankingTest(unittest.TestCase): """Check BibRank word similarity ranking tools.""" def test_search_results_ranked_by_similarity(self): """bibrank - search results ranked by word similarity""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis&rm=wrd&of=id', - expected_text="[8, 10, 11, 12, 47, 17, 13, 16, 18, 9, 14, 15]")) + expected_text="[8, 10, 11, 12, 47, 17, 13, 16, 9, 14, 18, 15]")) def test_similar_records_link(self): """bibrank - 'Similar records' link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A77&rm=wrd&of=id', - expected_text="[84, 95, 85, 77]")) + expected_text="[84, 96, 95, 85, 77]")) class BibRankCitationRankingTest(unittest.TestCase): """Check BibRank citation ranking tools.""" def test_search_results_ranked_by_citations(self): 
"""bibrank - search results ranked by number of citations""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?cc=Articles+%26+Preprints&p=Klebanov&rm=citation&of=id', expected_text="[85, 77, 84]")) def test_search_results_ranked_by_citations_verbose(self): """bibrank - search results ranked by number of citations, verbose output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?cc=Articles+%26+Preprints&p=Klebanov&rm=citation&verbose=2', expected_text="find_citations retlist [[85, 0], [77, 2], [84, 3]]")) def test_detailed_record_citations_tab(self): """bibrank - detailed record, citations tab""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/record/79/citations', expected_text=["Cited by: 1 records", "Co-cited with: 2 records"])) class BibRankExtCitesTest(unittest.TestCase): """Check BibRank citation ranking tools with respect to the external cites.""" def _detect_extcite_info(self, extcitepubinfo): """ Helper function to return list of recIDs citing given extcitepubinfo. Could be move to the business logic, if interesting for other callers. """ res = run_sql("""SELECT id_bibrec FROM rnkCITATIONDATAEXT WHERE extcitepubinfo=%s""", (extcitepubinfo,)) return [int(x[0]) for x in res] def test_extcite_via_report_number(self): """bibrank - external cites, via report number""" # The external paper hep-th/0112258 is cited by 9 demo # records: you can find out via 999:"hep-th/0112258", and we # could eventually automatize this query, but it is maybe # safer to leave it manual in case queries fail for some # reason. test_case_repno = "hep-th/0112258" test_case_repno_cited_by = [77, 78, 81, 82, 85, 86, 88, 90, 91] self.assertEqual(self._detect_extcite_info(test_case_repno), test_case_repno_cited_by) def test_extcite_via_publication_reference(self): """bibrank - external cites, via publication reference""" # The external paper "J. Math. Phys. 4 (1963) 915" does not # have any report number, and is cited by 1 demo record. test_case_pubinfo = "J. Math. Phys. 4 (1963) 915" test_case_pubinfo_cited_by = [90] self.assertEqual(self._detect_extcite_info(test_case_pubinfo), test_case_pubinfo_cited_by) def test_intcite_via_report_number(self): """bibrank - external cites, no internal papers via report number""" # The internal paper hep-th/9809057 is cited by 2 demo # records, but it also exists as a demo record, so it should # not be found in the extcite table. test_case_repno = "hep-th/9809057" test_case_repno_cited_by = [] self.assertEqual(self._detect_extcite_info(test_case_repno), test_case_repno_cited_by) def test_intcite_via_publication_reference(self): """bibrank - external cites, no internal papers via publication reference""" # The internal paper #18 has only pubinfo, no repno, and is # cited by internal paper #96 via its pubinfo, so should not # be present in the extcite list: test_case_repno = "Phys. 
Lett., B 151 (1985) 357" test_case_repno_cited_by = [] self.assertEqual(self._detect_extcite_info(test_case_repno), test_case_repno_cited_by) TEST_SUITE = make_test_suite(BibRankWebPagesAvailabilityTest, BibRankIntlMethodNames, BibRankWordSimilarityRankingTest, BibRankCitationRankingTest, BibRankExtCitesTest) if __name__ == "__main__": run_test_suite(TEST_SUITE, warn_user=True) diff --git a/modules/bibrank/lib/bibrank_tag_based_indexer.py b/modules/bibrank/lib/bibrank_tag_based_indexer.py index 7104429d6..d880f4d85 100644 --- a/modules/bibrank/lib/bibrank_tag_based_indexer.py +++ b/modules/bibrank/lib/bibrank_tag_based_indexer.py @@ -1,488 +1,471 @@ # -*- coding: utf-8 -*- ## Ranking of records using different parameters and methods. ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. __revision__ = "$Id$" import sys import time import marshal import ConfigParser from invenio.config import \ CFG_SITE_LANG, \ CFG_ETCDIR from invenio.search_engine import perform_request_search, HitSet from invenio.bibrank_citation_indexer import get_citation_weight, print_missing, get_cit_dict, insert_into_cit_db from invenio.bibrank_downloads_indexer import * from invenio.dbquery import run_sql, serialize_via_marshal, deserialize_via_marshal from invenio.errorlib import register_exception from invenio.bibtask import task_get_option, write_message, task_sleep_now_if_required +from invenio.bibindex_engine import create_range_list options = {} def remove_auto_cites(dic): """Remove auto-cites and dedupe.""" for key in dic.keys(): new_list = dic.fromkeys(dic[key]).keys() try: new_list.remove(key) except ValueError, e: pass dic[key] = new_list return dic def citation_repair_exec(): """Repair citation ranking method""" ## repair citations for rowname in ["citationdict","reversedict"]: ## get dic dic = get_cit_dict(rowname) ## repair write_message("Repairing %s" % rowname) dic = remove_auto_cites(dic) ## store healthy citation dic insert_into_cit_db(dic, rowname) return def download_weight_filtering_user_repair_exec (): """Repair download weight filtering user ranking method""" write_message("Repairing for this ranking method is not defined. Skipping.") return def download_weight_total_repair_exec(): """Repair download weight total ranking method""" write_message("Repairing for this ranking method is not defined. Skipping.") return def file_similarity_by_times_downloaded_repair_exec(): """Repair file similarity by times downloaded ranking method""" write_message("Repairing for this ranking method is not defined. Skipping.") return def single_tag_rank_method_repair_exec(): """Repair single tag ranking method""" write_message("Repairing for this ranking method is not defined. 
Skipping.") return def citation_exec(rank_method_code, name, config): """Rank method for citation analysis""" #first check if this is a specific task if task_get_option("cmd") == "print-missing": num = task_get_option("num") print_missing(num) dict = get_citation_weight(rank_method_code, config) date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) if dict: intoDB(dict, date, rank_method_code) else: write_message("no need to update the indexes for citations") def download_weight_filtering_user(run): return bibrank_engine(run) def download_weight_total(run): return bibrank_engine(run) def file_similarity_by_times_downloaded(run): return bibrank_engine(run) def download_weight_filtering_user_exec (rank_method_code, name, config): """Ranking by number of downloads per User. Only one full Text Download is taken in account for one specific userIP address""" time1 = time.time() dic = fromDB(rank_method_code) last_updated = get_lastupdated(rank_method_code) keys = new_downloads_to_index(last_updated) filter_downloads_per_hour(keys, last_updated) dic = get_download_weight_filtering_user(dic, keys) date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) intoDB(dic, date, rank_method_code) time2 = time.time() return {"time":time2-time1} def download_weight_total_exec(rank_method_code, name, config): """rankink by total number of downloads without check the user ip if users downloads 3 time the same full text document it has to be count as 3 downloads""" time1 = time.time() dic = fromDB(rank_method_code) last_updated = get_lastupdated(rank_method_code) keys = new_downloads_to_index(last_updated) filter_downloads_per_hour(keys, last_updated) dic = get_download_weight_total(dic, keys) date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) intoDB(dic, date, rank_method_code) time2 = time.time() return {"time":time2-time1} def file_similarity_by_times_downloaded_exec(rank_method_code, name, config): """update dictionnary {recid:[(recid, nb page similarity), ()..]}""" time1 = time.time() dic = fromDB(rank_method_code) last_updated = get_lastupdated(rank_method_code) keys = new_downloads_to_index(last_updated) filter_downloads_per_hour(keys, last_updated) dic = get_file_similarity_by_times_downloaded(dic, keys) date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) intoDB(dic, date, rank_method_code) time2 = time.time() return {"time":time2-time1} def single_tag_rank_method_exec(rank_method_code, name, config): """Creating the rank method data""" startCreate = time.time() rnkset = {} rnkset_old = fromDB(rank_method_code) date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) rnkset_new = single_tag_rank(config) rnkset = union_dicts(rnkset_old, rnkset_new) intoDB(rnkset, date, rank_method_code) def single_tag_rank(config): """Connect the given tag with the data from the kb file given""" write_message("Loading knowledgebase file", verbose=9) kb_data = {} records = [] write_message("Reading knowledgebase file: %s" % \ config.get(config.get("rank_method", "function"), "kb_src")) input = open(config.get(config.get("rank_method", "function"), "kb_src"), 'r') data = input.readlines() for line in data: if not line[0:1] == "#": kb_data[string.strip((string.split(string.strip(line), "---"))[0])] = (string.split(string.strip(line), "---"))[1] write_message("Number of lines read from knowledgebase file: %s" % len(kb_data)) tag = config.get(config.get("rank_method", "function"), "tag") tags = config.get(config.get("rank_method", "function"), "check_mandatory_tags").split(", ") if tags == ['']: 
tags = "" records = [] for (recids, recide) in options["recid_range"]: task_sleep_now_if_required(can_stop_too=True) write_message("......Processing records #%s-%s" % (recids, recide)) recs = run_sql("SELECT id_bibrec, value FROM bib%sx, bibrec_bib%sx WHERE tag=%%s AND id_bibxxx=id and id_bibrec >=%%s and id_bibrec<=%%s" % (tag[0:2], tag[0:2]), (tag, recids, recide)) valid = HitSet(trailing_bits=1) valid.discard(0) for key in tags: newset = HitSet() newset += [recid[0] for recid in (run_sql("SELECT id_bibrec FROM bib%sx, bibrec_bib%sx WHERE id_bibxxx=id AND tag=%%s AND id_bibxxx=id and id_bibrec >=%%s and id_bibrec<=%%s" % (tag[0:2], tag[0:2]), (key, recids, recide)))] valid.intersection_update(newset) if tags: recs = filter(lambda x: x[0] in valid, recs) records = records + list(recs) write_message("Number of records found with the necessary tags: %s" % len(records)) records = filter(lambda x: x[0] in options["validset"], records) rnkset = {} for key, value in records: if kb_data.has_key(value): if not rnkset.has_key(key): rnkset[key] = float(kb_data[value]) else: if kb_data.has_key(rnkset[key]) and float(kb_data[value]) > float((rnkset[key])[1]): rnkset[key] = float(kb_data[value]) else: rnkset[key] = 0 write_message("Number of records available in rank method: %s" % len(rnkset)) return rnkset def get_lastupdated(rank_method_code): """Get the last time the rank method was updated""" res = run_sql("SELECT rnkMETHOD.last_updated FROM rnkMETHOD WHERE name=%s", (rank_method_code, )) if res: return res[0][0] else: raise Exception("Is this the first run? Please do a complete update.") def intoDB(dict, date, rank_method_code): """Insert the rank method data into the database""" mid = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) del_rank_method_codeDATA(rank_method_code) serdata = serialize_via_marshal(dict); midstr = str(mid[0][0]); run_sql("INSERT INTO rnkMETHODDATA(id_rnkMETHOD, relevance_data) VALUES (%s,%s)", (midstr, serdata,)) run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s", (date, rank_method_code)) def fromDB(rank_method_code): """Get the data for a rank method""" id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], )) if res: return deserialize_via_marshal(res[0][0]) else: return {} def del_rank_method_codeDATA(rank_method_code): """Delete the data for a rank method""" id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) res = run_sql("DELETE FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], )) def del_recids(rank_method_code, range_rec): """Delete some records from the rank method""" id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0] )) if res: rec_dict = deserialize_via_marshal(res[0][0]) write_message("Old size: %s" % len(rec_dict)) for (recids, recide) in range_rec: for i in range(int(recids), int(recide)): if rec_dict.has_key(i): del rec_dict[i] write_message("New size: %s" % len(rec_dict)) date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) intoDB(rec_dict, date, rank_method_code) else: write_message("Create before deleting!") def union_dicts(dict1, dict2): "Returns union of the two dicts." 
union_dict = {} for (key, value) in dict1.iteritems(): union_dict[key] = value for (key, value) in dict2.iteritems(): union_dict[key] = value return union_dict def rank_method_code_statistics(rank_method_code): """Print statistics""" method = fromDB(rank_method_code) max = ('', -999999) maxcount = 0 min = ('', 999999) mincount = 0 for (recID, value) in method.iteritems(): if value < min and value > 0: min = value if value > max: max = value for (recID, value) in method.iteritems(): if value == min: mincount += 1 if value == max: maxcount += 1 write_message("Showing statistic for selected method") write_message("Method name: %s" % getName(rank_method_code)) write_message("Short name: %s" % rank_method_code) write_message("Last run: %s" % get_lastupdated(rank_method_code)) write_message("Number of records: %s" % len(method)) write_message("Lowest value: %s - Number of records: %s" % (min, mincount)) write_message("Highest value: %s - Number of records: %s" % (max, maxcount)) write_message("Divided into 10 sets:") for i in range(1, 11): setcount = 0 distinct_values = {} lower = -1.0 + ((float(max + 1) / 10)) * (i - 1) upper = -1.0 + ((float(max + 1) / 10)) * i for (recID, value) in method.iteritems(): if value >= lower and value <= upper: setcount += 1 distinct_values[value] = 1 write_message("Set %s (%s-%s) %s Distinct values: %s" % (i, lower, upper, len(distinct_values), setcount)) def check_method(rank_method_code): write_message("Checking rank method...") if len(fromDB(rank_method_code)) == 0: write_message("Rank method not yet executed, please run it to create the necessary data.") else: if len(add_recIDs_by_date(rank_method_code)) > 0: write_message("Records modified, update recommended") else: write_message("No records modified, update not necessary") def bibrank_engine(run): """Run the indexing task. Return 1 in case of success and 0 in case of failure. """ try: import psyco psyco.bind(single_tag_rank) psyco.bind(single_tag_rank_method_exec) psyco.bind(serialize_via_marshal) psyco.bind(deserialize_via_marshal) except StandardError, e: pass startCreate = time.time() sets = {} try: options["run"] = [] options["run"].append(run) for rank_method_code in options["run"]: task_sleep_now_if_required(can_stop_too=True) cfg_name = getName(rank_method_code) write_message("Running rank method: %s." 
% cfg_name) file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() try: config.readfp(open(file)) except StandardError, e: write_message("Cannot find configurationfile: %s" % file, sys.stderr) raise StandardError cfg_short = rank_method_code cfg_function = config.get("rank_method", "function") + "_exec" cfg_repair_function = config.get("rank_method", "function") + "_repair_exec" cfg_name = getName(cfg_short) options["validset"] = get_valid_range(rank_method_code) if task_get_option("collection"): l_of_colls = string.split(task_get_option("collection"), ", ") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) options["recid_range"] = recIDs_range elif task_get_option("id"): options["recid_range"] = task_get_option("id") elif task_get_option("modified"): options["recid_range"] = add_recIDs_by_date(rank_method_code, task_get_option("modified")) elif task_get_option("last_updated"): options["recid_range"] = add_recIDs_by_date(rank_method_code) else: write_message("No records specified, updating all", verbose=2) min_id = run_sql("SELECT min(id) from bibrec")[0][0] max_id = run_sql("SELECT max(id) from bibrec")[0][0] options["recid_range"] = [[min_id, max_id]] if task_get_option("quick") == "no": write_message("Recalculate parameter not used, parameter ignored.", verbose=9) if task_get_option("cmd") == "del": del_recids(cfg_short, options["recid_range"]) elif task_get_option("cmd") == "add": func_object = globals().get(cfg_function) func_object(rank_method_code, cfg_name, config) elif task_get_option("cmd") == "stat": rank_method_code_statistics(rank_method_code) elif task_get_option("cmd") == "check": check_method(rank_method_code) elif task_get_option("cmd") == "print-missing": func_object = globals().get(cfg_function) func_object(rank_method_code, cfg_name, config) elif task_get_option("cmd") == "repair": func_object = globals().get(cfg_repair_function) func_object() else: write_message("Invalid command found processing %s" % rank_method_code, sys.stderr) raise StandardError except StandardError, e: write_message("\nException caught: %s" % e, sys.stderr) register_exception() raise StandardError if task_get_option("verbose"): showtime((time.time() - startCreate)) return 1 def get_valid_range(rank_method_code): """Return a range of records""" write_message("Getting records from collections enabled for rank method.", verbose=9) res = run_sql("SELECT collection.name FROM collection, collection_rnkMETHOD, rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name=%s", (rank_method_code, )) l_of_colls = [] for coll in res: l_of_colls.append(coll[0]) if len(l_of_colls) > 0: recIDs = perform_request_search(c=l_of_colls) else: recIDs = [] valid = HitSet() valid += recIDs return valid def add_recIDs_by_date(rank_method_code, dates=""): """Return recID range from records modified between DATES[0] and DATES[1]. If DATES is not set, then add records modified since the last run of the ranking method RANK_METHOD_CODE. 
""" if not dates: try: dates = (get_lastupdated(rank_method_code), '') except Exception, e: dates = ("0000-00-00 00:00:00", '') if dates[0] is None: dates = ("0000-00-00 00:00:00", '') query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s""" if dates[1]: query += " and b.modification_date <= %s" query += " ORDER BY b.id ASC""" if dates[1]: res = run_sql(query, (dates[0], dates[1])) else: res = run_sql(query, (dates[0], )) - list = create_range_list(res) - if not list: + alist = create_range_list([row[0] for row in res]) + if not alist: write_message("No new records added since last time method was run") - return list + return alist def getName(rank_method_code, ln=CFG_SITE_LANG, type='ln'): """Returns the name of the method if it exists""" try: rnkid = run_sql("SELECT id FROM rnkMETHOD where name=%s", (rank_method_code, )) if rnkid: rnkid = str(rnkid[0][0]) res = run_sql("SELECT value FROM rnkMETHODNAME where type=%s and ln=%s and id_rnkMETHOD=%s", (type, ln, rnkid)) if not res: res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln=%s and id_rnkMETHOD=%s and type=%s", (CFG_SITE_LANG, rnkid, type)) if not res: return rank_method_code return res[0][0] else: raise Exception except Exception, e: write_message("Cannot run rank method, either given code for method is wrong, or it has not been added using the webinterface.") raise Exception -def create_range_list(res): - """Creates a range list from a recID select query result contained - in res. The result is expected to have ascending numerical order.""" - if not res: - return [] - row = res[0] - if not row: - return [] - else: - range_list = [[row[0], row[0]]] - for row in res[1:]: - id = row[0] - if id == range_list[-1][1] + 1: - range_list[-1][1] = id - else: - range_list.append([id, id]) - return range_list - def single_tag_rank_method(run): return bibrank_engine(run) def showtime(timeused): """Show time used for method""" write_message("Time used: %d second(s)." % timeused, verbose=9) def citation(run): return bibrank_engine(run) diff --git a/modules/bibrank/lib/bibrank_word_indexer.py b/modules/bibrank/lib/bibrank_word_indexer.py index f2106f0ef..01c4e71d0 100644 --- a/modules/bibrank/lib/bibrank_word_indexer.py +++ b/modules/bibrank/lib/bibrank_word_indexer.py @@ -1,1206 +1,1206 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
__revision__ = "$Id$" import sys import time import urllib import math import re import ConfigParser from invenio.config import \ CFG_SITE_LANG, \ CFG_ETCDIR from invenio.search_engine import perform_request_search, strip_accents, wash_index_term from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, deserialize_via_marshal from invenio.bibindex_engine_stemmer import is_stemmer_available_for_language, stem from invenio.bibindex_engine_stopwords import is_stopword from invenio.bibindex_engine import beautify_range_list, \ kill_sleepy_mysql_threads, create_range_list from invenio.bibtask import write_message, task_get_option, task_update_progress, \ task_update_status, task_sleep_now_if_required from invenio.intbitset import intbitset from invenio.errorlib import register_exception options = {} # global variable to hold task options ## safety parameters concerning DB thread-multiplication problem: CFG_CHECK_MYSQL_THREADS = 0 # to check or not to check the problem? CFG_MAX_MYSQL_THREADS = 50 # how many threads (connections) we consider as still safe CFG_MYSQL_THREAD_TIMEOUT = 20 # we'll kill threads that were sleeping for more than X seconds ## override urllib's default password-asking behaviour: class MyFancyURLopener(urllib.FancyURLopener): def prompt_user_passwd(self, host, realm): # supply some dummy credentials by default return ("mysuperuser", "mysuperpass") def http_error_401(self, url, fp, errcode, errmsg, headers): # do not bother with protected pages raise IOError, (999, 'unauthorized access') return None #urllib._urlopener = MyFancyURLopener() nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by base_process_size = 4500 # process base size ## Dictionary merging functions def dict_union(list1, list2): "Returns union of the two dictionaries." union_dict = {} for (e, count) in list1.iteritems(): union_dict[e] = count for (e, count) in list2.iteritems(): if not union_dict.has_key(e): union_dict[e] = count else: union_dict[e] = (union_dict[e][0] + count[0], count[1]) #for (e, count) in list2.iteritems(): # list1[e] = (list1.get(e, (0, 0))[0] + count[0], count[1]) #return list1 return union_dict # tagToFunctions mapping. It offers an indirection level necesary for # indexing fulltext. The default is get_words_from_phrase tagToWordsFunctions = {} def get_words_from_phrase(phrase, weight, lang="", chars_punctuation=r"[\.\,\:\;\?\!\"]", chars_alphanumericseparators=r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]", split=str.split): "Returns list of words from phrase 'phrase'." words = {} phrase = strip_accents(phrase) phrase = phrase.lower() #Getting rid of strange characters phrase = re.sub("é", 'e', phrase) phrase = re.sub("è", 'e', phrase) phrase = re.sub("à", 'a', phrase) phrase = re.sub(" ", ' ', phrase) phrase = re.sub("«", ' ', phrase) phrase = re.sub("»", ' ', phrase) phrase = re.sub("ê", ' ', phrase) phrase = re.sub("&", ' ', phrase) if phrase.find(" -1: #Most likely html, remove html code phrase = re.sub("(?s)<[^>]*>|&#?\w+;", ' ', phrase) #removes http links phrase = re.sub("(?s)http://[^( )]*", '', phrase) phrase = re.sub(chars_punctuation, ' ', phrase) #By doing this like below, characters standing alone, like c a b is not added to the inedx, but when they are together with characters like c++ or c$ they are added. 
for word in split(phrase): if options["remove_stopword"] == "True" and not is_stopword(word, 1) and check_term(word, 0): if lang and lang !="none" and options["use_stemming"]: word = stem(word, lang) if not words.has_key(word): words[word] = (0, 0) else: if not words.has_key(word): words[word] = (0, 0) words[word] = (words[word][0] + weight, 0) elif options["remove_stopword"] == "True" and not is_stopword(word, 1): phrase = re.sub(chars_alphanumericseparators, ' ', word) for word_ in split(phrase): if lang and lang !="none" and options["use_stemming"]: word_ = stem(word_, lang) if word_: if not words.has_key(word_): words[word_] = (0,0) words[word_] = (words[word_][0] + weight, 0) return words class WordTable: "A class to hold the words table." def __init__(self, tablename, fields_to_index, separators="[^\s]"): "Creates words table instance." self.tablename = tablename self.recIDs_in_mem = [] self.fields_to_index = fields_to_index self.separators = separators self.value = {} def get_field(self, recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" out = [] bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag); res = run_sql(query) for row in res: out.append(row[0]) return out def clean(self): "Cleans the words table." self.value={} def put_into_db(self, mode="normal"): """Updates the current words table in the corresponding DB rnkWORD table. Mode 'normal' means normal execution, mode 'emergency' means words index reverting to old state. """ write_message("%s %s wordtable flush started" % (self.tablename,mode)) write_message('...updating %d words into %sR started' % \ (len(self.value), self.tablename[:-1])) task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value))) self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) nb_words_total = len(self.value) nb_words_report = int(nb_words_total/10) nb_words_done = 0 for word in self.value.keys(): self.put_word_into_db(word, self.value[word]) nb_words_done += 1 if nb_words_report!=0 and ((nb_words_done % nb_words_report) == 0): write_message('......processed %d/%d words' % (nb_words_done, nb_words_total)) task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total)) write_message('...updating %d words into %s ended' % \ (nb_words_total, self.tablename), verbose=9) #if options["verbose"]: # write_message('...updating reverse table %sR started' % self.tablename[:-1]) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) write_message('End of updating wordTable into %s' % self.tablename, verbose=9) elif mode == "emergency": write_message("emergency") for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND 
type='TEMPORARY'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9) #if options["verbose"]: # write_message('...updating reverse table %sR ended' % self.tablename[:-1]) self.clean() self.recIDs_in_mem = [] write_message("%s %s wordtable flush ended" % (self.tablename, mode)) task_update_progress("%s flush ended" % (self.tablename)) def load_old_recIDs(self,word): """Load existing hitlist for the word from the database index files.""" query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename res = run_sql(query, (word,)) if res: return deserialize_via_marshal(res[0][0]) else: return None def merge_with_old_recIDs(self,word,recIDs, set): """Merge the system numbers stored in memory (hash of recIDs with value[0] > 0 or -1 according to whether to add/delete them) with those stored in the database index and received in set universe of recIDs for the given word. Return 0 in case no change was done to SET, return 1 in case SET was changed. """ set_changed_p = 0 for recID,sign in recIDs.iteritems(): if sign[0] == -1 and set.has_key(recID): # delete recID if existent in set and if marked as to be deleted del set[recID] set_changed_p = 1 elif sign[0] > -1 and not set.has_key(recID): # add recID if not existent in set and if marked as to be added set[recID] = sign set_changed_p = 1 elif sign[0] > -1 and sign[0] != set[recID][0]: set[recID] = sign set_changed_p = 1 return set_changed_p def put_word_into_db(self, word, recIDs, split=str.split): """Flush a single word to the database and delete it from memory""" set = self.load_old_recIDs(word) #write_message("%s %s" % (word, self.value[word])) if set is not None: # merge the word recIDs found in memory: options["modified_words"][word] = 1 if not self.merge_with_old_recIDs(word, recIDs, set): # nothing to update: write_message("......... unchanged hitlist for ``%s''" % word, verbose=9) pass else: # yes there were some new words: write_message("......... updating hitlist for ``%s''" % word, verbose=9) run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename, (serialize_via_marshal(set), word)) else: # the word is new, will create new set: write_message("......... inserting hitlist for ``%s''" % word, verbose=9) set = self.value[word] if len(set) > 0: #new word, add to list options["modified_words"][word] = 1 try: run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename, (word, serialize_via_marshal(set))) except Exception, e: ## FIXME: This is for debugging encoding errors register_exception(prefix="Error when putting the term '%s' into db (hitlist=%s): %s\n" % (repr(word), set, e), alert_admin=True) if not set: # never store empty words run_sql("DELETE from %s WHERE term=%%s" % self.tablename, (word,)) del self.value[word] def display(self): "Displays the word table." keys = self.value.keys() keys.sort() for k in keys: write_message("%s: %s" % (k, self.value[k])) def count(self): "Returns the number of words in the table." return len(self.value) def info(self): "Prints some information on the words table." write_message("The words table contains %d words." % self.count()) def lookup_words(self, word=""): "Lookup word from the words table." 
if not word: done = 0 while not done: try: word = raw_input("Enter word: ") done = 1 except (EOFError, KeyboardInterrupt): return if self.value.has_key(word): write_message("The word '%s' is found %d times." \ % (word, len(self.value[word]))) else: write_message("The word '%s' does not exist in the word file."\ % word) def update_last_updated(self, rank_method_code, starting_time=None): """Update last_updated column of the index table in the database. Puts starting time there so that if the task was interrupted for record download, the records will be reindexed next time.""" if starting_time is None: return None write_message("updating last_updated to %s..." % starting_time, verbose=9) return run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s", (starting_time, rank_method_code,)) def add_recIDs(self, recIDs): - """Fetches records which id in the recIDs range list and adds - them to the wordTable. The recIDs range list is of the form: + """Fetches records which id in the recIDs arange list and adds + them to the wordTable. The recIDs arange list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ global chunksize flush_count = 0 records_done = 0 records_to_go = 0 - for range in recIDs: - records_to_go = records_to_go + range[1] - range[0] + 1 + for arange in recIDs: + records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time - for range in recIDs: - i_low = range[0] + for arange in recIDs: + i_low = arange[0] chunksize_count = 0 - while i_low <= range[1]: + while i_low <= arange[1]: # calculate chunk group of recIDs and treat it: - i_high = min(i_low+task_get_option("flush")-flush_count-1,range[1]) + i_high = min(i_low+task_get_option("flush")-flush_count-1,arange[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.chk_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") sys.exit(1) write_message("%s adding records #%d-#%d started" % \ (self.tablename, i_low, i_high)) if CFG_CHECK_MYSQL_THREADS: kill_sleepy_mysql_threads() task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high)) self.del_recID_range(i_low, i_high) just_processed = self.add_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + just_processed write_message("%s adding records #%d-#%d ended " % \ (self.tablename, i_low, i_high)) if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= task_get_option("flush"): self.put_into_db() self.clean() write_message("%s backing up" % (self.tablename)) flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db() self.log_progress(time_started,records_done,records_to_go) def add_recIDs_by_date(self, dates=""): """Add recIDs modified between DATES[0] and DATES[1]. If DATES is not set, then add records modified since the last run of the ranking method. 
""" if not dates: write_message("Using the last update time for the rank method") query = """SELECT last_updated FROM rnkMETHOD WHERE name='%s' """ % options["current_run"] res = run_sql(query) if not res: return if not res[0][0]: dates = ("0000-00-00",'') else: dates = (res[0][0],'') query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= '%s'""" % dates[0] if dates[1]: query += "and b.modification_date <= '%s'" % dates[1] query += " ORDER BY b.id ASC""" res = run_sql(query) - list = create_range_list(res) - if not list: + alist = create_range_list([row[0] for row in res]) + if not alist: write_message( "No new records added. %s is up to date" % self.tablename) else: - self.add_recIDs(list) - return list + self.add_recIDs(alist) + return alist def add_recID_range(self, recID1, recID2): """Add records from RECID1 to RECID2.""" wlist = {} normalize = {} self.recIDs_in_mem.append([recID1,recID2]) # secondly fetch all needed tags: for (tag, weight, lang) in self.fields_to_index: if tag in tagToWordsFunctions.keys(): get_words_function = tagToWordsFunctions[tag] else: get_words_function = get_words_from_phrase bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec BETWEEN %d AND %d AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag) res = run_sql(query) nb_total_to_read = len(res) verbose_idx = 0 # for verbose pretty printing for row in res: recID, phrase = row if recID in options["validset"]: if not wlist.has_key(recID): wlist[recID] = {} new_words = get_words_function(phrase, weight, lang) # ,self.separators wlist[recID] = dict_union(new_words,wlist[recID]) # were there some words for these recIDs found? if len(wlist) == 0: return 0 recIDs = wlist.keys() for recID in recIDs: # was this record marked as deleted? if "DELETED" in self.get_field(recID, "980__c"): wlist[recID] = {} write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9) write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9) # put words into reverse index table with FUTURE status: for recID in recIDs: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % self.tablename[:-1], (recID, serialize_via_marshal(wlist[recID]))) # ... and, for new records, enter the CURRENT status as empty: try: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % self.tablename[:-1], (recID, serialize_via_marshal([]))) except DatabaseError: # okay, it's an already existing record, no problem pass # put words into memory word list: put = self.put for recID in recIDs: for (w, count) in wlist[recID].iteritems(): put(recID, w, count) return len(recIDs) def log_progress(self, start, done, todo): """Calculate progress and store it. start: start time, done: records processed, todo: total number of records""" time_elapsed = time.time() - start # consistency check if time_elapsed == 0 or done > todo: return time_recs_per_min = done/(time_elapsed/60.0) write_message("%d records took %.1f seconds to complete.(%1.f recs/min)"\ % (done, time_elapsed, time_recs_per_min)) if time_recs_per_min: write_message("Estimated runtime: %.1f minutes" % \ ((todo-done)/time_recs_per_min)) def put(self, recID, word, sign): "Adds/deletes a word to the word list." try: word = wash_index_term(word) if self.value.has_key(word): # the word 'word' exist already: update sign self.value[word][recID] = sign # PROBLEM ? 
else: self.value[word] = {recID: sign} except: write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID)) def del_recIDs(self, recIDs): """Fetches records which id in the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ count = 0 for range in recIDs: self.del_recID_range(range[0],range[1]) count = count + range[1] - range[0] self.put_into_db() def del_recID_range(self, low, high): """Deletes records with 'recID' system number between low and high from memory words index table.""" write_message("%s fetching existing words for records #%d-#%d started" % \ (self.tablename, low, high), verbose=3) self.recIDs_in_mem.append([low,high]) query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) recID_rows = run_sql(query) for recID_row in recID_rows: recID = recID_row[0] wlist = deserialize_via_marshal(recID_row[1]) for word in wlist: self.put(recID, word, (-1, 0)) write_message("%s fetching existing words for records #%d-#%d ended" % \ (self.tablename, low, high), verbose=3) def report_on_table_consistency(self): """Check reverse words index tables (e.g. rnkWORD01R) for interesting states such as 'TEMPORARY' state. Prints small report (no of words, no of bad words). """ # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d index records" % \ (self.tablename, nb_bad_records, nb_records)) else: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_records = res[0][0] else: nb_records = 0 if nb_bad_records == 0: return query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \ % (self.tablename[:-1]) res = run_sql(query) - recIDs = create_range_list(res) + recIDs = create_range_list([row[0] for row in res]) flush_count = 0 records_done = 0 records_to_go = 0 for range in recIDs: records_to_go = records_to_go + range[1] - range[0] + 1 time_started = time.time() # will measure profile time for range in recIDs: i_low = range[0] chunksize_count = 0 while i_low <= range[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+task_get_option("flush")-flush_count-1,range[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.fix_recID_range(i_low, i_high) except StandardError, e: 
write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") sys.exit(1) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= task_get_option("flush"): self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started,records_done,records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) res = run_sql(query, None, 1) if res[0][0]==0: write_message("%s for %d-%d is in consistent state"%(self.tablename,low,high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) write_message("""EMERGENCY: Errors found. You should check consistency of the %s - %sR tables.\nRunning 'bibrank --repair' is recommended.""" \ % (self.tablename, self.tablename[:-1])) raise StandardError def fix_recID_range(self, low, high): """Try to fix reverse index database consistency (e.g. table rnkWORD01R) in the low,high doc-id range. Possible states for a recID follow: CUR TMP FUT: very bad things have happened: warn! CUR TMP : very bad things have happened: warn! CUR FUT: delete FUT (crash before flushing) CUR : database is ok TMP FUT: add TMP to memory and del FUT from memory flush (revert to old state) TMP : very bad things have happened: warn! FUT: very bad things have happended: warn! """ state = {} query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\ % (self.tablename[:-1], low, high) res = run_sql(query) for row in res: if not state.has_key(row[0]): state[row[0]]=[] state[row[0]].append(row[1]) ok = 1 # will hold info on whether we will be able to repair for recID in state.keys(): if not 'TEMPORARY' in state[recID]: if 'FUTURE' in state[recID]: if 'CURRENT' not in state[recID]: write_message("EMERGENCY: Index record %d is in inconsistent state. Can't repair it" % recID) ok = 0 else: write_message("EMERGENCY: Inconsistency in index record %d detected" % recID) query = """DELETE FROM %sR WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID) run_sql(query) write_message("EMERGENCY: Inconsistency in index record %d repaired." % recID) else: if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]: self.recIDs_in_mem.append([recID,recID]) # Get the words file query = """SELECT type,termlist FROM %sR WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID) write_message(query, verbose=9) res = run_sql(query) for row in res: wlist = deserialize_via_marshal(row[1]) write_message("Words are %s " % wlist, verbose=9) if row[0] == 'TEMPORARY': sign = 1 else: sign = -1 for word in wlist: self.put(recID, word, wlist[word]) else: write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID)) ok = 0 if not ok: write_message("""EMERGENCY: Unrepairable errors found. You should check consistency of the %s - %sR tables. 
Deleting affected TEMPORARY and FUTURE entries from these tables is recommended; see the BibIndex Admin Guide. (The repairing procedure is similar for bibrank word indexes.)""" % (self.tablename, self.tablename[:-1])) raise StandardError def word_index(run): """Run the indexing task. The row argument is the BibSched task queue row, containing if, arguments, etc. Return 1 in case of success and 0 in case of failure. """ ## import optional modules: try: import psyco psyco.bind(get_words_from_phrase) psyco.bind(WordTable.merge_with_old_recIDs) psyco.bind(update_rnkWORD) psyco.bind(check_rnkWORD) except StandardError,e: print "Warning: Psyco", e pass global languages max_recid = 0 res = run_sql("SELECT max(id) FROM bibrec") if res and res[0][0]: max_recid = int(res[0][0]) options["run"] = [] options["run"].append(run) for rank_method_code in options["run"]: task_sleep_now_if_required(can_stop_too=True) method_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) write_message("Running rank method: %s" % getName(rank_method_code)) try: file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() config.readfp(open(file)) except StandardError, e: write_message("Cannot find configurationfile: %s" % file, sys.stderr) raise StandardError options["current_run"] = rank_method_code options["modified_words"] = {} options["table"] = config.get(config.get("rank_method", "function"), "table") options["use_stemming"] = config.get(config.get("rank_method","function"),"stemming") options["remove_stopword"] = config.get(config.get("rank_method","function"),"stopword") tags = get_tags(config) #get the tags to include options["validset"] = get_valid_range(rank_method_code) #get the records from the collections the method is enabled for function = config.get("rank_method","function") wordTable = WordTable(options["table"], tags) wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: write_message("Missing IDs of records to delete from index %s.", wordTable.tablename, sys.stderr) raise StandardError elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("last_updated"): wordTable.add_recIDs_by_date("") # only update last_updated if run via automatic mode: wordTable.update_last_updated(rank_method_code, method_starting_time) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("modified"): wordTable.add_recIDs_by_date(task_get_option("modified")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs([[0,max_recid]]) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair() check_rnkWORD(options["table"]) task_sleep_now_if_required(can_stop_too=True) 
elif task_get_option("cmd") == "check": check_rnkWORD(options["table"]) options["modified_words"] = {} task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "stat": rank_method_code_statistics(options["table"]) task_sleep_now_if_required(can_stop_too=True) else: write_message("Invalid command found processing %s" % \ wordTable.tablename, sys.stderr) raise StandardError update_rnkWORD(options["table"], options["modified_words"]) task_sleep_now_if_required(can_stop_too=True) except StandardError, e: register_exception(alert_admin=True) write_message("Exception caught: %s" % e, sys.stderr) sys.exit(1) wordTable.report_on_table_consistency() # We are done. State it in the database, close and quit return 1 def get_tags(config): """Get the tags that should be used creating the index and each tag's parameter""" tags = [] function = config.get("rank_method","function") i = 1 shown_error = 0 #try: if 1: while config.has_option(function,"tag%s"% i): tag = config.get(function, "tag%s" % i) tag = tag.split(",") tag[1] = int(tag[1].strip()) tag[2] = tag[2].strip() #check if stemmer for language is available if config.get(function, "stemming") and stem("information", "en") != "inform": if shown_error == 0: write_message("Warning: Stemming not working. Please check it out!") shown_error = 1 elif tag[2] and tag[2] != "none" and config.get(function,"stemming") and not is_stemmer_available_for_language(tag[2]): write_message("Warning: Stemming not available for language '%s'." % tag[2]) tags.append(tag) i += 1 #except Exception: # write_message("Could not read data from configuration file, please check for errors") # raise StandardError return tags def get_valid_range(rank_method_code): """Returns which records are valid for this rank method, according to which collections it is enabled for.""" #if options["verbose"] >=9: # write_message("Getting records from collections enabled for rank method.") #res = run_sql("SELECT collection.name FROM collection,collection_rnkMETHOD,rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name='%s'" % rank_method_code) #l_of_colls = [] #for coll in res: # l_of_colls.append(coll[0]) #if len(l_of_colls) > 0: # recIDs = perform_request_search(c=l_of_colls) #else: # recIDs = [] valid = intbitset(trailing_bits=1) valid.discard(0) #valid.addlist(recIDs) return valid def check_term(term, termlength): """Check if term contains not allowed characters, or for any other reasons for not using this term.""" try: if len(term) <= termlength: return False reg = re.compile(r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]") if re.search(reg, term): return False term = str.replace(term, "-", "") term = str.replace(term, ".", "") term = str.replace(term, ",", "") if int(term): return False except StandardError, e: pass return True def check_rnkWORD(table): """Checks for any problems in rnkWORD tables.""" i = 0 errors = {} termslist = run_sql("SELECT term FROM %s" % table) N = run_sql("select max(id_bibrec) from %sR" % table[:-1])[0][0] write_message("Checking integrity of rank values in %s" % table) terms = map(lambda x: x[0], termslist) while i < len(terms): query_params = () for j in range(i, ((i+5000)< len(terms) and (i+5000) or len(terms))): query_params += (terms[j],) terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term IN (%s)" % (table, (len(query_params)*"%s,")[:-1]), query_params) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if (term_docs.has_key("Gi") and 
term_docs["Gi"][1] == 0) or not term_docs.has_key("Gi"): write_message("ERROR: Missing value for term: %s (%s) in %s: %s" % (t, repr(t), table, len(term_docs))) errors[t] = 1 i += 5000 write_message("Checking integrity of rank values in %sR" % table[:-1]) i = 0 while i < N: docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec>=%s and id_bibrec<=%s" % (table[:-1], i, i+5000)) for (j, termlist) in docs_terms: termlist = deserialize_via_marshal(termlist) for (t, tf) in termlist.iteritems(): if tf[1] == 0 and not errors.has_key(t): errors[t] = 1 write_message("ERROR: Gi missing for record %s and term: %s (%s) in %s" % (j,t,repr(t), table)) terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term=%%s" % table, (t,)) termlist = deserialize_via_marshal(terms_docs[0][1]) i += 5000 if len(errors) == 0: write_message("No direct errors found, but nonconsistent data may exist.") else: write_message("%s errors found during integrity check, repair and rebalancing recommended." % len(errors)) options["modified_words"] = errors def rank_method_code_statistics(table): """Shows some statistics about this rank method.""" maxID = run_sql("select max(id) from %s" % table) maxID = maxID[0][0] terms = {} Gi = {} write_message("Showing statistics of terms in index:") write_message("Important: For the 'Least used terms', the number of terms is shown first, and the number of occurences second.") write_message("Least used terms---Most important terms---Least important terms") i = 0 while i < maxID: terms_docs=run_sql("SELECT term, hitlist FROM %s WHERE id>= %s and id < %s" % (table, i, i + 10000)) for (t, hitlist) in terms_docs: term_docs=deserialize_via_marshal(hitlist) terms[len(term_docs)] = terms.get(len(term_docs), 0) + 1 if term_docs.has_key("Gi"): Gi[t] = term_docs["Gi"] i=i + 10000 terms=terms.items() terms.sort(lambda x, y: cmp(y[1], x[1])) Gi=Gi.items() Gi.sort(lambda x, y: cmp(y[1], x[1])) for i in range(0, 20): write_message("%s/%s---%s---%s" % (terms[i][0],terms[i][1], Gi[i][0],Gi[len(Gi) - i - 1][0])) def update_rnkWORD(table, terms): """Updates rnkWORDF and rnkWORDR with Gi and Nj values. For each term in rnkWORDF, a Gi value for the term is added. And for each term in each document, the Nj value for that document is added. In rnkWORDR, the Gi value for each term in each document is added. For description on how things are computed, look in the hacking docs. 
table - name of forward index to update terms - modified terms""" stime = time.time() Gi = {} Nj = {} N = run_sql("select count(id_bibrec) from %sR" % table[:-1])[0][0] if len(terms) == 0 and task_get_option("quick") == "yes": write_message("No terms to process, ending...") return "" elif task_get_option("quick") == "yes": #not used -R option, fast calculation (not accurate) write_message("Beginning post-processing of %s terms" % len(terms)) #Locating all documents related to the modified/new/deleted terms, if fast update, #only take into account new/modified occurences write_message("Phase 1: Finding records containing modified terms") terms = terms.keys() i = 0 while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): if (task_get_option("quick") == "yes" and tf[1] == 0) or task_get_option("quick") == "no": Nj[j] = 0 write_message("Phase 1: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 1: Finished finding records containing modified terms") #Find all terms in the records found in last phase write_message("Phase 2: Finding all terms in affected records") records = Nj.keys() i = 0 while i < len(records): docs_terms = get_from_reverse_index(records, i, (i + 5000), table) for (j, termlist) in docs_terms: doc_terms = deserialize_via_marshal(termlist) for (t, tf) in doc_terms.iteritems(): Gi[t] = 0 write_message("Phase 2: ......processed %s/%s records " % ((i+5000>len(records) and len(records) or (i+5000)), len(records))) i += 5000 write_message("Phase 2: Finished finding all terms in affected records") else: #recalculate max_id = run_sql("SELECT MAX(id) FROM %s" % table) max_id = max_id[0][0] write_message("Beginning recalculation of %s terms" % max_id) terms = [] i = 0 while i < max_id: terms_docs = get_from_forward_index_with_id(i, (i+5000), table) for (t, hitlist) in terms_docs: Gi[t] = 0 term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): Nj[j] = 0 write_message("Phase 1: ......processed %s/%s terms" % ((i+5000)>max_id and max_id or (i+5000), max_id)) i += 5000 write_message("Phase 1: Finished finding which records contains which terms") write_message("Phase 2: Jumping over..already done in phase 1 because of -R option") terms = Gi.keys() Gi = {} i = 0 if task_get_option("quick") == "no": #Calculating Fi and Gi value for each term write_message("Phase 3: Calculating importance of all affected terms") while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] Fi = 0 Gi[t] = 1 for (j, tf) in term_docs.iteritems(): Fi += tf[0] for (j, tf) in term_docs.iteritems(): if tf[0] != Fi: Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N) write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 3: Finished calculating importance of all affected terms") else: #Using existing Gi value instead of calculating a new one. Missing some accurancy. 
write_message("Phase 3: Getting approximate importance of all affected terms") while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): Gi[t] = term_docs["Gi"][1] elif len(term_docs) == 1: Gi[t] = 1 else: Fi = 0 Gi[t] = 1 for (j, tf) in term_docs.iteritems(): Fi += tf[0] for (j, tf) in term_docs.iteritems(): if tf[0] != Fi: Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N) write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 3: Finished getting approximate importance of all affected terms") write_message("Phase 4: Calculating normalization value for all affected records and updating %sR" % table[:-1]) records = Nj.keys() i = 0 while i < len(records): #Calculating the normalization value for each document, and adding the Gi value to each term in each document. docs_terms = get_from_reverse_index(records, i, (i + 5000), table) for (j, termlist) in docs_terms: doc_terms = deserialize_via_marshal(termlist) try: for (t, tf) in doc_terms.iteritems(): if Gi.has_key(t): Nj[j] = Nj.get(j, 0) + math.pow(Gi[t] * (1 + math.log(tf[0])), 2) Git = int(math.floor(Gi[t]*100)) if Git >= 0: Git += 1 doc_terms[t] = (tf[0], Git) else: Nj[j] = Nj.get(j, 0) + math.pow(tf[1] * (1 + math.log(tf[0])), 2) Nj[j] = 1.0 / math.sqrt(Nj[j]) Nj[j] = int(Nj[j] * 100) if Nj[j] >= 0: Nj[j] += 1 run_sql("UPDATE %sR SET termlist=%%s WHERE id_bibrec=%%s" % table[:-1], (serialize_via_marshal(doc_terms), j)) except (ZeroDivisionError, OverflowError), e: ## This is to try to isolate division by zero errors. register_exception(prefix="Error when analysing the record %s (%s): %s\n" % (j, repr(docs_terms), e), alert_admin=True) write_message("Phase 4: ......processed %s/%s records" % ((i+5000>len(records) and len(records) or (i+5000)), len(records))) i += 5000 write_message("Phase 4: Finished calculating normalization value for all affected records and updating %sR" % table[:-1]) write_message("Phase 5: Updating %s with new normalization values" % table) i = 0 terms = Gi.keys() while i < len(terms): #Adding the Gi value to each term, and adding the normalization value to each term in each document. 
terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: try: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): if Nj.has_key(j): term_docs[j] = (tf[0], Nj[j]) Git = int(math.floor(Gi[t]*100)) if Git >= 0: Git += 1 term_docs["Gi"] = (0, Git) run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % table, (serialize_via_marshal(term_docs), t)) except (ZeroDivisionError, OverflowError), e: register_exception(prefix="Error when analysing the term %s (%s): %s\n" % (t, repr(terms_docs), e), alert_admin=True) write_message("Phase 5: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 5: Finished updating %s with new normalization values" % table) write_message("Time used for post-processing: %.1fmin" % ((time.time() - stime) / 60)) write_message("Finished post-processing") def get_from_forward_index(terms, start, stop, table): terms_docs = () for j in range(start, (stop < len(terms) and stop or len(terms))): terms_docs += run_sql("SELECT term, hitlist FROM %s WHERE term=%%s" % table, (terms[j],)) return terms_docs def get_from_forward_index_with_id(start, stop, table): terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE id BETWEEN %s AND %s" % (table, start, stop)) return terms_docs def get_from_reverse_index(records, start, stop, table): current_recs = "%s" % records[start:stop] current_recs = current_recs[1:-1] docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec IN (%s)" % (table[:-1], current_recs)) return docs_terms #def test_word_separators(phrase="hep-th/0101001"): #"""Tests word separating policy on various input.""" #print "%s:" % phrase #gwfp = get_words_from_phrase(phrase) #for (word, count) in gwfp.iteritems(): #print "\t-> %s - %s" % (word, count) def getName(methname, ln=CFG_SITE_LANG, type='ln'): """Returns the name of the rank method, either in default language or given language. methname = short name of the method ln - the language to get the name in type - which name "type" to get.""" try: rnkid = run_sql("SELECT id FROM rnkMETHOD where name='%s'" % methname) if rnkid: rnkid = str(rnkid[0][0]) res = run_sql("SELECT value FROM rnkMETHODNAME where type='%s' and ln='%s' and id_rnkMETHOD=%s" % (type, ln, rnkid)) if not res: res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln='%s' and id_rnkMETHOD=%s and type='%s'" % (CFG_SITE_LANG, rnkid, type)) if not res: return methname return res[0][0] else: raise Exception except Exception, e: write_message("Cannot run rank method, either given code for method is wrong, or it has not been added using the webinterface.") raise Exception def word_similarity(run): """Call correct method""" return word_index(run) diff --git a/modules/bibrank/lib/bibrankgkb.py b/modules/bibrank/lib/bibrankgkb.py index eb903e922..0302305cf 100644 --- a/modules/bibrank/lib/bibrankgkb.py +++ b/modules/bibrank/lib/bibrankgkb.py @@ -1,284 +1,284 @@ ## -*- mode: python; coding: utf-8; -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Usage: bibrankgkb %s [options] Examples: bibrankgkb --input=bibrankgkb.cfg --output=test.kb bibrankgkb -otest.kb -v9 bibrankgkb -v9 Generate options: -i, --input=file input file, default from /etc/bibrank/bibrankgkb.cfg -o, --output=file output file, will be placed in current folder General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ __revision__ = "$Id$" import getopt import sys import time import urllib import re import ConfigParser from invenio.config import CFG_ETCDIR from invenio.dbquery import run_sql opts_dict = {} task_id = -1 def bibrankgkb(config): """Generates a .kb file based on input from the configuration file""" if opts_dict["verbose"] >= 1: write_message("Running: Generate Knowledgebase.") journals = {} journal_src = {} i = 0 #Reading the configuration file while config.has_option("bibrankgkb","create_%s" % i): cfg = config.get("bibrankgkb", "create_%s" % i).split(",,") conv = {} temp = {} #Input source 1, either file, www or from db if cfg[0] == "file": conv = get_from_source(cfg[0], cfg[1]) del cfg[0:2] elif cfg[0] == "www": j = 0 urls = {} while config.has_option("bibrankgkb", cfg[1] % j): urls[j] = config.get("bibrankgkb", cfg[1] % j) j = j + 1 conv = get_from_source(cfg[0], (urls, cfg[2])) del cfg[0:3] elif cfg[0] == "db": conv = get_from_source(cfg[0], (cfg[1], cfg[2])) del cfg[0:3] if not conv: del cfg[0:2] else: if opts_dict["verbose"] >= 9: write_message("Using last resource for converting values.") #Input source 2, either file, www or from db if cfg[0] == "file": temp = get_from_source(cfg[0], cfg[1]) elif cfg[0] == "www": j = 0 urls = {} while config.has_option("bibrankgkb", cfg[1] % j): urls[j] = config.get("bibrankgkb", cfg[1] % j) j = j + 1 temp = get_from_source(cfg[0], (urls, cfg[2])) elif cfg[0] == "db": temp = get_from_source(cfg[0], (cfg[1], cfg[2])) i = i + 1 - #If a convertion file is given, the names will be converted to the correct convention + #If a conversion file is given, the names will be converted to the correct convention if len(conv) != 0: if opts_dict["verbose"] >= 9: write_message("Converting between naming conventions given.") temp = convert(conv, temp) if len(journals) != 0: for element in temp.keys(): if not journals.has_key(element): journals[element] = temp[element] else: journals = temp #Writing output file if opts_dict["output"]: f = open(opts_dict["output"], 'w') f.write("#Created by %s\n" % __revision__) f.write("#Sources:\n") for key in journals.keys(): f.write("%s---%s\n" % (key, journals[key])) f.close() if opts_dict["verbose"] >= 9: write_message("Output complete: %s" % opts_dict["output"]) write_message("Number of hits: %s" % len(journals)) if opts_dict["verbose"] >= 9: write_message("Result:") for key in journals.keys(): write_message("%s---%s" % (key, journals[key])) write_message("Total nr of lines: %s" % len(journals)) def showtime(timeused): if opts_dict["verbose"] >= 9: write_message("Time used: %d second(s)." 
% timeused) def get_from_source(type, data): """Read a source based on the input to the function""" datastruct = {} if type == "db": jvalue = run_sql(data[0]) jname = dict(run_sql(data[1])) if opts_dict["verbose"] >= 9: write_message("Reading data from database using SQL statements:") write_message(jvalue) write_message(jname) for key, value in jvalue: if jname.has_key(key): key2 = jname[key].strip() datastruct[key2] = value #print "%s---%s" % (key2, value) elif type == "file": input = open(data, 'r') if opts_dict["verbose"] >= 9: write_message("Reading data from file: %s" % data) data = input.readlines() datastruct = {} for line in data: #print line if not line[0:1] == "#": key = line.strip().split("---")[0].split() value = line.strip().split("---")[1] datastruct[key] = value #print "%s---%s" % (key,value) elif type == "www": if opts_dict["verbose"] >= 9: write_message("Reading data from www using regexp: %s" % data[1]) write_message("Reading data from url:") for link in data[0].keys(): if opts_dict["verbose"] >= 9: write_message(data[0][link]) page = urllib.urlopen(data[0][link]) input = page.read() #Using the regexp from config file reg = re.compile(data[1]) iterator = re.finditer(reg, input) for match in iterator: if match.group("value"): key = match.group("key").strip() value = match.group("value").replace(",", ".") datastruct[key] = value if opts_dict["verbose"] == 9: print "%s---%s" % (key, value) return datastruct def convert(convstruct, journals): """Converting between names""" if len(convstruct) > 0 and len(journals) > 0: invconvstruct = dict(map(lambda x: (x[1], x[0]), convstruct.items())) tempjour = {} for name in journals.keys(): if convstruct.has_key(name): tempjour[convstruct[name]] = journals[name] elif invconvstruct.has_key(name): tempjour[name] = journals[name] return tempjour else: return journals def write_message(msg, stream = sys.stdout): """Write message and flush output stream (may be sys.stdout or sys.stderr). Useful for debugging stuff.""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) try: stream.write("%s\n" % msg) except UnicodeEncodeError: stream.write("%s\n" % msg.encode('ascii', 'backslashreplace')) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) return def usage(code, msg=''): "Prints usage for this module." 
if msg: sys.stderr.write("Error: %s.\n" % msg) print >> sys.stderr, \ """ Usage: %s [options] Examples: %s --input=bibrankgkb.cfg --output=test.kb %s -otest.kb -v9 %s -v9 Generate options: -i, --input=file input file, default from /etc/bibrank/bibrankgkb.cfg -o, --output=file output file, will be placed in current folder General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ % ((sys.argv[0],) * 4) sys.exit(code) def command_line(): global opts_dict long_flags = ["input=", "output=", "help", "version", "verbose="] short_flags = "i:o:hVv:" format_string = "%Y-%m-%d %H:%M:%S" sleeptime = "" try: opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags) except getopt.GetoptError, err: write_message(err, sys.stderr) usage(1) if args: usage(1) opts_dict = {"input": "%s/bibrank/bibrankgkb.cfg" % CFG_ETCDIR, "output":"", "verbose":1} sched_time = time.strftime(format_string) user = "" try: for opt in opts: if opt == ("-h","") or opt == ("--help",""): usage(1) elif opt == ("-V","") or opt == ("--version",""): print __revision__ sys.exit(1) elif opt[0] in ["--input", "-i"]: opts_dict["input"] = opt[1] elif opt[0] in ["--output", "-o"]: opts_dict["output"] = opt[1] elif opt[0] in ["--verbose", "-v"]: opts_dict["verbose"] = int(opt[1]) else: usage(1) startCreate = time.time() file = opts_dict["input"] config = ConfigParser.ConfigParser() config.readfp(open(file)) bibrankgkb(config) if opts_dict["verbose"] >= 9: showtime((time.time() - startCreate)) except StandardError, e: write_message(e, sys.stderr) sys.exit(1) return def main(): command_line() if __name__ == "__main__": main() diff --git a/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc b/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc index bde9e6489..7e60cb42d 100644 --- a/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc +++ b/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc @@ -1,355 +1,349 @@ ## -*- mode: html; coding: utf-8; -*- ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

    Contents

    1. Overview
    2. Configuring BibUpload
    3. Running BibUpload
           3.1. Inserting new records
           3.2. Updating existing records
           3.3. Inserting and updating at the same time
           3.4. Updating preformatted output formats
           3.5. Uploading fulltext files

    1. Overview

    BibUpload enables you to upload bibliographic data in MARCXML format into the CDS Invenio bibliographic database. It is also used internally by other CDS Invenio modules as the sole entry point of metadata into the bibliographic database.

    Note that before uploading a MARCXML file, you may want to run the provided /opt/cds-invenio/bin/xmlmarclint on it in order to verify its correctness.
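
    For example, a quick validation run could look as follows (the input file name is illustrative):

     $ /opt/cds-invenio/bin/xmlmarclint file.xml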

    2. Configuring BibUpload

    BibUpload takes a MARCXML file as its input. There is nothing to be configured for these files. If the files have to be converted into MARCXML from some other format, structured or not, this is usually done beforehand via the BibConvert module.

    Note that if you are using external system numbers for your records, such as when your records are being synchronized from an external system, then BibUpload knows about the tag 970 as the one containing the external system number. (To change this 970 tag into something else, you would have to edit the BibUpload config source file.)

    Note also that, in a similar way, BibUpload knows about OAI identifiers, so that it will refuse to insert the same OAI-harvested record twice, for example.

    3. Running BibUpload

    3.1 Inserting new records

    Consider that you have a MARCXML file containing new records that is to be uploaded into CDS Invenio. (For example, it might have been produced by BibConvert.) To finish the upload, you would call the BibUpload script in the insert mode as follows:

     $ bibupload -i file.xml
     
     
    In the insert mode, all the records from the file will be treated as new. This means that they should contain neither 001 tags (holding record IDs) nor 970 tags (holding external system numbers). BibUpload would refuse to upload records having these tags, in order to prevent potential double uploading. If your file does contain 001 or 970, then chances are that you want to update existing records, not re-upload them as new, and so BibUpload will warn you about this and will refuse to continue.

    For example, to insert a new record, your file should look like this:

         <record>
             <datafield tag="100" ind1=" " ind2=" ">
                 <subfield code="a">Doe, John</subfield>
             </datafield>
             <datafield tag="245" ind1=" " ind2=" ">
                 <subfield code="a">On The Foo And Bar</subfield>
             </datafield>
         </record>
     

    3.2 Updating existing records

    When you want to update existing records with the new content from your input MARCXML file, your input file should contain either tag 001 (holding record IDs) or tag 970 (holding external system numbers). BibUpload will try to match existing records via 001 and 970, and if it finds a record in the database that corresponds to a record from the file, it will update its content. Otherwise it will signal an error saying that it could not find the record to be updated.

    For example, to update the title of record #123 via the correct mode, your input file should contain the record ID in the 001 tag and the title in the 245 tag as follows:

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="245" ind1=" " ind2=" ">
                 <subfield code="a">My Newly Updated Title</subfield>
             </datafield>
         </record>
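
    Alternatively, when records are matched via external system numbers, a hypothetical update through the 970 tag could look as follows (a sketch only: the $a subfield layout and the system number value are illustrative assumptions, not prescribed by this guide):

         <record>
             <datafield tag="970" ind1=" " ind2=" ">
                 <subfield code="a">SYS-0012345</subfield>
             </datafield>
             <datafield tag="245" ind1=" " ind2=" ">
                 <subfield code="a">My Newly Updated Title</subfield>
             </datafield>
         </record>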
     

    There are several updating modes:

     
         -r, --replace Replace existing records by those from the XML
                       MARC file.  The original content is wiped out
                       and fully replaced.  Signals error if record
                       is not found via matching record IDs or system
                       numbers.
     
                       Note also that `-r' can be combined with `-i'
                       into an `-ir' option that would automatically
                       either insert records as new if they are not
                       found in the system, or replace existing
                       records if they are found to exist.
     
         -a, --append  Append fields from XML MARC file at the end of
                       existing records.  The original content is
                       enriched only.  Signals error if record is not
                       found via matching record IDs or system
                       numbers.
     
         -c, --correct Correct fields of existing records by those
                       from XML MARC file.  The original record
                       content is modified only on those fields from
                       the XML MARC file where both the tags and the
                       indicators match: the original fields are
                       removed and replaced by those from the XML
                       MARC file.  Fields not present in XML MARC
                       file are not changed (unlike the -r option).
                       Signals error if record is not found via
                       matching record IDs or system numbers.
     
         -d, --delete  Delete fields of existing records that are
                       contained in the XML MARC file. The fields in
                       the original record that are not present in
                       the XML MARC file are preserved.
                       This is incompatible with FFT (see below).
     
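
    For example, a minimal invocation of the correct mode described above (the file name is illustrative) would be:

     $ bibupload -c file.xml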

    3.3 Inserting and updating at the same time

    Note that the insert and update modes can be combined. For example, if you have a file that contains a mixture of new records and some records to be updated, then you can run:

     $ bibupload -i -r file.xml
     
     
    In this case BibUpload will try to do an update (for records having either 001 or 970 identifiers), or an insert (for the other ones).

    3.4 Updating preformatted output formats

    BibFormat can use this special upload mode, during which the metadata of the records is not updated, only their preformatted output formats:

         -f, --format        Upload only the format (FMT) fields.
                             Neither the original content nor its modification date is changed.
     
    This is useful for the bibreformat daemon only; human administrators don't need to explicitly know about this mode.

    3.5 Uploading fulltext files

    The fulltext files can be uploaded and revised via a special FFT ("fulltext file transfer") tag with the following semantics:

         FFT $a  ...  location of the docfile to upload (a filesystem path or a URL)
             $n  ...  docfile name (optional; if not set, deduced from $a)
             $m  ...  new desired docfile name (optional; used for renaming files)
             $t  ...  docfile type (e.g. Main, Additional)
             $d  ...  docfile description (optional)
             $f  ...  format (optional; if not set, deduced from $a)
             $z  ...  comment (optional)
             $r  ...  restriction (optional, see below)
             $v  ...  version (used only with REVERT and DELETE-FILE, see below)
             $x  ...  url/path for an icon (optional)
     

    For example, to upload a new fulltext file thesis.pdf associated with record ID 123:

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="a">/tmp/thesis.pdf</subfield>
                 <subfield code="t">Main</subfield>
                 <subfield code="d">
                   This is the fulltext version of my thesis in the PDF format.
                   Chapter 5 still needs some revision.
                 </subfield>
             </datafield>
         </record>
     

    The FFT tag is repeatable, so one can pass along another FFT tag instance containing a pointer to, e.g., the thesis defence slides. The subfields of an FFT tag are non-repeatable.

    When more than one FFT tag is specified for the same document (e.g. for adding more than one format at a time), then $t (docfile type), $m (new desired docfile name), $r (restriction), $v (version), and $x (url/path for an icon), if specified, must be specified identically for each FFT entry. E.g. if you want to specify an icon for a document with two formats (say .pdf and .doc), you'll write two FFT tags, both containing the same $x subfield.
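
    For instance, a hypothetical record attaching a .pdf and a .doc format with a common icon might look as follows (the record ID, file paths, and icon path are illustrative):

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="a">/tmp/article.pdf</subfield>
                 <subfield code="t">Main</subfield>
                 <subfield code="x">/tmp/icon.gif</subfield>
             </datafield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="a">/tmp/article.doc</subfield>
                 <subfield code="t">Main</subfield>
                 <subfield code="x">/tmp/icon.gif</subfield>
             </datafield>
         </record>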

    The bibupload process, when it encounters FFT tags, will automatically populate the fulltext storage space (/opt/cds-invenio/var/data/files) and the associated metadata tables (bibrec_bibdoc, bibdoc) as appropriate. It will also enrich the 856 tags (URL tags) of the MARC metadata of the record in question with references to the latest versions of each file.
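
    As a purely illustrative sketch (the exact URL pattern is installation-specific and is assumed here, not taken from this guide), the synchronized metadata could then gain a tag such as:

         <datafield tag="856" ind1="4" ind2=" ">
             <subfield code="u">http://your-site/record/123/files/thesis.pdf</subfield>
         </datafield>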

    Note that for the $a and $x subfields filesystem paths must be absolute (e.g. /tmp/icon.gif is valid, while Desktop/icon.gif is not) and they must be readable by the user/group of the bibupload process that will handle the FFT.

    The bibupload process supports the usual modes correct, append, replace, insert with semantics somewhat similar to those of the metadata upload:

    Mode                   | Metadata                                       | Fulltext
    objects being uploaded | MARC field instances characterized by          | fulltext files characterized by unique
                           | tags (010-999)                                 | file names (FFT $n)
    insert                 | insert new record; must not exist              | insert new files; must not exist
    append                 | append new tag instances for the given tag     | append new files, if filename (i.e. new
                           | XXX, regardless of existing tag instances      | format) not already present
    correct                | correct tag instances for the given tag XXX;   | correct files with the given filename; add
                           | delete existing ones and replace with given    | new revision or delete file; if the docname
                           | ones                                           | does not exist the file is added
    replace                | replace all tags, whatever XXX are             | replace all files, whatever filenames are
    delete                 | delete all existing tag instances              | not supported

    Note that in the append and insert modes, $m is ignored.

    In order to rename a document, just use the correct mode, specifying in the $n subfield the original docname that should be renamed and in $m the new name.
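
    A minimal sketch of such a rename, to be uploaded in the correct mode (the record ID and docnames are illustrative):

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="n">thesis</subfield>
                 <subfield code="m">phd-thesis</subfield>
             </datafield>
         </record>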

    Special values can be assigned to the $t subfield.

    Value        Meaning
    PURGE        In order to purge previous file revisions (i.e. to keep only the latest file version), please use the correct mode with $n docname and $t PURGE as the special keyword.
    DELETE       In order to delete all existing versions of a file, making it effectively hidden, please use the correct mode with $n docname and $t DELETE as the special keyword.
    EXPUNGE      In order to expunge (i.e. remove completely, also from the filesystem) all existing versions of a file, making it effectively disappear, please use the correct mode with $n docname and $t EXPUNGE as the special keyword.
    FIX-MARC     In order to synchronize MARC with the bibrec/bibdoc structure (e.g. after an update or a tweak in the database), please use the correct mode with $n docname and $t FIX-MARC as the special keyword.
    FIX-ALL      In order to fix a record (i.e. put all its linked documents in a coherent state) and synchronize the MARC with the tables, please use the correct mode with $n docname and $t FIX-ALL as the special keyword.
    REVERT       In order to revert to a previous file revision (i.e. to create a new revision with the same content as some previous revision had), please use the correct mode with $n docname, $t REVERT as the special keyword, and $v set to the number of the desired version.
    DELETE-FILE  In order to delete a particular file added by mistake, please use the correct mode with $n docname, $t DELETE-FILE, specifying $v version and $f format. Note that this operation is not reversible. Note that if you don't specify a version, the last version will be used.
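
    For example, a hypothetical reversion of the docname thesis to version 2 (all values illustrative), uploaded in the correct mode:

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="n">thesis</subfield>
                 <subfield code="t">REVERT</subfield>
                 <subfield code="v">2</subfield>
             </datafield>
         </record>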

    In order to preserve previous comments and descriptions when correcting, please use the KEEP-OLD-VALUE special keyword in the desired $d and $z subfields.
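
    A sketch of a correction that keeps both the old description and the old comment (the record ID and docname are illustrative):

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="n">thesis</subfield>
                 <subfield code="d">KEEP-OLD-VALUE</subfield>
                 <subfield code="z">KEEP-OLD-VALUE</subfield>
             </datafield>
         </record>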

    -
    -In order to add an icon representing a document, you must use a $x
    -subfields. All the FFT for different format of the document must have
    -then the same $x subfield. You can use KEEP-OLD-VALUE in order to keep
    -the previous icon when correcting.
    -

    The $r subfield can contain a keyword used to restrict the given document. The same keyword must be specified for all formats of a given document. The keyword is used as the status parameter of the "viewrestrdoc" action, which can be used to grant access rights to the desired users. E.g. if you set the keyword "thesis", you can then connect the "thesisviewer" role to the action "viewrestrdoc" with the parameter "status" set to "thesis". All users linked with the "thesisviewer" role will then be able to download the document, while any other user will not be allowed. Note that if you use the keyword "KEEP-OLD-VALUE", the previous restrictions, if applicable, will be kept.
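
    A minimal sketch of setting such a restriction on an existing document via the correct mode (the record ID and docname are illustrative, and it is assumed here that the restriction of an already uploaded file can be changed this way):

         <record>
             <controlfield tag="001">123</controlfield>
             <datafield tag="FFT" ind1=" " ind2=" ">
                 <subfield code="n">thesis</subfield>
                 <subfield code="r">thesis</subfield>
             </datafield>
         </record>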

    Note that each time bibupload is called on a record, the 8564 tags pointing to locally stored files are recreated on the basis of the full-text files connected to the record. Thus, if you wish to update some 8564 tag pointing to a locally managed file, the only way to do this is through the FFT tag, not by editing 8564 directly.

    diff --git a/modules/bibupload/lib/bibupload.py b/modules/bibupload/lib/bibupload.py index 3fe74aa47..236447288 100644 --- a/modules/bibupload/lib/bibupload.py +++ b/modules/bibupload/lib/bibupload.py @@ -1,2045 +1,2011 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibUpload: Receive MARC XML file and update the appropriate database tables according to options. Usage: bibupload [options] input.xml Examples: $ bibupload -i input.xml Options: -a, --append new fields are appended to the existing record -c, --correct fields are replaced by the new ones in the existing record -f, --format takes only the FMT fields into account. Does not update -i, --insert insert the new record in the database -r, --replace the existing record is entirely replaced by the new one -d, --delete specified fields are deleted if existing -z, --reference update references (update only 999 fields) -s, --stage=STAGE stage to start from in the algorithm (0: always done; 1: FMT tags; 2: FFT tags; 3: BibFmt; 4: Metadata update; 5: time update) -n, --notimechange do not change record last modification date when updating -o, --holdingpen Makes bibupload insert into holding pen instead the normal database Scheduling options: -u, --user=USER user name to store task, password needed General options: -h, --help print this help and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) -V --version print the script version """ __revision__ = "$Id$" import os import re import sys import time from zlib import compress import urllib2 import socket import marshal import copy from invenio.config import CFG_OAI_ID_FIELD, \ CFG_BIBUPLOAD_REFERENCE_TAG, \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG, \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG, \ CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG, \ CFG_BIBUPLOAD_STRONG_TAGS, \ CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS, \ CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE from invenio.bibupload_config import CFG_BIBUPLOAD_CONTROLFIELD_TAGS, \ CFG_BIBUPLOAD_SPECIAL_TAGS from invenio.dbquery import run_sql, \ Error from invenio.bibrecord import create_records, \ record_add_field, \ record_delete_field, \ record_xml_output, \ record_get_field_instances, \ record_get_field_values, \ field_get_subfield_values, \ field_get_subfield_instances, \ record_extract_oai_id, \ record_modify_subfield, \ record_delete_subfield_from, \ record_delete_fields, \ record_add_subfield_into, \ record_find_field, \ record_extract_oai_id from invenio.search_engine import get_record from invenio.dateutils import convert_datestruct_to_datetext from invenio.errorlib import register_exception from invenio.intbitset import intbitset from invenio.config import CFG_WEBSUBMIT_FILEDIR from invenio.bibtask import task_init, write_message, \ task_set_option, 
task_get_option, task_get_task_param, task_update_status, \ task_update_progress, task_sleep_now_if_required, fix_argv_paths from invenio.bibdocfile import BibRecDocs, file_strip_ext, normalize_format, \ - get_docname_from_url, get_format_from_url, check_valid_url, download_url, \ + get_docname_from_url, check_valid_url, download_url, \ KEEP_OLD_VALUE, decompose_bibdocfile_url, InvenioWebSubmitFileError, \ - bibdocfile_url_p + bibdocfile_url_p, CFG_BIBDOCFILE_AVAILABLE_FLAGS, guess_format_from_url from invenio.search_engine import search_pattern #Statistic variables stat = {} stat['nb_records_to_upload'] = 0 stat['nb_records_updated'] = 0 stat['nb_records_inserted'] = 0 stat['nb_errors'] = 0 stat['nb_holdingpen'] = 0 stat['exectime'] = time.localtime() ## Let's set a reasonable timeout for URL request (e.g. FFT) socket.setdefaulttimeout(40) _re_find_001 = re.compile('\\s*(\\d*)\\s*', re.S) def bibupload_pending_recids(): """This function embed a bit of A.I. and is more a hack than an elegant algorithm. It should be updated in case bibupload/bibsched are modified in incompatible ways. This function return the intbitset of all the records that are being (or are scheduled to be) touched by other bibuploads. """ options = run_sql("""SELECT arguments FROM schTASK WHERE status<>'DONE' AND proc='bibupload' AND (status='RUNNING' OR status='CONTINUING' OR status='WAITING' OR status='SCHEDULED' OR status='ABOUT TO STOP' OR status='ABOUT TO SLEEP')""") ret = intbitset() xmls = [] if options: for arguments in options: arguments = marshal.loads(arguments[0]) for argument in arguments[1:]: if argument.startswith('/'): # XMLs files are recognizable because they're absolute # files... xmls.append(argument) for xmlfile in xmls: # Let's grep for the 001 try: xml = open(xmlfile).read() ret += [int(group[1]) for group in _re_find_001.findall(xml)] except: continue return ret ### bibupload engine functions: def bibupload(record, opt_tag=None, opt_mode=None, opt_stage_to_start_from=1, opt_notimechange=0, oai_rec_id = ""): """Main function: process a record and fit it in the tables bibfmt, bibrec, bibrec_bibxxx, bibxxx with proper record metadata. Return (error_code, recID) of the processed record. """ assert(opt_mode in ('insert', 'replace', 'replace_or_insert', 'reference', 'correct', 'append', 'format', 'holdingpen', 'delete')) error = None # If there are special tags to proceed check if it exists in the record if opt_tag is not None and not(record.has_key(opt_tag)): write_message(" Failed: Tag not found, enter a valid tag to update.", verbose=1, stream=sys.stderr) return (1, -1) # Extraction of the Record Id from 001, SYSNO or OAIID tags: rec_id = retrieve_rec_id(record, opt_mode) if rec_id == -1: return (1, -1) elif rec_id > 0: write_message(" -Retrieve record ID (found %s): DONE." 
% rec_id, verbose=2) if not record.has_key('001'): # Found record ID by means of SYSNO or OAIID, and the # input MARCXML buffer does not have this 001 tag, so we # should add it now: error = record_add_field(record, '001', controlfield_value=rec_id) if error is None: write_message(" Failed: " \ "Error while adding the 001 controlfield " \ "to the record", verbose=1, stream=sys.stderr) return (1, int(rec_id)) else: error = None write_message(" -Added tag 001: DONE.", verbose=2) write_message(" -Check if the xml marc file is already in the database: DONE", verbose=2) # In reference mode, check that the record contains reference tags if opt_mode == 'reference': error = extract_tag_from_record(record, CFG_BIBUPLOAD_REFERENCE_TAG) if error is None: write_message(" Failed: No reference tags have been found...", verbose=1, stream=sys.stderr) return (1, -1) else: error = None write_message(" -Check if reference tags exist: DONE", verbose=2) record_deleted_p = False if opt_mode == 'insert' or \ (opt_mode == 'replace_or_insert' and rec_id is None): insert_mode_p = True # Insert the record into the bibrec database to obtain a record id rec_id = create_new_record() write_message(" -Creation of a new record id (%d): DONE" % rec_id, verbose=2) # we add the record Id control field to the record error = record_add_field(record, '001', controlfield_value=rec_id) if error is None: write_message(" Failed: " \ "Error while adding the 001 controlfield " \ "to the record", verbose=1, stream=sys.stderr) return (1, int(rec_id)) else: error = None elif opt_mode != 'insert' and opt_mode != 'format' and \ opt_stage_to_start_from != 5: insert_mode_p = False # Update Mode # Retrieve the old record to update rec_old = get_record(rec_id) # Also save a copy to restore previous situation in case of errors original_record = get_record(rec_id) if rec_old is None: write_message(" Failed during the retrieval of the old record!", verbose=1, stream=sys.stderr) return (1, int(rec_id)) else: write_message(" -Retrieve the old record to update: DONE", verbose=2) # In Replace mode, take over old strong tags if applicable: if opt_mode == 'replace' or \ opt_mode == 'replace_or_insert': copy_strong_tags_from_old_record(record, rec_old) # Delete tags to correct in the record if opt_mode == 'correct' or opt_mode == 'reference': delete_tags_to_correct(record, rec_old, opt_tag) write_message(" -Delete the old tags to correct in the old record: DONE", verbose=2) # Delete specified tags if in delete mode if opt_mode == 'delete': record = delete_tags(record, rec_old) write_message(" -Delete specified tags in the old record: DONE", verbose=2) # Append new tags to the old record and update the new record with the modified old record if opt_mode == 'append' or opt_mode == 'correct' or \ opt_mode == 'reference': record = append_new_tag_to_old_record(record, rec_old, opt_tag, opt_mode) write_message(" -Append new tags to the old record: DONE", verbose=2) # now we clear all the rows from bibrec_bibxxx from the old # record (they will be populated later (if needed) during # stage 4 below): delete_bibrec_bibxxx(rec_old, rec_id) record_deleted_p = True write_message(" -Clean bibrec_bibxxx: DONE", verbose=2) write_message(" -Stage COMPLETED", verbose=2) try: # Have a look if we have FMT tags write_message("Stage 1: Start (Insert FMT tags if they exist).", verbose=2) if opt_stage_to_start_from <= 1 and \ extract_tag_from_record(record, 'FMT') is not None: record = insert_fmt_tags(record, rec_id, opt_mode) if record is None: write_message(" Stage 1 failed: Error while
inserting FMT tags", verbose=1, stream=sys.stderr) return (1, int(rec_id)) elif record == 0: # Mode format finished stat['nb_records_updated'] += 1 return (0, int(rec_id)) write_message(" -Stage COMPLETED", verbose=2) else: write_message(" -Stage NOT NEEDED", verbose=2) # Have a look if we have FFT tags write_message("Stage 2: Start (Process FFT tags if exist).", verbose=2) record_had_FFT = False if opt_stage_to_start_from <= 2 and \ extract_tag_from_record(record, 'FFT') is not None: record_had_FFT = True if not writing_rights_p(): write_message(" Stage 2 failed: Error no rights to write fulltext files", verbose=1, stream=sys.stderr) task_update_status("ERROR") sys.exit(1) try: record = elaborate_fft_tags(record, rec_id, opt_mode) except Exception, e: register_exception() write_message(" Stage 2 failed: Error while elaborating FFT tags: %s" % e, verbose=1, stream=sys.stderr) return (1, int(rec_id)) if record is None: write_message(" Stage 2 failed: Error while elaborating FFT tags", verbose=1, stream=sys.stderr) return (1, int(rec_id)) write_message(" -Stage COMPLETED", verbose=2) else: write_message(" -Stage NOT NEEDED", verbose=2) # Have a look if we have FFT tags write_message("Stage 2B: Start (Synchronize 8564 tags).", verbose=2) has_bibdocs = run_sql("SELECT count(id_bibdoc) FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE id_bibrec=%s AND status<>'DELETED'", (rec_id, ))[0][0] > 0 if opt_stage_to_start_from <= 2 and (has_bibdocs or record_had_FFT or extract_tag_from_record(record, '856') is not None): try: record = synchronize_8564(rec_id, record, record_had_FFT) except Exception, e: register_exception(alert_admin=True) write_message(" Stage 2B failed: Error while synchronizing 8564 tags: %s" % e, verbose=1, stream=sys.stderr) return (1, int(rec_id)) if record is None: write_message(" Stage 2B failed: Error while synchronizing 8564 tags", verbose=1, stream=sys.stderr) return (1, int(rec_id)) write_message(" -Stage COMPLETED", verbose=2) else: write_message(" -Stage NOT NEEDED", verbose=2) # Update of the BibFmt write_message("Stage 3: Start (Update bibfmt).", verbose=2) if opt_stage_to_start_from <= 3: # format the single record as xml rec_xml_new = record_xml_output(record) # Update bibfmt with the format xm of this record if opt_mode != 'format': error = update_bibfmt_format(rec_id, rec_xml_new, 'xm') if error == 1: write_message(" Failed: error during update_bibfmt_format 'xm'", verbose=1, stream=sys.stderr) return (1, int(rec_id)) if CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE: error = update_bibfmt_format(rec_id, marshal.dumps(record), 'recstruct') if error == 1: write_message(" Failed: error during update_bibfmt_format 'recstruct'", verbose=1, stream=sys.stderr) return (1, int(rec_id)) # archive MARCXML format of this record for version history purposes: error = archive_marcxml_for_history(rec_id) if error == 1: write_message(" Failed to archive MARCXML for history", verbose=1, stream=sys.stderr) return (1, int(rec_id)) else: write_message(" -Archived MARCXML for history : DONE", verbose=2) write_message(" -Stage COMPLETED", verbose=2) # Update the database MetaData write_message("Stage 4: Start (Update the database with the metadata).", verbose=2) if opt_stage_to_start_from <= 4: if opt_mode in ('insert', 'replace', 'replace_or_insert', 'append', 'correct', 'reference', 'delete'): update_database_with_metadata(record, rec_id, oai_rec_id) record_deleted_p = False else: write_message(" -Stage NOT NEEDED in mode %s" % opt_mode, verbose=2) write_message(" -Stage COMPLETED", 
verbose=2) else: write_message(" -Stage NOT NEEDED", verbose=2) # Finally we update the bibrec table with the current date write_message("Stage 5: Start (Update bibrec table with current date).", verbose=2) if opt_stage_to_start_from <= 5 and \ opt_notimechange == 0 and \ not insert_mode_p: now = convert_datestruct_to_datetext(time.localtime()) write_message(" -Retrieved current localtime: DONE", verbose=2) update_bibrec_modif_date(now, rec_id) write_message(" -Stage COMPLETED", verbose=2) else: write_message(" -Stage NOT NEEDED", verbose=2) # Increase statistics if insert_mode_p: stat['nb_records_inserted'] += 1 else: stat['nb_records_updated'] += 1 # Upload of this record finished write_message("Record "+str(rec_id)+" DONE", verbose=1) return (0, int(rec_id)) finally: if record_deleted_p: ## BibUpload has failed, leaving the record deleted. We should ## restore the original record then. update_database_with_metadata(original_record, rec_id, oai_rec_id) write_message(" Restored original record", verbose=1, stream=sys.stderr) def find_record_ids_by_oai_id(oaiId): """ Find the record identifiers corresponding to the given OAI identifier. Returns a list of record identifiers matching the OAI identifier. """ # Is this record already in invenio (matching by oaiid) recids1 = search_pattern( p = oaiId, f = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG, m = 'e' ).tolist() # Is this record already in invenio (matching by reportnumber, i.e. # particularly 037. Idea: to avoid double insertions) repnumber = oaiId.split(":")[-1] recids2 = search_pattern(p = repnumber, f = "reportnumber", m = 'e' ).tolist() # Is this record already in invenio (matching by "arXiv:"-prefixed # reportnumber. Idea: to avoid double insertions) repnumber = "arXiv:" + oaiId.split(":")[-1] recids3 = search_pattern(p = repnumber, f = "reportnumber", m = 'e' ).tolist() # now ensure the results are unique res = {} for rid in recids1 + recids2 + recids3: res[rid] = 1 return res.keys() def insert_record_into_holding_pen(record, oai_id): """Insert the record into the holding pen, remembering the identifier of the matching bibrec record, if any.""" query = "INSERT INTO bibHOLDINGPEN (oai_id, changeset_date, changeset_xml, id_bibrec) VALUES (%s, NOW(), %s, %s)" xml_record = record_xml_output(record) bibrec_ids = find_record_ids_by_oai_id(oai_id) # here determining the identifier of the record if len(bibrec_ids) > 0: bibrec_id = bibrec_ids[0] else: bibrec_id = 0 run_sql(query, (oai_id, xml_record, bibrec_id)) # record_id is logged as 0! (We are not inserting into the main database) log_record_uploading(oai_id, task_get_task_param('task_id', 0), 0, 'H') stat['nb_holdingpen'] += 1 def print_out_bibupload_statistics(): """Print the statistics of the process""" out = "Task stats: %(nb_input)d input records, %(nb_updated)d updated, " \ "%(nb_inserted)d inserted, %(nb_errors)d errors, %(nb_holdingpen)d inserted to holding pen. " \ "Time %(nb_sec).2f sec."
% { \ 'nb_input': stat['nb_records_to_upload'], 'nb_updated': stat['nb_records_updated'], 'nb_inserted': stat['nb_records_inserted'], 'nb_errors': stat['nb_errors'], 'nb_holdingpen': stat['nb_holdingpen'], 'nb_sec': time.time() - time.mktime(stat['exectime']) } write_message(out) def open_marc_file(path): """Open a file and return the data""" try: # open the file containing the marc document marc_file = open(path,'r') marc = marc_file.read() marc_file.close() except IOError, erro: write_message("Error: %s" % erro, verbose=1, stream=sys.stderr) write_message("Exiting.", sys.stderr) task_update_status("ERROR") sys.exit(1) return marc def xml_marc_to_records(xml_marc): """create the records""" # Creation of the records from the xml Marc in argument recs = create_records(xml_marc, 1, 1) if recs == []: write_message("Error: Cannot parse MARCXML file.", verbose=1, stream=sys.stderr) write_message("Exiting.", sys.stderr) task_update_status("ERROR") sys.exit(1) elif recs[0][0] is None: write_message("Error: MARCXML file has wrong format: %s" % recs, verbose=1, stream=sys.stderr) write_message("Exiting.", sys.stderr) task_update_status("ERROR") sys.exit(1) else: recs = map((lambda x:x[0]), recs) return recs def find_record_format(rec_id, format): """Look whether record REC_ID is formatted in FORMAT, i.e. whether FORMAT exists in the bibfmt table for this record. Return the number of times it is formatted: 0 if not, 1 if yes, 2 if found more than once (should never occur). """ out = 0 query = """SELECT COUNT(id) FROM bibfmt WHERE id_bibrec=%s AND format=%s""" params = (rec_id, format) res = [] try: res = run_sql(query, params) out = res[0][0] except Error, error: write_message(" Error during find_record_format() : %s " % error, verbose=1, stream=sys.stderr) return out def find_record_from_recid(rec_id): """ Try to find record in the database from the REC_ID number. Return record ID if found, None otherwise. """ try: res = run_sql("SELECT id FROM bibrec WHERE id=%s", (rec_id,)) except Error, error: write_message(" Error during find_record_bibrec() : %s " % error, verbose=1, stream=sys.stderr) if res: return res[0][0] else: return None def find_record_from_sysno(sysno): """ Try to find record in the database from the external SYSNO number. Return record ID if found, None otherwise. """ bibxxx = 'bib'+CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:2]+'x' bibrec_bibxxx = 'bibrec_' + bibxxx try: res = run_sql("""SELECT bb.id_bibrec FROM %(bibrec_bibxxx)s AS bb, %(bibxxx)s AS b WHERE b.tag=%%s AND b.value=%%s AND bb.id_bibxxx=b.id""" % \ {'bibxxx': bibxxx, 'bibrec_bibxxx': bibrec_bibxxx}, (CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG, sysno,)) except Error, error: write_message(" Error during find_record_from_sysno(): %s " % error, verbose=1, stream=sys.stderr) if res: return res[0][0] else: return None def find_records_from_extoaiid(extoaiid, extoaisrc=None): """ Try to find records in the database from the external EXTOAIID number. Return list of record ID if found, None otherwise. 
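A note on the lookup helpers in this stretch of the module: they all locate metadata through the same naming convention, in which the first two digits of the configured MARC tag select the bibxxx table, joined to records via the matching bibrec_bibxxx table. A minimal sketch of that convention (the tag value is illustrative only; the real tag comes from configuration such as CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG):

    def bibxxx_tables_for_tag(tag):
        # e.g. tag '970__a' is stored in bib97x, linked via bibrec_bib97x
        bibxxx = 'bib' + tag[0:2] + 'x'
        return (bibxxx, 'bibrec_' + bibxxx)

    print bibxxx_tables_for_tag('970__a')   # ('bib97x', 'bibrec_bib97x')

This is why find_record_from_sysno() and its siblings compose their SQL from computed table names instead of fixed ones.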
""" assert(CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[:5] == CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[:5]) bibxxx = 'bib'+CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:2]+'x' bibrec_bibxxx = 'bibrec_' + bibxxx try: write_message(' Looking for extoaiid="%s" with extoaisrc="%s"' % (extoaiid, extoaisrc), verbose=9) id_bibrecs = intbitset(run_sql("""SELECT bb.id_bibrec FROM %(bibrec_bibxxx)s AS bb, %(bibxxx)s AS b WHERE b.tag=%%s AND b.value=%%s AND bb.id_bibxxx=b.id""" % \ {'bibxxx': bibxxx, 'bibrec_bibxxx': bibrec_bibxxx}, (CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG, extoaiid,))) write_message(' Partially found %s for extoaiid="%s"' % (id_bibrecs, extoaiid), verbose=9) ret = intbitset() for id_bibrec in id_bibrecs: record = get_record(id_bibrec) instances = record_get_field_instances(record, CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3], CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4]) write_message(' recid %s -> instances "%s"' % (id_bibrec, instances), verbose=9) for instance in instances: provenance = field_get_subfield_values(instance, CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5]) write_message(' recid %s -> provenance "%s"' % (id_bibrec, provenance), verbose=9) provenance = provenance and provenance[0] or None if provenance is None: if extoaisrc is None: write_message('Found recid %s for extoaiid="%s"' % (id_bibrec, extoaiid), verbose=9) ret.add(id_bibrec) break else: raise Error('Found recid %s for extoaiid="%s" that doesn\'t specify any provenance, while input record does.' % (id_bibrec, extoaiid)) else: if extoaiid is None: - raise Error('Found recid %s for extoaiid="%s" that specify as provenance "%s", while input record does not specify any provenance.' % (id_bibrec, extoaiid, provenance)) + raise Error('Found recid %s for extoaiid="%s" that specifies as provenance "%s", while input record does not specify any provenance.' % (id_bibrec, extoaiid, provenance)) elif provenance == extoaisrc: write_message('Found recid %s for extoaiid="%s" with provenance="%s"' % (id_bibrec, extoaiid, extoaisrc), verbose=9) ret.add(id_bibrec) break return ret except Error, error: write_message(" Error during find_records_from_extoaiid(): %s " % error, verbose=1, stream=sys.stderr) raise def find_record_from_oaiid(oaiid): """ Try to find record in the database from the OAI ID number and OAI SRC. Return record ID if found, None otherwise. """ bibxxx = 'bib'+CFG_OAI_ID_FIELD[0:2]+'x' bibrec_bibxxx = 'bibrec_' + bibxxx try: res = run_sql("""SELECT bb.id_bibrec FROM %(bibrec_bibxxx)s AS bb, %(bibxxx)s AS b WHERE b.tag=%%s AND b.value=%%s AND bb.id_bibxxx=b.id""" % \ {'bibxxx': bibxxx, 'bibrec_bibxxx': bibrec_bibxxx}, (CFG_OAI_ID_FIELD, oaiid,)) except Error, error: write_message(" Error during find_record_from_oaiid(): %s " % error, verbose=1, stream=sys.stderr) if res: return res[0][0] else: return None def extract_tag_from_record(record, tag_number): """ Extract the tag_number for record.""" # first step verify if the record is not already in the database if record: return record.get(tag_number, None) return None def retrieve_rec_id(record, opt_mode): """Retrieve the record Id from a record by using tag 001 or SYSNO or OAI ID tag. 
opt_mode is the desired mode.""" rec_id = None # 1st step: we look for the tag 001 tag_001 = extract_tag_from_record(record, '001') if tag_001 is not None: # We extract the record ID from the tag rec_id = tag_001[0][3] # if we are in insert mode => error if opt_mode == 'insert': write_message(" Failed: tag 001 found in the xml" \ " submitted, you should use the option replace," \ " correct or append to replace an existing" \ " record. (-h for help)", verbose=1, stream=sys.stderr) return -1 else: # we found the rec id and we are not in insert mode => continue # we try to match rec_id against the database: if find_record_from_recid(rec_id) is not None: # okay, 001 corresponds to some known record return int(rec_id) else: # The record doesn't exist yet. We shall try to check # the SYSNO or OAI id later. write_message(" -Tag 001 value not found in database.", verbose=9) rec_id = None else: write_message(" -Tag 001 not found in the xml marc file.", verbose=9) if rec_id is None: # 2nd step: we look for the SYSNO sysnos = record_get_field_values(record, CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or "", CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or "", CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6]) if sysnos: sysno = sysnos[0] # there should be only one external SYSNO write_message(" -Checking if SYSNO " + sysno + \ " exists in the database", verbose=9) # try to find the corresponding rec id from the database rec_id = find_record_from_sysno(sysno) if rec_id is not None: # rec_id found pass else: # The record doesn't exist yet. We will try to check # external and internal OAI ids later. write_message(" -Tag SYSNO value not found in database.", verbose=9) rec_id = None else: write_message(" -Tag SYSNO not found in the xml marc file.", verbose=9) if rec_id is None: # 3rd step: we look for the external OAIID extoai_fields = record_get_field_instances(record, CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or "", CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or "") if extoai_fields: for field in extoai_fields: extoaiid = field_get_subfield_values(field, CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6]) extoaisrc = field_get_subfield_values(field, CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6]) if extoaiid: extoaiid = extoaiid[0] if extoaisrc: extoaisrc = extoaisrc[0] else: extoaisrc = None write_message(" -Checking if EXTOAIID %s (%s) exists in the database" % (extoaiid, extoaisrc), verbose=9) # try to find the corresponding rec id from the database try: rec_ids = find_records_from_extoaiid(extoaiid, extoaisrc) except Error, e: write_message(e, verbose=1, stream=sys.stderr) return -1 if rec_ids: # rec_id found rec_id = rec_ids.pop() break else: # The record doesn't exist yet. We will try to check # OAI id later.
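To keep the overall flow of retrieve_rec_id() in view: the identifier cascade tried here is tag 001 first, then the external SYSNO, then the external OAI ID (with its provenance check), and finally the local OAI ID, continued below. A condensed, hypothetical restatement of that control flow; resolve_record_id() and its (extract, find) pairs are invented for illustration, while the find_* helpers are the real ones defined in this module:

    def resolve_record_id(record, lookups):
        # 'lookups' is an ordered list of (extract, find) pairs: extract
        # pulls a candidate identifier out of the parsed record, find
        # matches it against the database.  The first hit wins; None
        # means "no existing record", i.e. a fresh insert is possible.
        for extract, find in lookups:
            identifier = extract(record)
            if identifier is not None:
                rec_id = find(identifier)
                if rec_id is not None:
                    return rec_id
        return None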
write_message(" -Tag EXTOAIID value not found in database.", verbose=9) rec_id = None else: write_message(" -Tag EXTOAIID not found in the xml marc file.", verbose=9) if rec_id is None: # 4th step we look for the OAI ID oaiidvalues = record_get_field_values(record, CFG_OAI_ID_FIELD[0:3], CFG_OAI_ID_FIELD[3:4] != "_" and \ CFG_OAI_ID_FIELD[3:4] or "", CFG_OAI_ID_FIELD[4:5] != "_" and \ CFG_OAI_ID_FIELD[4:5] or "", CFG_OAI_ID_FIELD[5:6]) if oaiidvalues: oaiid = oaiidvalues[0] # there should be only one OAI ID write_message(" -Check if local OAI ID " + oaiid + \ " exist in the database", verbose=9) # try to find the corresponding rec id from the database rec_id = find_record_from_oaiid(oaiid) if rec_id is not None: # rec_id found pass else: write_message(" -Tag OAI ID value not found in database.", verbose=9) rec_id = None else: write_message(" -Tag SYSNO not found in the xml marc file.", verbose=9) # Now we should have detected rec_id from SYSNO or OAIID # tags. (None otherwise.) if rec_id: if opt_mode == 'insert': write_message(" Failed : Record found in the database," \ " you should use the option replace," \ " correct or append to replace an existing" \ " record. (-h for help)", verbose=1, stream=sys.stderr) return -1 else: if opt_mode != 'insert' and \ opt_mode != 'replace_or_insert': write_message(" Failed : Record not found in the database."\ " Please insert the file before updating it."\ " (-h for help)", verbose=1, stream=sys.stderr) return -1 return rec_id and int(rec_id) or None ### Insert functions def create_new_record(): """Create new record in the database""" now = convert_datestruct_to_datetext(time.localtime()) query = """INSERT INTO bibrec (creation_date, modification_date) VALUES (%s, %s)""" params = (now, now) try: rec_id = run_sql(query, params) return rec_id except Error, error: write_message(" Error during the creation_new_record function : %s " % error, verbose=1, stream=sys.stderr) return None def insert_bibfmt(id_bibrec, marc, format, modification_date='1970-01-01 00:00:00'): """Insert the format in the table bibfmt""" # compress the marc value pickled_marc = compress(marc) try: time.strptime(modification_date, "%Y-%m-%d %H:%M:%S") except ValueError: modification_date = '1970-01-01 00:00:00' query = """INSERT INTO bibfmt (id_bibrec, format, last_updated, value) VALUES (%s, %s, %s, %s)""" try: row_id = run_sql(query, (id_bibrec, format, modification_date, pickled_marc)) return row_id except Error, error: write_message(" Error during the insert_bibfmt function : %s " % error, verbose=1, stream=sys.stderr) return None def insert_record_bibxxx(tag, value): """Insert the record into bibxxx""" # determine into which table one should insert the record table_name = 'bib'+tag[0:2]+'x' # check if the tag, value combination exists in the table query = """SELECT id,value FROM %s """ % table_name query += """ WHERE tag=%s AND value=%s""" params = (tag, value) try: res = run_sql(query, params) except Error, error: write_message(" Error during the insert_record_bibxxx function : %s " % error, verbose=1, stream=sys.stderr) # Note: compare now the found values one by one and look for # string binary equality (e.g. to respect lowercase/uppercase # match), regardless of the charset etc settings. 
Ideally we # could use a BINARY operator in the above SELECT statement, but # we would have to check compatibility on various MySQLdb versions # etc; this approach checks all matched values in Python, not in # MySQL, which is less cool, but more conservative, so it should # work better on most setups. for row in res: row_id = row[0] row_value = row[1] if row_value == value: return (table_name, row_id) # We got here only when the tag,value combination was not found, # so it is now necessary to insert the tag,value combination into # bibxxx table as new. query = """INSERT INTO %s """ % table_name query += """ (tag, value) values (%s , %s)""" params = (tag, value) try: row_id = run_sql(query, params) except Error, error: write_message(" Error during the insert_record_bibxxx function : %s " % error, verbose=1, stream=sys.stderr) return (table_name, row_id) def insert_record_bibrec_bibxxx(table_name, id_bibxxx, field_number, id_bibrec): """Insert the record into bibrec_bibxxx""" # determine into which table one should insert the record full_table_name = 'bibrec_'+ table_name # insert the proper row into the table query = """INSERT INTO %s """ % full_table_name query += """(id_bibrec,id_bibxxx, field_number) values (%s , %s, %s)""" params = (id_bibrec, id_bibxxx, field_number) try: res = run_sql(query, params) except Error, error: write_message(" Error during the insert_record_bibrec_bibxxx" " function 2nd query : %s " % error, verbose=1, stream=sys.stderr) return res def synchronize_8564(rec_id, record, record_had_FFT): """ Synchronize 8564_ tags and BibDocFile tables. This function directly manipulate the record parameter. @type rec_id: positive integer @param rec_id: the record identifier. @param record: the record structure as created by bibrecord.create_record @type record_had_FFT: boolean @param record_had_FFT: True if the incoming bibuploaded-record used FFT @return: the manipulated record (which is also modified as a side effect) """ def merge_marc_into_bibdocfile(field): """ Internal function that reads a single field and store its content in BibDocFile tables. @param field: the 8564_ field containing a BibDocFile URL. """ write_message('Merging field: %s' % (field, ), verbose=9) url = field_get_subfield_values(field, 'u')[:1] or field_get_subfield_values(field, 'q')[:1] description = field_get_subfield_values(field, 'y')[:1] comment = field_get_subfield_values(field, 'z')[:1] if url: recid, docname, format = decompose_bibdocfile_url(url[0]) if recid != rec_id: write_message("INFO: URL %s is not pointing to a fulltext owned by this record (%s)" % (url, recid), stream=sys.stderr) else: try: bibdoc = BibRecDocs(recid).get_bibdoc(docname) if description: bibdoc.set_description(description[0], format) if comment: bibdoc.set_comment(comment[0], format) except InvenioWebSubmitFileError: ## Apparently the referenced docname doesn't exist anymore. ## Too bad. Let's skip it. write_message("WARNING: docname %s doesn't exist for record %s. Has it been renamed outside FFT?" % (docname, recid), stream=sys.stderr) def merge_bibdocfile_into_marc(field, subfields): """ Internal function that reads BibDocFile table entries referenced by the URL in the given 8564_ field and integrate the given information directly with the provided subfields. @param field: the 8564_ field containing a BibDocFile URL. @param subfields: the subfields corresponding to the BibDocFile URL generated after BibDocFile tables. 
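For orientation, the subfield codes being kept in sync here follow the 8564_ conventions: $u carries the file URL, $y the description, $z the comment and, after this change, $x the subformat. A made-up example of the URL -> wanna-be-subfields mapping that get_bibdocfile_managed_info() below produces (URL and values invented):

    managed_info = {
        'http://example.org/record/123/files/paper.pdf': {
            'u': 'http://example.org/record/123/files/paper.pdf',
            'y': 'Preprint',   # description, when one is set
            'z': 'Fulltext',   # comment, when one is set
            'x': 'icon',       # subformat, when one is set
        },
    }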
""" write_message('Merging subfields %s into field %s' % (subfields, field), verbose=9) subfields = dict(subfields) ## We make a copy not to have side-effects subfield_to_delete = [] for subfield_position, (code, value) in enumerate(field_get_subfield_instances(field)): ## For each subfield instance already existing... if code in subfields: ## ...We substitute it with what is in BibDocFile tables record_modify_subfield(record, '856', code, subfields[code], subfield_position, field_position_global=field[4]) del subfields[code] else: ## ...We delete it otherwise subfield_to_delete.append(subfield_position) subfield_to_delete.sort() for counter, position in enumerate(subfield_to_delete): ## FIXME: Very hackish algorithm. Since deleting a subfield ## will alterate the position of following subfields, we ## are taking note of this and adjusting further position ## by using a counter. record_delete_subfield_from(record, '856', position - counter, field_position_global=field[4]) subfields = subfields.items() subfields.sort() for code, value in subfields: ## Let's add non-previously existing subfields record_add_subfield_into(record, '856', code, value, field_position_global=field[4]) def get_bibdocfile_managed_info(): """ Internal function to eturns a dictionary of BibDocFile URL -> wanna-be subfields. @rtype: mapping @return: BibDocFile URL -> wanna-be subfields dictionary """ ret = {} bibrecdocs = BibRecDocs(rec_id) - latest_files = bibrecdocs.list_latest_files() + latest_files = bibrecdocs.list_latest_files(list_hidden=False) for afile in latest_files: url = afile.get_url() ret[url] = {'u' : url} description = afile.get_description() comment = afile.get_comment() + subformat = afile.get_subformat() if description: ret[url]['y'] = description if comment: ret[url]['z'] = comment + if subformat: + ret[url]['x'] = subformat - for bibdoc in bibrecdocs.list_bibdocs(): - icon = bibdoc.get_icon() - if icon: - icon = icon.list_all_files() - if icon: - url = icon[0].get_url() - ret[url] = {'q' : url, 'x' : 'icon'} return ret write_message("Synchronizing MARC of recid '%s' with:\n%s" % (rec_id, record), verbose=9) - tags8564s = record_get_field_instances(record, '856', '4', ' ') - write_message("Original 8564_ instances: %s" % tags8564s, verbose=9) + tags856s = record_get_field_instances(record, '856', '%', '%') + write_message("Original 856%% instances: %s" % tags856s, verbose=9) tags8564s_to_add = get_bibdocfile_managed_info() write_message("BibDocFile instances: %s" % tags8564s_to_add, verbose=9) positions_tags8564s_to_remove = [] - for local_position, field in enumerate(tags8564s): - for url in field_get_subfield_values(field, 'u') + field_get_subfield_values(field, 'q'): - if url in tags8564s_to_add: - if record_had_FFT: - merge_bibdocfile_into_marc(field, tags8564s_to_add[url]) - else: - merge_marc_into_bibdocfile(field) - del tags8564s_to_add[url] - break - elif bibdocfile_url_p(url) and decompose_bibdocfile_url(url)[0] == rec_id: - positions_tags8564s_to_remove.append(local_position) - break + for local_position, field in enumerate(tags856s): + if field[1] == '4' and field[2] == ' ': + write_message('Analysing %s' % (field, ), verbose=9) + for url in field_get_subfield_values(field, 'u') + field_get_subfield_values(field, 'q'): + if url in tags8564s_to_add: + if record_had_FFT: + merge_bibdocfile_into_marc(field, tags8564s_to_add[url]) + else: + merge_marc_into_bibdocfile(field) + del tags8564s_to_add[url] + break + elif bibdocfile_url_p(url) and decompose_bibdocfile_url(url)[0] == rec_id: + 
positions_tags8564s_to_remove.append(local_position) + write_message("%s to be deleted and re-synchronized" % (field, ), verbose=9) + break record_delete_fields(record, '856', positions_tags8564s_to_remove) tags8564s_to_add = tags8564s_to_add.values() tags8564s_to_add.sort() for subfields in tags8564s_to_add: subfields = subfields.items() subfields.sort() record_add_field(record, '856', '4', ' ', subfields=subfields) write_message('Final record: %s' % record, verbose=9) return record def elaborate_fft_tags(record, rec_id, mode): """ Process FFT tags that should contain $a with file pathes or URLs to get the fulltext from. This function enriches record with proper 8564 URL tags, downloads fulltext files and stores them into var/data structure where appropriate. CFG_BIBUPLOAD_WGET_SLEEP_TIME defines time to sleep in seconds in between URL downloads. Note: if an FFT tag contains multiple $a subfields, we upload them into different 856 URL tags in the metadata. See regression test case test_multiple_fft_insert_via_http(). """ # Let's define some handy sub procedure. - def _add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment): + def _add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags): """Adds a new format for a given bibdoc. Returns True when everything's fine.""" - write_message('Add new format to %s url: %s, format: %s, docname: %s, doctype: %s, newname: %s, description: %s, comment: %s' % (repr(bibdoc), url, format, docname, doctype, newname, description, comment), verbose=9) + write_message('Add new format to %s url: %s, format: %s, docname: %s, doctype: %s, newname: %s, description: %s, comment: %s, flags: %s' % (repr(bibdoc), url, format, docname, doctype, newname, description, comment, flags), verbose=9) try: if not url: # Not requesting a new url. Just updating comment & description - return _update_description_and_comment(bibdoc, docname, format, description, comment) + return _update_description_and_comment(bibdoc, docname, format, description, comment, flags) tmpurl = download_url(url, format) try: try: - bibdoc.add_file_new_format(tmpurl, description=description, comment=comment) + bibdoc.add_file_new_format(tmpurl, description=description, comment=comment, flags=flags) except StandardError, e: - write_message("('%s', '%s', '%s', '%s', '%s', '%s', '%s') not inserted because format already exists (%s)." % (url, format, docname, doctype, newname, description, comment, e), stream=sys.stderr) + write_message("('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s') not inserted because format already exists (%s)." % (url, format, docname, doctype, newname, description, comment, flags, e), stream=sys.stderr) raise finally: os.remove(tmpurl) except Exception, e: write_message("Error in downloading '%s' because of: %s" % (url, e), stream=sys.stderr) raise return True - def _add_new_version(bibdoc, url, format, docname, doctype, newname, description, comment): + def _add_new_version(bibdoc, url, format, docname, doctype, newname, description, comment, flags): """Adds a new version for a given bibdoc. 
Returns True when everything's fine.""" - write_message('Add new version to %s url: %s, format: %s, docname: %s, doctype: %s, newname: %s, description: %s, comment: %s' % (repr(bibdoc), url, format, docname, doctype, newname, description, comment)) + write_message('Add new version to %s url: %s, format: %s, docname: %s, doctype: %s, newname: %s, description: %s, comment: %s, flags: %s' % (repr(bibdoc), url, format, docname, doctype, newname, description, comment, flags)) try: if not url: - return _update_description_and_comment(bibdoc, docname, format, description, comment) + return _update_description_and_comment(bibdoc, docname, format, description, comment, flags) tmpurl = download_url(url, format) try: try: - bibdoc.add_file_new_version(tmpurl, description=description, comment=comment) + bibdoc.add_file_new_version(tmpurl, description=description, comment=comment, flags=flags) except StandardError, e: - write_message("('%s', '%s', '%s', '%s', '%s', '%s', '%s') not inserted because '%s'." % (url, format, docname, doctype, newname, description, comment, e), stream=sys.stderr) + write_message("('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s') not inserted because '%s'." % (url, format, docname, doctype, newname, description, comment, flags, e), stream=sys.stderr) raise finally: os.remove(tmpurl) except Exception, e: write_message("Error in downloading '%s' because of: %s" % (url, e), stream=sys.stderr) raise return True - def _update_description_and_comment(bibdoc, docname, format, description, comment): + def _update_description_and_comment(bibdoc, docname, format, description, comment, flags): """Directly update comments and descriptions.""" - write_message('Just updating description and comment for %s with format %s with description %s and comment %s' % (docname, format, description, comment), verbose=9) + write_message('Just updating description and comment for %s with format %s with description %s, comment %s and flags %s' % (docname, format, description, comment, flags), verbose=9) try: bibdoc.set_description(description, format) bibdoc.set_comment(comment, format) + for flag in CFG_BIBDOCFILE_AVAILABLE_FLAGS: + if flag in flags: + bibdoc.set_flag(flag, format) + else: + bibdoc.unset_flag(flag, format) except StandardError, e: - write_message("('%s', '%s', '%s', '%s') description and comment not updated because '%s'." % (docname, format, description, comment, e)) + write_message("('%s', '%s', '%s', '%s', '%s') description and comment not updated because '%s'." % (docname, format, description, comment, flags, e)) raise return True - def _add_new_icon(bibdoc, url, restriction): - """Adds a new icon to an existing bibdoc, replacing the previous one if it exists. If url is empty, just remove the current icon.""" - if not url: - bibdoc.delete_icon() - else: - try: - path = urllib2.urlparse.urlsplit(url)[2] - filename = os.path.split(path)[-1] - format = filename[len(file_strip_ext(filename)):] - tmpurl = download_url(url, format) - try: - try: - icondoc = bibdoc.add_icon(tmpurl, 'icon-%s' % bibdoc.get_docname()) - if restriction and restriction != KEEP_OLD_VALUE: - icondoc.set_status(restriction) - except StandardError, e: - write_message("('%s', '%s') icon not added because '%s'." 
% (url, format, e), stream=sys.stderr) - raise - finally: - os.remove(tmpurl) - except Exception, e: - write_message("Error in downloading '%s' because of: %s" % (url, e), stream=sys.stderr) - raise - return True - if mode == 'delete': raise StandardError('FFT tag specified but bibupload executed in --delete mode') tuple_list = extract_tag_from_record(record, 'FFT') if tuple_list: # FFT Tags analysis write_message("FFTs: "+str(tuple_list), verbose=9) docs = {} # docnames and their data for fft in record_get_field_instances(record, 'FFT', ' ', ' '): # Let's discover the type of the document # This is a legacy field and no particular check will be # enforced on it. doctype = field_get_subfield_values(fft, 't') if doctype: doctype = doctype[0] else: # Default is Main doctype = 'Main' # Let's discover the url. url = field_get_subfield_values(fft, 'a') if url: url = url[0] try: check_valid_url(url) except StandardError, e: - raise StandardError, "fft '%s' specify an url ('%s') with problems: %s" % (fft, url, e) + raise StandardError, "fft '%s' specifies in $a a location ('%s') with problems: %s" % (fft, url, e) else: url = '' # Let's discover the description description = field_get_subfield_values(fft, 'd') if description != []: description = description[0] else: if mode == 'correct' and doctype != 'FIX-MARC': ## If the user asked to correct and did not specify ## a description, this means she really wants to ## modify the description. description = '' else: description = KEEP_OLD_VALUE # Let's discover the desired docname to be created/altered name = field_get_subfield_values(fft, 'n') if name: - name = file_strip_ext(name[0]) + ## Let's remove undesired extensions + name = file_strip_ext(name[0] + '.pdf') else: if url: name = get_docname_from_url(url) + elif mode != 'correct' and doctype != 'FIX-MARC': + raise StandardError, "Warning: fft '%s' doesn't specify either a location in $a or a docname in $n" % str(fft) else: - write_message("Warning: fft '%s' doesn't specifies neither a url nor a name" % str(fft), stream=sys.stderr) continue # Let's discover the desired new docname in case we want to change it newname = field_get_subfield_values(fft, 'm') if newname: - newname = file_strip_ext(newname[0]) + newname = file_strip_ext(newname[0] + '.pdf') else: newname = name # Let's discover the desired format format = field_get_subfield_values(fft, 'f') if format: format = format[0] else: if url: - format = get_format_from_url(url) + format = guess_format_from_url(url) else: - format = '' + format = "" format = normalize_format(format) # Let's discover the icon icon = field_get_subfield_values(fft, 'x') if icon != []: icon = icon[0] if icon != KEEP_OLD_VALUE: try: check_valid_url(icon) except StandardError, e: - raise StandardError, "fft '%s' specify an icon ('%s') with problems: %s" % (fft, icon, e) + raise StandardError, "fft '%s' specifies in $x an icon ('%s') with problems: %s" % (fft, icon, e) else: - if mode == 'correct' and doctype != 'FIX-MARC': - ## See comment on description - icon = '' - else: - icon = KEEP_OLD_VALUE + icon = '' # Let's discover the comment comment = field_get_subfield_values(fft, 'z') if comment != []: comment = comment[0] else: if mode == 'correct' and doctype != 'FIX-MARC': ## See comment on description comment = '' else: comment = KEEP_OLD_VALUE # Let's discover the restriction restriction = field_get_subfield_values(fft, 'r') if restriction != []: restriction = restriction[0] else: if mode == 'correct' and doctype != 'FIX-MARC': ## See comment on description
restriction = '' else: restriction = KEEP_OLD_VALUE version = field_get_subfield_values(fft, 'v') if version: version = version[0] else: version = '' + flags = field_get_subfield_values(fft, 'o') + for flag in flags: + if flag not in CFG_BIBDOCFILE_AVAILABLE_FLAGS: + raise StandardError, "fft '%s' specifies a non available flag: %s" % (fft, flag) + if docs.has_key(name): # new format considered - (doctype2, newname2, restriction2, icon2, version2, urls) = docs[name] + (doctype2, newname2, restriction2, version2, urls) = docs[name] if doctype2 != doctype: raise StandardError, "fft '%s' specifies a different doctype from previous fft with docname '%s'" % (str(fft), name) if newname2 != newname: raise StandardError, "fft '%s' specifies a different newname from previous fft with docname '%s'" % (str(fft), name) if restriction2 != restriction: raise StandardError, "fft '%s' specifies a different restriction from previous fft with docname '%s'" % (str(fft), name) - if icon2 != icon: - raise StandardError, "fft '%x' specifies a different icon than the previous fft with docname '%s'" % (str(fft), name) if version2 != version: raise StandardError, "fft '%x' specifies a different version than the previous fft with docname '%s'" % (str(fft), name) - for (url2, format2, description2, comment2) in urls: + for (url2, format2, description2, comment2, flags2) in urls: if format == format2: raise StandardError, "fft '%s' specifies a second file '%s' with the same format '%s' from previous fft with docname '%s'" % (str(fft), url, format, name) if url or format: - urls.append((url, format, description, comment)) + urls.append((url, format, description, comment, flags)) + if icon: + urls.append((icon, icon[len(file_strip_ext(icon)):] + ';icon', description, comment, flags)) else: if url or format: - docs[name] = (doctype, newname, restriction, icon, version, [(url, format, description, comment)]) + docs[name] = (doctype, newname, restriction, version, [(url, format, description, comment, flags)]) + if icon: + docs[name][4].append((icon, icon[len(file_strip_ext(icon)):] + ';icon', description, comment, flags)) + elif icon: + docs[name] = (doctype, newname, restriction, version, [(icon, icon[len(file_strip_ext(icon)):] + ';icon', description, comment, flags)]) else: - docs[name] = (doctype, newname, restriction, icon, version, []) + docs[name] = (doctype, newname, restriction, version, []) write_message('Result of FFT analysis:\n\tDocs: %s' % (docs,), verbose=9) # Let's remove all FFT tags record_delete_field(record, 'FFT', ' ', ' ') # Preprocessed data elaboration bibrecdocs = BibRecDocs(rec_id) if mode == 'replace': # First we erase previous bibdocs for bibdoc in bibrecdocs.list_bibdocs(): bibdoc.delete() bibrecdocs.build_bibdoc_list() - for docname, (doctype, newname, restriction, icon, version, urls) in docs.iteritems(): - write_message("Elaborating olddocname: '%s', newdocname: '%s', doctype: '%s', restriction: '%s', icon: '%s', urls: '%s', mode: '%s'" % (docname, newname, doctype, restriction, icon, urls, mode), verbose=9) + for docname, (doctype, newname, restriction, version, urls) in docs.iteritems(): + write_message("Elaborating olddocname: '%s', newdocname: '%s', doctype: '%s', restriction: '%s', urls: '%s', mode: '%s'" % (docname, newname, doctype, restriction, urls, mode), verbose=9) if mode in ('insert', 'replace'): # new bibdocs, new docnames, new marc if newname in bibrecdocs.get_bibdoc_names(): write_message("('%s', '%s') not inserted because docname already exists." 
% (newname, urls), stream=sys.stderr) raise StandardError try: bibdoc = bibrecdocs.add_bibdoc(doctype, newname) bibdoc.set_status(restriction) except Exception, e: write_message("('%s', '%s', '%s') not inserted because: '%s'." % (doctype, newname, urls, e), stream=sys.stderr) raise StandardError - for (url, format, description, comment) in urls: - assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment)) - if icon and not icon == KEEP_OLD_VALUE: - assert(_add_new_icon(bibdoc, icon, restriction)) + for (url, format, description, comment, flags) in urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) elif mode == 'replace_or_insert': # to be thought as correct_or_insert for bibdoc in bibrecdocs.list_bibdocs(): if bibdoc.get_docname() == docname: if doctype not in ('PURGE', 'DELETE', 'EXPUNGE', 'REVERT', 'FIX-ALL', 'FIX-MARC', 'DELETE-FILE'): if newname != docname: try: bibdoc.change_name(newname) - icon = bibdoc.get_icon() - if icon: - icon.change_name('icon-%s' % newname) except StandardError, e: write_message(e, stream=sys.stderr) raise found_bibdoc = False for bibdoc in bibrecdocs.list_bibdocs(): if bibdoc.get_docname() == newname: found_bibdoc = True if doctype == 'PURGE': bibdoc.purge() elif doctype == 'DELETE': bibdoc.delete() elif doctype == 'EXPUNGE': bibdoc.expunge() elif doctype == 'FIX-ALL': bibrecdocs.fix(docname) elif doctype == 'FIX-MARC': pass elif doctype == 'DELETE-FILE': if urls: - for (url, format, description, comment) in urls: + for (url, format, description, comment, flags) in urls: bibdoc.delete_file(format, version) elif doctype == 'REVERT': try: bibdoc.revert(version) except Exception, e: write_message('(%s, %s) not correctly reverted: %s' % (newname, version, e), stream=sys.stderr) raise else: if restriction != KEEP_OLD_VALUE: bibdoc.set_status(restriction) # Since the docname already existed we have to first # bump the version by pushing the first new file # then pushing the other files. 
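Spelled out: every file arriving in one FFT group for an already-existing docname lands in a single new version, because the first file creates the version and the remaining ones attach to it as additional formats. A small trace under that assumption (URLs and metadata invented):

    urls = [('http://example.org/f.pdf', '.pdf', 'Fulltext', '', []),
            ('http://example.org/f.ps', '.ps', 'Fulltext', '', [])]
    (first_url, first_format, first_description,
     first_comment, first_flags) = urls[0]
    other_urls = urls[1:]
    # first tuple  -> bibdoc.add_file_new_version(): version N+1 appears
    # other tuples -> bibdoc.add_file_new_format(): same version N+1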
if urls: - (first_url, first_format, first_description, first_comment) = urls[0] + (first_url, first_format, first_description, first_comment, first_flags) = urls[0] other_urls = urls[1:] - assert(_add_new_version(bibdoc, first_url, first_format, docname, doctype, newname, first_description, first_comment)) - for (url, format, description, comment) in other_urls: - assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment)) - if icon != KEEP_OLD_VALUE: - assert(_add_new_icon(bibdoc, icon, restriction)) + assert(_add_new_version(bibdoc, first_url, first_format, docname, doctype, newname, first_description, first_comment, first_flags)) + for (url, format, description, comment, flags) in other_urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) if not found_bibdoc: bibdoc = bibrecdocs.add_bibdoc(doctype, newname) - for (url, format, description, comment) in urls: - assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment)) - if icon and not icon == KEEP_OLD_VALUE: - assert(_add_new_icon(bibdoc, icon, restriction)) + for (url, format, description, comment, flags) in urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) elif mode == 'correct': for bibdoc in bibrecdocs.list_bibdocs(): if bibdoc.get_docname() == docname: if doctype not in ('PURGE', 'DELETE', 'EXPUNGE', 'REVERT', 'FIX-ALL', 'FIX-MARC', 'DELETE-FILE'): if newname != docname: try: bibdoc.change_name(newname) - icon = bibdoc.get_icon() - if icon: - icon.change_name('icon-%s' % newname) except StandardError, e: write_message('Error in renaming %s to %s: %s' % (docname, newname, e), stream=sys.stderr) raise found_bibdoc = False for bibdoc in bibrecdocs.list_bibdocs(): if bibdoc.get_docname() == newname: found_bibdoc = True if doctype == 'PURGE': bibdoc.purge() elif doctype == 'DELETE': bibdoc.delete() elif doctype == 'EXPUNGE': bibdoc.expunge() elif doctype == 'FIX-ALL': bibrecdocs.fix(newname) elif doctype == 'FIX-MARC': pass elif doctype == 'DELETE-FILE': if urls: - for (url, format, description, comment) in urls: + for (url, format, description, comment, flags) in urls: bibdoc.delete_file(format, version) elif doctype == 'REVERT': try: bibdoc.revert(version) except Exception, e: write_message('(%s, %s) not correctly reverted: %s' % (newname, version, e), stream=sys.stderr) raise else: if restriction != KEEP_OLD_VALUE: bibdoc.set_status(restriction) if urls: - (first_url, first_format, first_description, first_comment) = urls[0] + (first_url, first_format, first_description, first_comment, first_flags) = urls[0] other_urls = urls[1:] - assert(_add_new_version(bibdoc, first_url, first_format, docname, doctype, newname, first_description, first_comment)) - for (url, format, description, comment) in other_urls: - assert(_add_new_format(bibdoc, url, format, docname, description, doctype, newname, description, comment)) - if icon != KEEP_OLD_VALUE: - _add_new_icon(bibdoc, icon, restriction) + assert(_add_new_version(bibdoc, first_url, first_format, docname, doctype, newname, first_description, first_comment, first_flags)) + for (url, format, description, comment, flags) in other_urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) if not found_bibdoc: if doctype in ('PURGE', 'DELETE', 'EXPUNGE', 'FIX-ALL', 'FIX-MARC', 'DELETE-FILE', 'REVERT'): write_message("('%s', '%s', '%s') not performed because '%s' 
docname didn't exist." % (doctype, newname, urls, docname), stream=sys.stderr) raise StandardError else: bibdoc = bibrecdocs.add_bibdoc(doctype, newname) - for (url, format, description, comment) in urls: - assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment)) - if icon and not icon == KEEP_OLD_VALUE: - assert(_add_new_icon(bibdoc, icon, restriction)) + for (url, format, description, comment, flags) in urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) elif mode == 'append': try: found_bibdoc = False for bibdoc in bibrecdocs.list_bibdocs(): if bibdoc.get_docname() == docname: found_bibdoc = True - for (url, format, description, comment) in urls: - assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment)) - if icon not in ('', KEEP_OLD_VALUE): - assert(_add_new_icon(bibdoc, icon, restriction)) + for (url, format, description, comment, flags) in urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) if not found_bibdoc: try: bibdoc = bibrecdocs.add_bibdoc(doctype, docname) bibdoc.set_status(restriction) - for (url, format, description, comment) in urls: - assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment)) - if icon and not icon == KEEP_OLD_VALUE: - assert(_add_new_icon(bibdoc, icon, restriction)) + for (url, format, description, comment, flags) in urls: + assert(_add_new_format(bibdoc, url, format, docname, doctype, newname, description, comment, flags)) except Exception, e: register_exception() write_message("('%s', '%s', '%s') not appended because: '%s'." % (doctype, newname, urls, e), stream=sys.stderr) raise except: register_exception() raise return record def insert_fmt_tags(record, rec_id, opt_mode): """Process and insert FMT tags""" fmt_fields = record_get_field_instances(record, 'FMT') if fmt_fields: for fmt_field in fmt_fields: # Get the d, f, g subfields of the FMT tag try: d_value = field_get_subfield_values(fmt_field, "d")[0] except IndexError: d_value = "" try: f_value = field_get_subfield_values(fmt_field, "f")[0] except IndexError: f_value = "" try: g_value = field_get_subfield_values(fmt_field, "g")[0] except IndexError: g_value = "" # Update the format res = update_bibfmt_format(rec_id, g_value, f_value, d_value) if res == 1: write_message(" Failed: Error during update_bibfmt", verbose=1, stream=sys.stderr) # If we are in format mode, we only care about the FMT tag if opt_mode == 'format': return 0 # We delete the FMT Tag of the record record_delete_field(record, 'FMT') write_message(" -Delete field FMT from record : DONE", verbose=2) return record elif opt_mode == 'format': write_message(" Failed: Format update failed: no FMT tag found", verbose=1, stream=sys.stderr) return None else: return record ### Update functions def update_bibrec_modif_date(now, bibrec_id): """Update the date of the record in bibrec table """ query = """UPDATE bibrec SET modification_date=%s WHERE id=%s""" params = (now, bibrec_id) try: run_sql(query, params) write_message(" -Update record modification date : DONE" , verbose=2) except Error, error: write_message(" Error during update_bibrec_modif_date function : %s" % error, verbose=1, stream=sys.stderr) def update_bibfmt_format(id_bibrec, format_value, format_name, modification_date=None): """Update the format in the table bibfmt""" if modification_date is None: modification_date = time.strftime('%Y-%m-%d %H:%M:%S') else: try: time.strptime(modification_date, "%Y-%m-%d
%H:%M:%S") except ValueError: modification_date = '1970-01-01 00:00:00' # We check if the format is already in bibFmt nb_found = find_record_format(id_bibrec, format_name) if nb_found == 1: # we are going to update the format # compress the format_value value pickled_format_value = compress(format_value) # update the format: query = """UPDATE bibfmt SET last_updated=%s, value=%s WHERE id_bibrec=%s AND format=%s""" params = (modification_date, pickled_format_value, id_bibrec, format_name) try: row_id = run_sql(query, params) if row_id is None: write_message(" Failed: Error during update_bibfmt_format function", verbose=1, stream=sys.stderr) return 1 else: write_message(" -Update the format %s in bibfmt : DONE" % format_name , verbose=2) return 0 except Error, error: write_message(" Error during the update_bibfmt_format function : %s " % error, verbose=1, stream=sys.stderr) elif nb_found > 1: write_message(" Failed: Same format %s found several time in bibfmt for the same record." % format_name, verbose=1, stream=sys.stderr) return 1 else: # Insert the format information in BibFMT res = insert_bibfmt(id_bibrec, format_value, format_name, modification_date) if res is None: write_message(" Failed: Error during insert_bibfmt", verbose=1, stream=sys.stderr) return 1 else: write_message(" -Insert the format %s in bibfmt : DONE" % format_name , verbose=2) return 0 def archive_marcxml_for_history(recID): """ Archive current MARCXML format of record RECID from BIBFMT table into hstRECORD table. Useful to keep MARCXML history of records. Return 0 if everything went fine. Return 1 otherwise. """ try: res = run_sql("SELECT id_bibrec, value, last_updated FROM bibfmt WHERE format='xm' AND id_bibrec=%s", (recID,)) if res: run_sql("""INSERT INTO hstRECORD (id_bibrec, marcxml, job_id, job_name, job_person, job_date, job_details) VALUES (%s,%s,%s,%s,%s,%s,%s)""", (res[0][0], res[0][1], task_get_task_param('task_id', 0), 'bibupload', task_get_task_param('user','UNKNOWN'), res[0][2], 'mode: ' + task_get_option('mode','UNKNOWN') + '; file: ' + task_get_option('file_path','UNKNOWN') + '.')) except Error, error: write_message(" Error during archive_marcxml_for_history: %s " % error, verbose=1, stream=sys.stderr) return 1 return 0 def update_database_with_metadata(record, rec_id, oai_rec_id = "oai"): """Update the database tables with the record and the record id given in parameter""" for tag in record.keys(): # check if tag is not a special one: if tag not in CFG_BIBUPLOAD_SPECIAL_TAGS: # for each tag there is a list of tuples representing datafields tuple_list = record[tag] # this list should contain the elements of a full tag [tag, ind1, ind2, subfield_code] tag_list = [] tag_list.append(tag) for single_tuple in tuple_list: # these are the contents of a single tuple subfield_list = single_tuple[0] ind1 = single_tuple[1] ind2 = single_tuple[2] # append the ind's to the full tag if ind1 == '' or ind1 == ' ': tag_list.append('_') else: tag_list.append(ind1) if ind2 == '' or ind2 == ' ': tag_list.append('_') else: tag_list.append(ind2) datafield_number = single_tuple[4] if tag in CFG_BIBUPLOAD_SPECIAL_TAGS: # nothing to do for special tags (FFT, FMT) pass elif tag in CFG_BIBUPLOAD_CONTROLFIELD_TAGS and tag != "001": value = single_tuple[3] # get the full tag full_tag = ''.join(tag_list) # update the tables write_message(" insertion of the tag "+full_tag+" with the value "+value, verbose=9) # insert the tag and value into into bibxxx (table_name, bibxxx_row_id) = insert_record_bibxxx(full_tag, value) #print 'tname, 
bibrow', table_name, bibxxx_row_id; if table_name is None or bibxxx_row_id is None: write_message(" Failed: during insert_record_bibxxx", verbose=1, stream=sys.stderr) # connect bibxxx and bibrec with the table bibrec_bibxxx res = insert_record_bibrec_bibxxx(table_name, bibxxx_row_id, datafield_number, rec_id) if res is None: write_message(" Failed: during insert_record_bibrec_bibxxx", verbose=1, stream=sys.stderr) else: # get the tag and value from the content of each subfield for subfield in subfield_list: subtag = subfield[0] value = subfield[1] tag_list.append(subtag) # get the full tag full_tag = ''.join(tag_list) # update the tables write_message(" insertion of the tag "+full_tag+" with the value "+value, verbose=9) # insert the tag and value into bibxxx (table_name, bibxxx_row_id) = insert_record_bibxxx(full_tag, value) if table_name is None or bibxxx_row_id is None: write_message(" Failed: during insert_record_bibxxx", verbose=1, stream=sys.stderr) # connect bibxxx and bibrec with the table bibrec_bibxxx res = insert_record_bibrec_bibxxx(table_name, bibxxx_row_id, datafield_number, rec_id) if res is None: write_message(" Failed: during insert_record_bibrec_bibxxx", verbose=1, stream=sys.stderr) # remove the subtag from the list tag_list.pop() tag_list.pop() tag_list.pop() tag_list.pop() write_message(" -Update the database with metadata : DONE", verbose=2) log_record_uploading(oai_rec_id, task_get_task_param('task_id', 0), rec_id, 'P') def append_new_tag_to_old_record(record, rec_old, opt_tag, opt_mode): """Append new tags to an old record""" def _append_tag(tag): # In reference mode, append only the reference tag if opt_mode == 'reference': if tag == CFG_BIBUPLOAD_REFERENCE_TAG: for single_tuple in record[tag]: # We retrieve the information of the tag subfield_list = single_tuple[0] ind1 = single_tuple[1] ind2 = single_tuple[2] # We add the datafield to the old record write_message(" Adding tag: %s ind1=%s ind2=%s code=%s" % (tag, ind1, ind2, subfield_list), verbose=9) newfield_number = record_add_field(rec_old, tag, ind1, ind2, subfields=subfield_list) if newfield_number is None: write_message(" Error when adding the field "+tag, verbose=1, stream=sys.stderr) else: if tag in CFG_BIBUPLOAD_CONTROLFIELD_TAGS: if tag == '001': pass else: # if it is a controlfield, just access the value for single_tuple in record[tag]: controlfield_value = single_tuple[3] # add the field to the old record newfield_number = record_add_field(rec_old, tag, controlfield_value=controlfield_value) if newfield_number is None: write_message(" Error when adding the field "+tag, verbose=1, stream=sys.stderr) else: # For each tag there is a list of tuples representing datafields for single_tuple in record[tag]: # We retrieve the information of the tag subfield_list = single_tuple[0] ind1 = single_tuple[1] ind2 = single_tuple[2] if '%s%s%s' % (tag, ind1 == ' ' and '_' or ind1, ind2 == ' ' and '_' or ind2) in (CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[:5], CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[:5]): ## We don't want to append the external identifier ## if it already exists.
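The membership test above leans on a normalization idiom used throughout this module: blank MARC indicators are rewritten as '_' so that a tag plus its indicators can be compared with five-character configuration values such as CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[:5]. Isolated as a sketch (the example tag is invented):

    def normalized_tag(tag, ind1, ind2):
        # blank indicators become '_', so ('035', ' ', ' ') -> '035__'
        return '%s%s%s' % (tag, ind1 == ' ' and '_' or ind1,
                           ind2 == ' ' and '_' or ind2)

    print normalized_tag('035', ' ', ' ')   # 035__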
if record_find_field(rec_old, tag, single_tuple)[0] is not None: write_message(" Not adding tag: %s ind1=%s ind2=%s subfields=%s: it's already there" % (tag, ind1, ind2, subfield_list), verbose=9) continue # We add the datafield to the old record write_message(" Adding tag: %s ind1=%s ind2=%s subfields=%s" % (tag, ind1, ind2, subfield_list), verbose=9) newfield_number = record_add_field(rec_old, tag, ind1, ind2, subfields=subfield_list) if newfield_number is None: write_message(" Error when adding the field"+tag, verbose=1, stream=sys.stderr) if opt_tag is not None: _append_tag(opt_tag) else: # Go through each tag in the appended record for tag in record: _append_tag(tag) return rec_old def copy_strong_tags_from_old_record(record, rec_old): """ Look for strong tags in RECORD and REC_OLD. If no strong tags are found in RECORD, then copy them over from REC_OLD. This function modifies RECORD structure on the spot. """ for strong_tag in CFG_BIBUPLOAD_STRONG_TAGS: if not record_get_field_instances(record, strong_tag): strong_tag_old_field_instances = record_get_field_instances(rec_old, strong_tag) if strong_tag_old_field_instances: for strong_tag_old_field_instance in strong_tag_old_field_instances: sf_vals, fi_ind1, fi_ind2, controlfield, dummy = strong_tag_old_field_instance record_add_field(record, strong_tag, fi_ind1, fi_ind2, controlfield, sf_vals) return ### Delete functions def delete_tags(record, rec_old): """ Returns a record structure with all the fields in rec_old minus the fields in record. @param record: The record containing tags to delete. @type record: record structure @param rec_old: The original record. @type rec_old: record structure @return: The modified record. @rtype: record structure """ returned_record = copy.deepcopy(rec_old) for tag, fields in record.iteritems(): if tag in ('001', ): continue for field in fields: local_position = record_find_field(returned_record, tag, field)[1] if local_position is not None: record_delete_field(returned_record, tag, field_position_local=local_position) return returned_record def delete_tags_to_correct(record, rec_old, opt_tag): """ Delete tags from REC_OLD which are also existing in RECORD. When deleting, pay attention not only to tags, but also to indicators, so that fields with the same tags but different indicators are not deleted. """ ## Some fields are controlled via provenance information. ## We should re-add saved fields at the end. fields_to_readd = {} for tag in CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS: if tag[:3] in record: tmp_field_instances = record_get_field_instances(record, tag[:3], tag[3], tag[4]) ## Let's discover the provenance that will be updated provenances_to_update = [] for instance in tmp_field_instances: for code, value in instance[0]: if code == tag[5]: if value not in provenances_to_update: provenances_to_update.append(value) break else: ## The provenance is not specified. ## let's add the special empty provenance. if '' not in provenances_to_update: provenances_to_update.append('') potential_fields_to_readd = record_get_field_instances(rec_old, tag[:3], tag[3], tag[4]) ## Let's take all the field corresponding to tag ## Let's save apart all the fields that should be updated, but ## since they have a different provenance not mentioned in record ## they should be preserved. 
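            ## (Illustrative note: an entry of
            ## CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS such as '6531_9'
            ## -- a hypothetical value -- decomposes as tag[:3]='653',
            ## ind1=tag[3]='1', ind2=tag[4]='_' (blank), and provenance
            ## subfield code tag[5]='9'; fields whose provenance is not
            ## mentioned in the incoming record end up in fields_to_readd.)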
fields = [] for sf_vals, ind1, ind2, dummy_cf, dummy_line in potential_fields_to_readd: for code, value in sf_vals: if code == tag[5]: if value not in provenances_to_update: fields.append(sf_vals) break else: if '' not in provenances_to_update: ## Empty provenance, let's protect in any case fields.append(sf_vals) fields_to_readd[tag] = fields # browse through all the tags from the MARCXML file: for tag in record: # do we have to delete only a special tag or any tag? if opt_tag is None or opt_tag == tag: # check if the tag exists in the old record too: if tag in rec_old and tag != '001': # the tag does exist, so delete all record's tag+ind1+ind2 combinations from rec_old for dummy_sf_vals, ind1, ind2, dummy_cf, field_number in record[tag]: write_message(" Delete tag: " + tag + " ind1=" + ind1 + " ind2=" + ind2, verbose=9) record_delete_field(rec_old, tag, ind1, ind2) ## Ok, we readd necessary fields! for tag, fields in fields_to_readd.iteritems(): for sf_vals in fields: write_message(" Adding tag: " + tag[:3] + " ind1=" + tag[3] + " ind2=" + tag[4] + " code=" + str(sf_vals), verbose=9) record_add_field(rec_old, tag[:3], tag[3], tag[4], subfields=sf_vals) def delete_bibrec_bibxxx(record, id_bibrec): """Delete the database record from the table bibxxx given in parameters""" # we clear all the rows from bibrec_bibxxx from the old record for tag in record.keys(): if tag not in CFG_BIBUPLOAD_SPECIAL_TAGS: # for each name construct the bibrec_bibxxx table name table_name = 'bibrec_bib'+tag[0:2]+'x' # delete all the records with proper id_bibrec query = """DELETE FROM `%s` where id_bibrec = %s""" params = (table_name, id_bibrec) try: run_sql(query % params) except Error, error: write_message(" Error during the delete_bibrec_bibxxx function : %s " % error, verbose=1, stream=sys.stderr) def wipe_out_record_from_all_tables(recid): """ Wipe out completely the record and all its traces of RECID from the database (bibrec, bibrec_bibxxx, bibxxx, bibfmt). Useful for the time being for test cases. 
""" # delete all the linked bibdocs for bibdoc in BibRecDocs(recid).list_bibdocs(): bibdoc.expunge() # delete from bibrec: run_sql("DELETE FROM bibrec WHERE id=%s", (recid,)) # delete from bibrec_bibxxx: for i in range(0, 10): for j in range(0, 10): run_sql("DELETE FROM %(bibrec_bibxxx)s WHERE id_bibrec=%%s" % \ {'bibrec_bibxxx': "bibrec_bib%i%ix" % (i, j)}, (recid,)) # delete all unused bibxxx values: for i in range(0, 10): for j in range(0, 10): run_sql("DELETE %(bibxxx)s FROM %(bibxxx)s " \ " LEFT JOIN %(bibrec_bibxxx)s " \ " ON %(bibxxx)s.id=%(bibrec_bibxxx)s.id_bibxxx " \ " WHERE %(bibrec_bibxxx)s.id_bibrec IS NULL" % \ {'bibxxx': "bib%i%ix" % (i, j), 'bibrec_bibxxx': "bibrec_bib%i%ix" % (i, j)}) # delete from bibfmt: run_sql("DELETE FROM bibfmt WHERE id_bibrec=%s", (recid,)) # delete from bibrec_bibdoc: run_sql("DELETE FROM bibrec_bibdoc WHERE id_bibrec=%s", (recid,)) return def delete_bibdoc(id_bibrec): """Delete document from bibdoc which correspond to the bibrec id given in parameter""" query = """UPDATE bibdoc SET status='DELETED' WHERE id IN (SELECT id_bibdoc FROM bibrec_bibdoc WHERE id_bibrec=%s)""" params = (id_bibrec,) try: run_sql(query, params) except Error, error: write_message(" Error during the delete_bibdoc function : %s " % error, verbose=1, stream=sys.stderr) def delete_bibrec_bibdoc(id_bibrec): """Delete the bibrec record from the table bibrec_bibdoc given in parameter""" # delete all the records with proper id_bibrec query = """DELETE FROM bibrec_bibdoc WHERE id_bibrec=%s""" params = (id_bibrec,) try: run_sql(query, params) except Error, error: write_message(" Error during the delete_bibrec_bibdoc function : %s " % error, verbose=1, stream=sys.stderr) def main(): """Main that construct all the bibtask.""" task_init(authorization_action='runbibupload', authorization_msg="BibUpload Task Submission", description="""Receive MARC XML file and update appropriate database tables according to options. Examples: $ bibupload -i input.xml """, help_specific_usage=""" -a, --append\t\tnew fields are appended to the existing record -c, --correct\t\tfields are replaced by the new ones in the existing record -f, --format\t\ttakes only the FMT fields into account. Does not update -i, --insert\t\tinsert the new record in the database -r, --replace\t\tthe existing record is entirely replaced by the new one -z, --reference\tupdate references (update only 999 fields) -d, --delete\t\tspecified fields are deleted in existing record -S, --stage=STAGE\tstage to start from in the algorithm (0: always done; 1: FMT tags; \t\t\t2: FFT tags; 3: BibFmt; 4: Metadata update; 5: time update) -n, --notimechange\tdo not change record last modification date when updating -o, --holdingpen\t\tInsert record into holding pen instead of the normal database """, version=__revision__, specific_params=("ircazdS:fno", [ "insert", "replace", "correct", "append", "reference", "delete", "stage=", "format", "notimechange", "holdingpen", ]), task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter, task_run_fnc=task_run_core) def task_submit_elaborate_specific_parameter(key, value, opts, args): """ Given the string key it checks it's meaning, eventually using the value. Usually it fills some key in the options dict. It must return True if it has elaborated the key, False, if it doesn't know that key. 
        eg:
        if key in ('-n', '--number'):
            task_set_option('number', value)
            return True
        return False
    """
    # No time change option
    if key in ("-n", "--notimechange"):
        task_set_option('notimechange', 1)
    # Insert mode option
    elif key in ("-i", "--insert"):
        if task_get_option('mode') == 'replace':
            # if also replace found, then set to replace_or_insert
            task_set_option('mode', 'replace_or_insert')
        else:
            task_set_option('mode', 'insert')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Replace mode option
    elif key in ("-r", "--replace"):
        if task_get_option('mode') == 'insert':
            # if also insert found, then set to replace_or_insert
            task_set_option('mode', 'replace_or_insert')
        else:
            task_set_option('mode', 'replace')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Holding pen mode option
    elif key in ("-o", "--holdingpen"):
        write_message("Holding pen mode", verbose=3)
        task_set_option('mode', 'holdingpen')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Correct mode option
    elif key in ("-c", "--correct"):
        task_set_option('mode', 'correct')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Append mode option
    elif key in ("-a", "--append"):
        task_set_option('mode', 'append')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Reference mode option
    elif key in ("-z", "--reference"):
        task_set_option('mode', 'reference')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    elif key in ("-d", "--delete"):
        task_set_option('mode', 'delete')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Format mode option
    elif key in ("-f", "--format"):
        task_set_option('mode', 'format')
        fix_argv_paths([args[0]])
        task_set_option('file_path', os.path.abspath(args[0]))
    # Stage option
    elif key in ("-S", "--stage"):
        try:
            value = int(value)
        except ValueError:
            print >> sys.stderr, "The value specified for --stage must be a valid integer, not %s" % value
            return False
        if not (0 <= value <= 5):
            print >> sys.stderr, "The value specified for --stage must be between 0 and 5"
            return False
        task_set_option('stage_to_start_from', value)
    else:
        return False
    return True

def task_submit_check_options():
    """
    Reimplement this method to check the options before submitting the
    task, for example in order to provide default values.  It must
    return False if there are errors in the options.
    """
    if task_get_option('mode') is None:
        write_message("Please specify at least one update/insert mode!")
        return False
    if task_get_option('file_path') is None:
        write_message("Missing filename! -h for help.")
        return False
    return True

def writing_rights_p():
    """Return True in case bibupload has the proper rights to write in the
       fulltext file folder."""
    filename = os.path.join(CFG_WEBSUBMIT_FILEDIR, 'test.txt')
    try:
        if not os.path.exists(CFG_WEBSUBMIT_FILEDIR):
            os.makedirs(CFG_WEBSUBMIT_FILEDIR)
        open(filename, 'w').write('TEST')
        assert(open(filename).read() == 'TEST')
        os.remove(filename)
    except:
        register_exception()
        return False
    return True

def task_run_core():
    """Reimplement to add the body of the task."""
    error = 0
    write_message("Input file '%s', input mode '%s'."
                  % (task_get_option('file_path'),
                     task_get_option('mode')))
    write_message("STAGE 0:", verbose=2)

    if task_get_option('file_path') is not None:
        write_message("start processing", verbose=3)
        task_update_progress("Reading XML input")
        recs = xml_marc_to_records(open_marc_file(task_get_option('file_path')))
        stat['nb_records_to_upload'] = len(recs)
        write_message("   -Open XML marc: DONE", verbose=2)
        task_sleep_now_if_required(can_stop_too=True)
        write_message("Entering records loop", verbose=3)
        if recs is not None:
            # We process the records one by one
            for record in recs:
                record_id = record_extract_oai_id(record)
                task_sleep_now_if_required(can_stop_too=True)
                if task_get_option("mode") == "holdingpen":
                    # inserting into the holding pen
                    write_message("Inserting into holding pen", verbose=3)
                    insert_record_into_holding_pen(record, record_id)
                else:
                    write_message("Inserting into main database", verbose=3)
                    error = bibupload(
                        record,
                        opt_tag=task_get_option('tag'),
                        opt_mode=task_get_option('mode'),
                        opt_stage_to_start_from=task_get_option('stage_to_start_from'),
                        opt_notimechange=task_get_option('notimechange'),
                        oai_rec_id=record_id)
                    if error[0] == 1:
                        if record:
                            write_message(record_xml_output(record),
                                          stream=sys.stderr)
                        else:
                            write_message("Record could not be parsed",
                                          stream=sys.stderr)
                        stat['nb_errors'] += 1
                    elif error[0] == 2:
                        if record:
                            write_message(record_xml_output(record),
                                          stream=sys.stderr)
                        else:
                            write_message("Record could not be parsed",
                                          stream=sys.stderr)
                    task_update_progress("Done %d out of %d." % \
                                         (stat['nb_records_inserted'] + \
                                          stat['nb_records_updated'],
                                          stat['nb_records_to_upload']))
    else:
        write_message("   Error: bibupload failed, no record found",
                      verbose=1, stream=sys.stderr)

    if task_get_task_param('verbose') >= 1:
        # Print out the statistics
        print_out_bibupload_statistics()

    # Check if there were errors
    return not stat['nb_errors'] >= 1

def log_record_uploading(oai_rec_id, task_id, bibrec_id, insertion_db):
    if oai_rec_id != "" and oai_rec_id is not None:
        query = """UPDATE oaiHARVESTLOG SET date_inserted=NOW(), inserted_to_db=%s, id_bibrec=%s WHERE oai_id = %s AND bibupload_task_id = %s ORDER BY date_harvested LIMIT 1"""
        try:
            run_sql(query, (str(insertion_db), str(bibrec_id),
                            str(oai_rec_id), str(task_id), ))
        except Error, error:
            write_message("   Error during the log_record_uploading function: %s"
                          % error, verbose=1, stream=sys.stderr)

if __name__ == "__main__":
    main()
diff --git a/modules/bibupload/lib/bibupload_regression_tests.py b/modules/bibupload/lib/bibupload_regression_tests.py
index 0f7593dba..9e365ddb2 100644
--- a/modules/bibupload/lib/bibupload_regression_tests.py
+++ b/modules/bibupload/lib/bibupload_regression_tests.py
@@ -1,3445 +1,3523 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
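## Note on the comparison helpers defined below (an illustrative usage
## sketch; names as in this module):
##
##   inserted_xm = print_record(recid, 'xm')
##   assert compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm),
##                            expected_xm) == ''
##
## compare_xmbuffers() and compare_hmbuffers() return the empty string
## when the two buffers match, and a "=buffer1=\n!=\n=buffer2=" dump
## otherwise, so the tests assert equality with ''.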
# pylint: disable-msg=C0301

"""Regression tests for the BibUpload."""

__revision__ = "$Id$"

import re
import unittest
import datetime
import os
import time
import sys
-from urllib2 import urlopen
+from urllib2 import urlopen, HTTPError
if sys.hexversion < 0x2060000:
    from md5 import md5
else:
    from hashlib import md5

from invenio.config import CFG_OAI_ID_FIELD, CFG_PREFIX, CFG_SITE_URL, CFG_TMPDIR, \
     CFG_WEBSUBMIT_FILEDIR, \
     CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG, \
     CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG, \
     CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG
from invenio import bibupload
from invenio.search_engine import print_record
from invenio.dbquery import run_sql
from invenio.dateutils import convert_datestruct_to_datetext
from invenio.testutils import make_test_suite, run_test_suite
from invenio.bibdocfile import BibRecDocs
from invenio.bibtask import task_set_task_param

# helper functions:

def remove_tag_001_from_xmbuffer(xmbuffer):
    """Remove tag 001 from MARCXML buffer.  Useful for testing two
       MARCXML buffers without paying attention to recIDs attributed
       during the bibupload.
    """
    return re.sub(r'<controlfield tag="001">.*</controlfield>', '', xmbuffer)

def compare_xmbuffers(xmbuffer1, xmbuffer2):
    """Compare two XM (XML MARC) buffers by removing whitespaces
       before testing.
    """

    def remove_blanks_from_xmbuffer(xmbuffer):
        """Remove \n and blanks from XMBUFFER."""
        out = xmbuffer.replace("\n", "")
        out = out.replace(" ", "")
        return out

    # remove whitespace:
    xmbuffer1 = remove_blanks_from_xmbuffer(xmbuffer1)
    xmbuffer2 = remove_blanks_from_xmbuffer(xmbuffer2)

    if xmbuffer1 != xmbuffer2:
        return "\n=" + xmbuffer1 + "=\n" + '!=' + "\n=" + xmbuffer2 + "=\n"

    return ''

def remove_tag_001_from_hmbuffer(hmbuffer):
    """Remove tag 001 from HTML MARC buffer.  Useful for testing two
       HTML MARC buffers without paying attention to recIDs attributed
       during the bibupload.
    """
    return re.sub(r'(^|\n)(<pre>)?[0-9]{9}\s001__\s\d+($|\n)', '', hmbuffer)

def compare_hmbuffers(hmbuffer1, hmbuffer2):
    """Compare two HM (HTML MARC) buffers by removing whitespaces
       before testing.
    """

    hmbuffer1 = hmbuffer1.strip()
    hmbuffer2 = hmbuffer2.strip()

    # remove possible <pre>...</pre> formatting:
    hmbuffer1 = re.sub(r'^<pre>', '', hmbuffer1)
    hmbuffer2 = re.sub(r'^<pre>', '', hmbuffer2)
    hmbuffer1 = re.sub(r'</pre>$', '', hmbuffer1)
    hmbuffer2 = re.sub(r'</pre>
    $', '', hmbuffer2) # remove leading recid, leaving only field values: hmbuffer1 = re.sub(r'(^|\n)[0-9]{9}\s', '', hmbuffer1) hmbuffer2 = re.sub(r'(^|\n)[0-9]{9}\s', '', hmbuffer2) # remove leading whitespace: hmbuffer1 = re.sub(r'(^|\n)\s+', '', hmbuffer1) hmbuffer2 = re.sub(r'(^|\n)\s+', '', hmbuffer2) compare_hmbuffers = hmbuffer1 == hmbuffer2 if not compare_hmbuffers: return "\n=" + hmbuffer1 + "=\n" + '!=' + "\n=" + hmbuffer2 + "=\n" return '' def try_url_download(url): """Try to download a given URL""" try: open_url = urlopen(url) open_url.read() except Exception, e: - raise StandardError, "Downloading %s is impossible because of %s" \ - % (url, str(e)) + raise StandardError("Downloading %s is impossible because of %s" + % (url, str(e))) return True class BibUploadInsertModeTest(unittest.TestCase): """Testing insert mode.""" def setUp(self): # pylint: disable-msg=C0103 """Initialise the MARCXML variable""" self.test = """ something Tester, J Y MIT Tester, K J CERN2 Tester, G CERN3 test11 test31 test12 test32 test13 test33 test21 test41 test22 test42 test14 test51 test52 Tester, T CERN """ self.test_hm = """ 100__ $$aTester, T$$uCERN 111__ $$atest11$$ctest31 111__ $$atest12$$ctest32 111__ $$atest13$$ctest33 111__ $$btest21$$dtest41 111__ $$btest22$$dtest42 111__ $$atest14 111__ $$etest51 111__ $$etest52 245__ $$asomething 700__ $$aTester, J Y$$uMIT 700__ $$aTester, K J$$uCERN2 700__ $$aTester, G$$uCERN3 """ def test_create_record_id(self): """bibupload - insert mode, trying to create a new record ID in the database""" rec_id = bibupload.create_new_record() self.assertNotEqual(-1, rec_id) def test_no_retrieve_record_id(self): """bibupload - insert mode, detection of record ID in the input file""" # We create create the record out of the xml marc recs = bibupload.xml_marc_to_records(self.test) # We call the function which should retrieve the record id rec_id = bibupload.retrieve_rec_id(recs[0], 'insert') # We compare the value found with None self.assertEqual(None, rec_id) def test_insert_complete_xmlmarc(self): """bibupload - insert mode, trying to insert complete MARCXML file""" # Initialize the global variable task_set_task_param('verbose', 0) # We create create the record out of the xml marc recs = bibupload.xml_marc_to_records(self.test) # We call the main function with the record as a parameter err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # We retrieve the inserted xml inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') # Compare if the two MARCXML are the same self.assertEqual(compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm), self.test), '') self.assertEqual(compare_hmbuffers(remove_tag_001_from_hmbuffer(inserted_hm), self.test_hm), '') class BibUploadAppendModeTest(unittest.TestCase): """Testing append mode.""" def setUp(self): # pylint: disable-msg=C0103 """Initialize the MARCXML variable""" self.test_existing = """ 123456789 Tester, T DESY 0003719PHOPHO """ self.test_to_append = """ 123456789 Tester, U CERN 0003719PHOPHO """ self.test_expected_xm = """ 123456789 Tester, T DESY Tester, U CERN 0003719PHOPHO """ self.test_expected_hm = """ 001__ 123456789 100__ $$aTester, T$$uDESY 100__ $$aTester, U$$uCERN 970__ $$a0003719PHOPHO """ # insert test record: test_to_upload = self.test_existing.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_to_upload) task_set_task_param('verbose', 0) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') self.test_recid = recid # replace test buffers with real recid 
of inserted test record: self.test_existing = self.test_existing.replace('123456789', str(self.test_recid)) self.test_to_append = self.test_to_append.replace('123456789', str(self.test_recid)) self.test_expected_xm = self.test_expected_xm.replace('123456789', str(self.test_recid)) self.test_expected_hm = self.test_expected_hm.replace('123456789', str(self.test_recid)) def test_retrieve_record_id(self): """bibupload - append mode, the input file should contain a record ID""" task_set_task_param('verbose', 0) # We create create the record out of the xml marc recs = bibupload.xml_marc_to_records(self.test_to_append) # We call the function which should retrieve the record id rec_id = bibupload.retrieve_rec_id(recs[0], 'append') # We compare the value found with None self.assertEqual(self.test_recid, rec_id) # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(self.test_recid) return def test_update_modification_record_date(self): """bibupload - append mode, checking the update of the modification date""" # Initialize the global variable task_set_task_param('verbose', 0) # We create create the record out of the xml marc recs = bibupload.xml_marc_to_records(self.test_existing) # We call the function which should retrieve the record id rec_id = bibupload.retrieve_rec_id(recs[0], opt_mode='append') # Retrieve current localtime now = time.localtime() # We update the modification date bibupload.update_bibrec_modif_date(convert_datestruct_to_datetext(now), rec_id) # We retrieve the modification date from the database query = """SELECT DATE_FORMAT(modification_date,'%%Y-%%m-%%d %%H:%%i:%%s') FROM bibrec where id = %s""" res = run_sql(query % rec_id) # We compare the two results self.assertEqual(res[0][0], convert_datestruct_to_datetext(now)) # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(self.test_recid) return def test_append_complete_xml_marc(self): """bibupload - append mode, appending complete MARCXML file""" # Now we append a datafield # We create create the record out of the xml marc recs = bibupload.xml_marc_to_records(self.test_to_append) # We call the main function with the record as a parameter err, recid = bibupload.bibupload(recs[0], opt_mode='append') # We retrieve the inserted xm after_append_xm = print_record(recid, 'xm') after_append_hm = print_record(recid, 'hm') # Compare if the two MARCXML are the same self.assertEqual(compare_xmbuffers(after_append_xm, self.test_expected_xm), '') self.assertEqual(compare_hmbuffers(after_append_hm, self.test_expected_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(self.test_recid) return class BibUploadCorrectModeTest(unittest.TestCase): """ Testing correcting a record containing similar tags (identical tag, different indicators). Currently CDS Invenio replaces only those tags that have matching indicators too, unlike ALEPH500 that does not pay attention to indicators, it corrects all fields with the same tag, regardless of the indicator values. 
""" def setUp(self): """Initialize the MARCXML test record.""" self.testrec1_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Test, John Test University Cool Test, Jim Test Laboratory """ self.testrec1_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 10047 $$aTest, John$$uTest University 10048 $$aCool 10047 $$aTest, Jim$$uTest Laboratory """ self.testrec1_xm_to_correct = """ 123456789 Test, Joseph Test Academy Test2, Joseph Test2 Academy """ self.testrec1_corrected_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Cool Test, Joseph Test Academy Test2, Joseph Test2 Academy """ self.testrec1_corrected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 10048 $$aCool 10047 $$aTest, Joseph$$uTest Academy 10047 $$aTest2, Joseph$$uTest2 Academy """ # insert test record: task_set_task_param('verbose', 0) test_record_xm = self.testrec1_xm.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_record_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recID: self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid)) self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid)) self.testrec1_xm_to_correct = self.testrec1_xm_to_correct.replace('123456789', str(recid)) self.testrec1_corrected_xm = self.testrec1_corrected_xm.replace('123456789', str(recid)) self.testrec1_corrected_hm = self.testrec1_corrected_hm.replace('123456789', str(recid)) # test of the inserted record: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.testrec1_hm), '') def test_record_correction(self): """bibupload - correct mode, similar MARCXML tags/indicators""" # correct some tags: recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_correct) err, recid = bibupload.bibupload(recs[0], opt_mode='correct') corrected_xm = print_record(recid, 'xm') corrected_hm = print_record(recid, 'hm') # did it work? self.assertEqual(compare_xmbuffers(corrected_xm, self.testrec1_corrected_xm), '') self.assertEqual(compare_hmbuffers(corrected_hm, self.testrec1_corrected_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(recid) return class BibUploadDeleteModeTest(unittest.TestCase): """ Testing deleting specific tags from a record while keeping anything else untouched. Currently CDS Invenio deletes only those tags that have matching indicators too, unlike ALEPH500 that does not pay attention to indicators, it corrects all fields with the same tag, regardless of the indicator values. 
""" def setUp(self): """Initialize the MARCXML test record.""" self.testrec1_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Test, John Test University Cool Test, Jim Test Laboratory dumb text """ self.testrec1_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 10047 $$aTest, John$$uTest University 10048 $$aCool 10047 $$aTest, Jim$$uTest Laboratory 888__ $$adumb text """ self.testrec1_xm_to_delete = """ 123456789 Test, Jane Test Institute Test, Johnson Test University Cool dumb text """ self.testrec1_corrected_xm = """ 123456789 SzGeCERN Test, John Test University Test, Jim Test Laboratory """ self.testrec1_corrected_hm = """ 001__ 123456789 003__ SzGeCERN 10047 $$aTest, John$$uTest University 10047 $$aTest, Jim$$uTest Laboratory """ # insert test record: task_set_task_param('verbose', 0) test_record_xm = self.testrec1_xm.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_record_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recID: self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid)) self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid)) self.testrec1_xm_to_delete = self.testrec1_xm_to_delete.replace('123456789', str(recid)) self.testrec1_corrected_xm = self.testrec1_corrected_xm.replace('123456789', str(recid)) self.testrec1_corrected_hm = self.testrec1_corrected_hm.replace('123456789', str(recid)) # test of the inserted record: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.testrec1_hm), '') # Checking dumb text is in bibxxx self.failUnless(run_sql("SELECT * from bibrec_bib88x WHERE id_bibrec=%s", (recid, ))) def test_record_tags_deletion(self): """bibupload - delete mode, deleting specific tags""" # correct some tags: recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_delete) err, recid = bibupload.bibupload(recs[0], opt_mode='delete') corrected_xm = print_record(recid, 'xm') corrected_hm = print_record(recid, 'hm') # did it work? 
self.assertEqual(compare_xmbuffers(corrected_xm, self.testrec1_corrected_xm), '') self.assertEqual(compare_hmbuffers(corrected_hm, self.testrec1_corrected_hm), '') # Checking dumb text is no more in bibxxx self.failIf(run_sql("SELECT * from bibrec_bib88x WHERE id_bibrec=%s", (recid, ))) # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(recid) return class BibUploadReplaceModeTest(unittest.TestCase): """Testing replace mode.""" def setUp(self): """Initialize the MARCXML test record.""" self.testrec1_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Test, John Test University Cool Test, Jim Test Laboratory """ self.testrec1_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 10047 $$aTest, John$$uTest University 10048 $$aCool 10047 $$aTest, Jim$$uTest Laboratory """ self.testrec1_xm_to_replace = """ 123456789 Test, Joseph Test Academy Test2, Joseph Test2 Academy """ self.testrec1_replaced_xm = """ 123456789 Test, Joseph Test Academy Test2, Joseph Test2 Academy """ self.testrec1_replaced_hm = """ 001__ 123456789 10047 $$aTest, Joseph$$uTest Academy 10047 $$aTest2, Joseph$$uTest2 Academy """ # insert test record: task_set_task_param('verbose', 0) test_record_xm = self.testrec1_xm.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_record_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recID: self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid)) self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid)) self.testrec1_xm_to_replace = self.testrec1_xm_to_replace.replace('123456789', str(recid)) self.testrec1_replaced_xm = self.testrec1_replaced_xm.replace('123456789', str(recid)) self.testrec1_replaced_hm = self.testrec1_replaced_hm.replace('123456789', str(recid)) # test of the inserted record: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.testrec1_hm), '') def test_record_replace(self): """bibupload - replace mode, similar MARCXML tags/indicators""" # replace some tags: recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_replace) err, recid = bibupload.bibupload(recs[0], opt_mode='replace') replaced_xm = print_record(recid, 'xm') replaced_hm = print_record(recid, 'hm') # did it work? self.assertEqual(compare_xmbuffers(replaced_xm, self.testrec1_replaced_xm), '') self.assertEqual(compare_hmbuffers(replaced_hm, self.testrec1_replaced_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(recid) return class BibUploadReferencesModeTest(unittest.TestCase): """Testing references mode.""" def setUp(self): # pylint: disable-msg=C0103 """Initialize the MARCXML variable""" self.test_insert = """ 123456789 Tester, T CERN """ self.test_reference = """ 123456789 M. Lüscher and P. Weisz, String excitation energies in SU(N) gauge theories beyond the free-string approximation, J. High Energy Phys. 07 (2004) 014 """ self.test_reference_expected_xm = """ 123456789 Tester, T CERN M. Lüscher and P. Weisz, String excitation energies in SU(N) gauge theories beyond the free-string approximation, J. High Energy Phys. 07 (2004) 014 """ self.test_insert_hm = """ 001__ 123456789 100__ $$aTester, T$$uCERN """ self.test_reference_expected_hm = """ 001__ 123456789 100__ $$aTester, T$$uCERN %(reference_tag)sC5 $$mM. Lüscher and P. 
Weisz, String excitation energies in SU(N) gauge theories beyond the free-string approximation,$$sJ. High Energy Phys. 07 (2004) 014 """ % {'reference_tag': bibupload.CFG_BIBUPLOAD_REFERENCE_TAG} # insert test record: task_set_task_param('verbose', 0) test_insert = self.test_insert.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_insert) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recID: self.test_insert = self.test_insert.replace('123456789', str(recid)) self.test_insert_hm = self.test_insert_hm.replace('123456789', str(recid)) self.test_reference = self.test_reference.replace('123456789', str(recid)) self.test_reference_expected_xm = self.test_reference_expected_xm.replace('123456789', str(recid)) self.test_reference_expected_hm = self.test_reference_expected_hm.replace('123456789', str(recid)) # test of the inserted record: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.test_insert), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.test_insert_hm), '') self.test_recid = recid def test_reference_complete_xml_marc(self): """bibupload - reference mode, inserting references MARCXML file""" # We create create the record out of the xml marc recs = bibupload.xml_marc_to_records(self.test_reference) # We call the main function with the record as a parameter err, recid = bibupload.bibupload(recs[0], opt_mode='reference') # We retrieve the inserted xml reference_xm = print_record(recid, 'xm') reference_hm = print_record(recid, 'hm') # Compare if the two MARCXML are the same self.assertEqual(compare_xmbuffers(reference_xm, self.test_reference_expected_xm), '') self.assertEqual(compare_hmbuffers(reference_hm, self.test_reference_expected_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(self.test_recid) return class BibUploadFMTModeTest(unittest.TestCase): """Testing FMT mode.""" def setUp(self): # pylint: disable-msg=C0103 """Initialize the MARCXML variable""" self.new_xm_with_fmt = """ SzGeCERN HB Test. Okay. 2008-03-14 15:14:00 Bar, Baz Foo On the quux and huux """ self.expected_xm_after_inserting_new_xm_with_fmt = """ 123456789 SzGeCERN Bar, Baz Foo On the quux and huux """ self.expected_hm_after_inserting_new_xm_with_fmt = """ 001__ 123456789 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux """ self.recid76_xm_before_all_the_tests = print_record(76, 'xm') self.recid76_hm_before_all_the_tests = print_record(76, 'hm') self.recid76_fmts = run_sql("""SELECT last_updated, value, format FROM bibfmt WHERE id_bibrec=76""") self.recid76_xm_with_fmt = """ 76 SzGeCERN HB Test. Here is some format value. Doe, John CERN On the foos and bars """ self.recid76_xm_with_fmt_only_first = """ 76 HB Test. Let us see if this gets inserted well. """ self.recid76_xm_with_fmt_only_second = """ 76 HB Test. Yet another test, to be run after the first one. HD Test. Let's see what will be stored in the detailed format field. """ def tearDown(self): """Helper function that restores recID 76 MARCXML, using the value saved before all the tests started to execute. (see self.recid76_xm_before_all_the_tests). Does not restore HB and HD formats. 
""" recs = bibupload.xml_marc_to_records(self.recid76_xm_before_all_the_tests) err, recid = bibupload.bibupload(recs[0], opt_mode='replace') for (last_updated, value, format) in self.recid76_fmts: run_sql("""UPDATE bibfmt SET last_updated=%s, value=%s WHERE id_bibrec=76 AND format=%s""", (last_updated, value, format)) inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.recid76_xm_before_all_the_tests), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.recid76_hm_before_all_the_tests), '') def test_inserting_new_record_containing_fmt_tag(self): """bibupload - FMT tag, inserting new record containing FMT tag""" recs = bibupload.xml_marc_to_records(self.new_xm_with_fmt) (dummy, new_recid) = bibupload.bibupload(recs[0], opt_mode='insert') xm_after = print_record(new_recid, 'xm') hm_after = print_record(new_recid, 'hm') hb_after = print_record(new_recid, 'hb') self.assertEqual(compare_xmbuffers(xm_after, self.expected_xm_after_inserting_new_xm_with_fmt.replace('123456789', str(new_recid))), '') self.assertEqual(compare_hmbuffers(hm_after, self.expected_hm_after_inserting_new_xm_with_fmt.replace('123456789', str(new_recid))), '') self.assertEqual(run_sql('SELECT last_updated from bibfmt WHERE id_bibrec=%s', (new_recid, ))[0][0], datetime.datetime(2008, 3, 14, 15, 14)) self.failUnless(hb_after.startswith("Test. Okay.")) def test_updating_existing_record_formats_in_format_mode(self): """bibupload - FMT tag, updating existing record via format mode""" xm_before = print_record(76, 'xm') hm_before = print_record(76, 'hm') # insert first format value: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt_only_first) bibupload.bibupload(recs[0], opt_mode='format') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') self.assertEqual(xm_after, xm_before) self.assertEqual(hm_after, hm_before) self.failUnless(hb_after.startswith("Test. Let us see if this gets inserted well.")) # now insert another format value and recheck: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt_only_second) bibupload.bibupload(recs[0], opt_mode='format') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') hd_after = print_record(76, 'hd') self.assertEqual(xm_after, xm_before) self.assertEqual(hm_after, hm_before) self.failUnless(hb_after.startswith("Test. Yet another test, to be run after the first one.")) self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field.")) def test_updating_existing_record_formats_in_correct_mode(self): """bibupload - FMT tag, updating existing record via correct mode""" xm_before = print_record(76, 'xm') hm_before = print_record(76, 'hm') # insert first format value: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt_only_first) bibupload.bibupload(recs[0], opt_mode='correct') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') self.assertEqual(xm_after, xm_before) self.assertEqual(hm_after, hm_before) self.failUnless(hb_after.startswith("Test. 
Let us see if this gets inserted well.")) # now insert another format value and recheck: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt_only_second) bibupload.bibupload(recs[0], opt_mode='correct') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') hd_after = print_record(76, 'hd') self.assertEqual(xm_after, xm_before) self.assertEqual(hm_after, hm_before) self.failUnless(hb_after.startswith("Test. Yet another test, to be run after the first one.")) self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field.")) def test_updating_existing_record_formats_in_replace_mode(self): """bibupload - FMT tag, updating existing record via replace mode""" # insert first format value: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt_only_first) bibupload.bibupload(recs[0], opt_mode='replace') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') self.assertEqual(compare_xmbuffers(xm_after, '76'), '') self.assertEqual(compare_hmbuffers(hm_after, '000000076 001__ 76'), '') self.failUnless(hb_after.startswith("Test. Let us see if this gets inserted well.")) # now insert another format value and recheck: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt_only_second) bibupload.bibupload(recs[0], opt_mode='replace') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') hd_after = print_record(76, 'hd') self.assertEqual(compare_xmbuffers(xm_after, """ 76 """), '') self.assertEqual(compare_hmbuffers(hm_after, '000000076 001__ 76'), '') self.failUnless(hb_after.startswith("Test. Yet another test, to be run after the first one.")) self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field.")) # final insertion and recheck: recs = bibupload.xml_marc_to_records(self.recid76_xm_with_fmt) bibupload.bibupload(recs[0], opt_mode='replace') xm_after = print_record(76, 'xm') hm_after = print_record(76, 'hm') hb_after = print_record(76, 'hb') hd_after = print_record(76, 'hd') self.assertEqual(compare_xmbuffers(xm_after, """ 76 SzGeCERN Doe, John CERN On the foos and bars """), '') self.assertEqual(compare_hmbuffers(hm_after, """ 001__ 76 003__ SzGeCERN 100__ $$aDoe, John$$uCERN 245__ $$aOn the foos and bars """), '') self.failUnless(hb_after.startswith("Test. Here is some format value.")) self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field.")) class BibUploadRecordsWithSYSNOTest(unittest.TestCase): """Testing uploading of records that have external SYSNO present.""" def setUp(self): # pylint: disable-msg=C0103 """Initialize the MARCXML test records.""" self.verbose = 0 # Note that SYSNO fields are repeated but with different # subfields, this is to test whether bibupload would not # mistakenly pick up wrong values. 
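        ## (Illustrative note: with a hypothetical setting of
        ## CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG = '970__a', the slices used in
        ## the templates below yield tag '970', two blank indicators --
        ## '_' in the configuration, rendered as ' ' in MARCXML -- and
        ## subfield code 'a'.)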
self.xm_testrec1 = """ 123456789 SzGeCERN Bar, Baz Foo On the quux and huux 1 sysno1 sysno2 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ", 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ", 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } self.hm_testrec1 = """ 001__ 123456789 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 1 %(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$%(sysnosubfieldcode)ssysno1 %(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$0sysno2 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4], 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5], 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } self.xm_testrec1_to_update = """ SzGeCERN Bar, Baz Foo On the quux and huux 1 Updated sysno1 sysno2 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ", 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ", 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } self.xm_testrec1_updated = """ 123456789 SzGeCERN Bar, Baz Foo On the quux and huux 1 Updated sysno1 sysno2 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ", 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ", 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } self.hm_testrec1_updated = """ 001__ 123456789 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 1 Updated %(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$%(sysnosubfieldcode)ssysno1 %(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$0sysno2 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4], 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5], 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } self.xm_testrec2 = """ 987654321 SzGeCERN Bar, Baz Foo On the quux and huux 2 sysno2 sysno1 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ", 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ", 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } self.hm_testrec2 = """ 001__ 987654321 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 2 %(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$%(sysnosubfieldcode)ssysno2 %(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$0sysno1 """ % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3], 'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4], 'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5], 'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6], } def test_insert_the_same_sysno_record(self): """bibupload - SYSNO tag, refuse to insert the same SYSNO record""" # initialize bibupload mode: if self.verbose: print "test_insert_the_same_sysno_record() started" # insert record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) task_set_task_param('verbose', 0) err1, recid1 = 
bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # insert record 2 first time: testrec_to_insert_first = self.xm_testrec2.replace('987654321', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) task_set_task_param('verbose', 0) err2, recid2 = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid2, 'xm') inserted_hm = print_record(recid2, 'hm') # use real recID when comparing whether it worked: self.xm_testrec2 = self.xm_testrec2.replace('987654321', str(recid2)) self.hm_testrec2 = self.hm_testrec2.replace('987654321', str(recid2)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec2), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec2), '') # try to insert updated record 1, it should fail: recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update) task_set_task_param('verbose', 0) err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='insert') self.assertEqual(-1, recid1_updated) # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) bibupload.wipe_out_record_from_all_tables(recid1_updated) if self.verbose: print "test_insert_the_same_sysno_record() finished" def test_insert_or_replace_the_same_sysno_record(self): """bibupload - SYSNO tag, allow to insert or replace the same SYSNO record""" # initialize bibupload mode: task_set_task_param('verbose', self.verbose) if self.verbose: print "test_insert_or_replace_the_same_sysno_record() started" # insert/replace record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID in test buffers when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # try to insert/replace updated record 1, it should be okay: task_set_task_param('verbose', self.verbose) recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update) err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1_updated, 'xm') inserted_hm = print_record(recid1_updated, 'hm') self.assertEqual(recid1, recid1_updated) # use real recID in test buffers when comparing whether it worked: self.xm_testrec1_updated = self.xm_testrec1_updated.replace('123456789', str(recid1)) self.hm_testrec1_updated = self.hm_testrec1_updated.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1_updated), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1_updated), '') # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid1_updated) if self.verbose: 
print "test_insert_or_replace_the_same_sysno_record() finished" def test_replace_nonexisting_sysno_record(self): """bibupload - SYSNO tag, refuse to replace non-existing SYSNO record""" # initialize bibupload mode: task_set_task_param('verbose', self.verbose) if self.verbose: print "test_replace_nonexisting_sysno_record() started" # insert record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID in test buffers when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # try to replace record 2 it should fail: testrec_to_insert_first = self.xm_testrec2.replace('987654321', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err2, recid2 = bibupload.bibupload(recs[0], opt_mode='replace') self.assertEqual(-1, recid2) # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) if self.verbose: print "test_replace_nonexisting_sysno_record() finished" class BibUploadRecordsWithEXTOAIIDTest(unittest.TestCase): """Testing uploading of records that have external EXTOAIID present.""" def setUp(self): # pylint: disable-msg=C0103 """Initialize the MARCXML test records.""" self.verbose = 0 # Note that EXTOAIID fields are repeated but with different # subfields, this is to test whether bibupload would not # mistakenly pick up wrong values. 
self.xm_testrec1 = """ 123456789 SzGeCERN extoaiid1 extoaisrc1 extoaiid2 Bar, Baz Foo On the quux and huux 1 """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ", 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ", 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } self.hm_testrec1 = """ 001__ 123456789 003__ SzGeCERN %(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$%(extoaisrcsubfieldcode)sextoaisrc1$$%(extoaiidsubfieldcode)sextoaiid1 %(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$0extoaiid2 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 1 """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4], 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5], 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } self.xm_testrec1_to_update = """ SzGeCERN extoaiid1 extoaisrc1 extoaiid2 Bar, Baz Foo On the quux and huux 1 Updated """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ", 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ", 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } self.xm_testrec1_updated = """ 123456789 SzGeCERN extoaiid1 extoaisrc1 extoaiid2 Bar, Baz Foo On the quux and huux 1 Updated """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ", 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ", 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } self.hm_testrec1_updated = """ 001__ 123456789 003__ SzGeCERN %(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$%(extoaisrcsubfieldcode)sextoaisrc1$$%(extoaiidsubfieldcode)sextoaiid1 %(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$0extoaiid2 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 1 Updated """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4], 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5], 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } self.xm_testrec2 = """ 987654321 SzGeCERN extoaiid2 extoaisrc1 extoaiid1 Bar, Baz Foo On the quux and huux 2 """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ", 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \ CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ", 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } self.hm_testrec2 = """ 001__ 987654321 003__ SzGeCERN %(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s 
$$%(extoaisrcsubfieldcode)sextoaisrc1$$%(extoaiidsubfieldcode)sextoaiid2 %(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$0extoaiid1 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 2 """ % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3], 'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4], 'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5], 'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6], 'extoaisrcsubfieldcode' : CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG[5:6], } def test_insert_the_same_extoaiid_record(self): """bibupload - EXTOAIID tag, refuse to insert the same EXTOAIID record""" # initialize bibupload mode: task_set_task_param('verbose', self.verbose) if self.verbose: print "test_insert_the_same_extoaiid_record() started" # insert record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # insert record 2 first time: testrec_to_insert_first = self.xm_testrec2.replace('987654321', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err2, recid2 = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid2, 'xm') inserted_hm = print_record(recid2, 'hm') # use real recID when comparing whether it worked: self.xm_testrec2 = self.xm_testrec2.replace('987654321', str(recid2)) self.hm_testrec2 = self.hm_testrec2.replace('987654321', str(recid2)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec2), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec2), '') # try to insert updated record 1, it should fail: recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update) err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='insert') self.assertEqual(-1, recid1_updated) # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) bibupload.wipe_out_record_from_all_tables(recid1_updated) if self.verbose: print "test_insert_the_same_extoaiid_record() finished" def test_insert_or_replace_the_same_extoaiid_record(self): """bibupload - EXTOAIID tag, allow to insert or replace the same EXTOAIID record""" # initialize bibupload mode: task_set_task_param('verbose', self.verbose) if self.verbose: print "test_insert_or_replace_the_same_extoaiid_record() started" # insert/replace record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID in test buffers when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # try to insert/replace updated record 1, it should 
be okay: recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update) err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1_updated, 'xm') inserted_hm = print_record(recid1_updated, 'hm') self.assertEqual(recid1, recid1_updated) # use real recID in test buffers when comparing whether it worked: self.xm_testrec1_updated = self.xm_testrec1_updated.replace('123456789', str(recid1)) self.hm_testrec1_updated = self.hm_testrec1_updated.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1_updated), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1_updated), '') # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid1_updated) if self.verbose: print "test_insert_or_replace_the_same_extoaiid_record() finished" def test_replace_nonexisting_extoaiid_record(self): """bibupload - EXTOAIID tag, refuse to replace non-existing EXTOAIID record""" # initialize bibupload mode: task_set_task_param('verbose', self.verbose) if self.verbose: print "test_replace_nonexisting_extoaiid_record() started" # insert record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID in test buffers when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # try to replace record 2 it should fail: testrec_to_insert_first = self.xm_testrec2.replace('987654321', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err2, recid2 = bibupload.bibupload(recs[0], opt_mode='replace') self.assertEqual(-1, recid2) # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) if self.verbose: print "test_replace_nonexisting_extoaiid_record() finished" class BibUploadRecordsWithOAIIDTest(unittest.TestCase): """Testing uploading of records that have OAI ID present.""" def setUp(self): # pylint: disable-msg=C0103 """Initialize the MARCXML test records.""" self.verbose = 0 # Note that OAI fields are repeated but with different # subfields, this is to test whether bibupload would not # mistakenly pick up wrong values. 
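# Illustrative sketch (not part of the original suite): the OAI test
# buffers below are built by slicing the configured control-field
# specification, e.g. CFG_OAI_ID_FIELD, into its MARC components with
# the [0:3], [3:4], [4:5], [5:6] slices used throughout this file,
# mapping '_' indicators to blanks.  The example value "909COo" is an
# assumption for illustration only.
def split_tag_spec(spec):
    tag = spec[0:3]
    ind1 = spec[3:4] != "_" and spec[3:4] or " "
    ind2 = spec[4:5] != "_" and spec[4:5] or " "
    subfield_code = spec[5:6]
    return (tag, ind1, ind2, subfield_code)

print split_tag_spec("909COo")   # -> ('909', 'C', 'O', 'o')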
self.xm_testrec1 = """ 123456789 SzGeCERN Bar, Baz Foo On the quux and huux 1 oai:foo:1 oai:foo:2 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \ CFG_OAI_ID_FIELD[3:4] or " ", 'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \ CFG_OAI_ID_FIELD[4:5] or " ", 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } self.hm_testrec1 = """ 001__ 123456789 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 1 %(oaitag)s%(oaiind1)s%(oaiind2)s $$%(oaisubfieldcode)soai:foo:1 %(oaitag)s%(oaiind1)s%(oaiind2)s $$0oai:foo:2 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4], 'oaiind2': CFG_OAI_ID_FIELD[4:5], 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } self.xm_testrec1_to_update = """ SzGeCERN Bar, Baz Foo On the quux and huux 1 Updated oai:foo:1 oai:foo:2 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \ CFG_OAI_ID_FIELD[3:4] or " ", 'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \ CFG_OAI_ID_FIELD[4:5] or " ", 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } self.xm_testrec1_updated = """ 123456789 SzGeCERN Bar, Baz Foo On the quux and huux 1 Updated oai:foo:1 oai:foo:2 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \ CFG_OAI_ID_FIELD[3:4] or " ", 'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \ CFG_OAI_ID_FIELD[4:5] or " ", 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } self.hm_testrec1_updated = """ 001__ 123456789 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 1 Updated %(oaitag)s%(oaiind1)s%(oaiind2)s $$%(oaisubfieldcode)soai:foo:1 %(oaitag)s%(oaiind1)s%(oaiind2)s $$0oai:foo:2 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4], 'oaiind2': CFG_OAI_ID_FIELD[4:5], 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } self.xm_testrec2 = """ 987654321 SzGeCERN Bar, Baz Foo On the quux and huux 2 oai:foo:2 oai:foo:1 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \ CFG_OAI_ID_FIELD[3:4] or " ", 'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \ CFG_OAI_ID_FIELD[4:5] or " ", 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } self.hm_testrec2 = """ 001__ 987654321 003__ SzGeCERN 100__ $$aBar, Baz$$uFoo 245__ $$aOn the quux and huux 2 %(oaitag)s%(oaiind1)s%(oaiind2)s $$%(oaisubfieldcode)soai:foo:2 %(oaitag)s%(oaiind1)s%(oaiind2)s $$0oai:foo:1 """ % {'oaitag': CFG_OAI_ID_FIELD[0:3], 'oaiind1': CFG_OAI_ID_FIELD[3:4], 'oaiind2': CFG_OAI_ID_FIELD[4:5], 'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6], } def test_insert_the_same_oai_record(self): """bibupload - OAIID tag, refuse to insert the same OAI record""" task_set_task_param('verbose', self.verbose) # insert record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # insert record 2 first time: testrec_to_insert_first = self.xm_testrec2.replace('987654321', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err2, recid2 = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid2, 'xm') 
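# Illustrative sketch (hypothetical helper, not from the suite): every
# test below exercises the same round trip -- parse MARCXML into record
# structures, upload them under an explicit opt_mode, then read the
# stored record back in both 'xm' (MARCXML) and 'hm' (HTML MARC)
# formats for comparison against the expected buffers.  The import
# paths are assumed from the surrounding test module.
from invenio import bibupload
from invenio.search_engine import print_record

def upload_and_read_back(marcxml, mode):
    recs = bibupload.xml_marc_to_records(marcxml)
    err, recid = bibupload.bibupload(recs[0], opt_mode=mode)
    return recid, print_record(recid, 'xm'), print_record(recid, 'hm')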
inserted_hm = print_record(recid2, 'hm') # use real recID when comparing whether it worked: self.xm_testrec2 = self.xm_testrec2.replace('987654321', str(recid2)) self.hm_testrec2 = self.hm_testrec2.replace('987654321', str(recid2)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec2), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec2), '') # try to insert updated record 1, it should fail: recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update) err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='insert') self.assertEqual(-1, recid1_updated) # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) bibupload.wipe_out_record_from_all_tables(recid1_updated) def test_insert_or_replace_the_same_oai_record(self): """bibupload - OAIID tag, allow to insert or replace the same OAI record""" # initialize bibupload mode: task_set_task_param('verbose', self.verbose) # insert/replace record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID in test buffers when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # try to insert/replace updated record 1, it should be okay: recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update) err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1_updated, 'xm') inserted_hm = print_record(recid1_updated, 'hm') self.assertEqual(recid1, recid1_updated) # use real recID in test buffers when comparing whether it worked: self.xm_testrec1_updated = self.xm_testrec1_updated.replace('123456789', str(recid1)) self.hm_testrec1_updated = self.hm_testrec1_updated.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1_updated), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1_updated), '') # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid1_updated) def test_replace_nonexisting_oai_record(self): """bibupload - OAIID tag, refuse to replace non-existing OAI record""" task_set_task_param('verbose', self.verbose) # insert record 1 first time: testrec_to_insert_first = self.xm_testrec1.replace('123456789', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert') inserted_xm = print_record(recid1, 'xm') inserted_hm = print_record(recid1, 'hm') # use real recID in test buffers when comparing whether it worked: self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1)) self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1)) self.assertEqual(compare_xmbuffers(inserted_xm, self.xm_testrec1), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.hm_testrec1), '') # try to replace record 2 it should fail: testrec_to_insert_first = self.xm_testrec2.replace('987654321', '') recs = bibupload.xml_marc_to_records(testrec_to_insert_first) err2, 
recid2 = bibupload.bibupload(recs[0], opt_mode='replace') self.assertEqual(-1, recid2) # delete test records bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) class BibUploadIndicatorsTest(unittest.TestCase): """ Testing uploading of a MARCXML record with indicators having either blank space (as per MARC schema) or empty string value (old behaviour). """ def setUp(self): """Initialize the MARCXML test record.""" self.testrec1_xm = """ SzGeCERN Test, John Test University """ self.testrec1_hm = """ 003__ SzGeCERN 100__ $$aTest, John$$uTest University """ self.testrec2_xm = """ SzGeCERN Test, John Test University """ self.testrec2_hm = """ 003__ SzGeCERN 100__ $$aTest, John$$uTest University """ def test_record_with_spaces_in_indicators(self): """bibupload - inserting MARCXML with spaces in indicators""" task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(self.testrec1_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm), self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(remove_tag_001_from_hmbuffer(inserted_hm), self.testrec1_hm), '') bibupload.wipe_out_record_from_all_tables(recid) def test_record_with_no_spaces_in_indicators(self): """bibupload - inserting MARCXML with no spaces in indicators""" task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(self.testrec2_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm), self.testrec2_xm), '') self.assertEqual(compare_hmbuffers(remove_tag_001_from_hmbuffer(inserted_hm), self.testrec2_hm), '') bibupload.wipe_out_record_from_all_tables(recid) class BibUploadUpperLowerCaseTest(unittest.TestCase): """ Testing treatment of similar records with only upper and lower case value differences in the bibxxx table. 
""" def setUp(self): """Initialize the MARCXML test records.""" self.testrec1_xm = """ SzGeCERN Test, John Test University """ self.testrec1_hm = """ 003__ SzGeCERN 100__ $$aTest, John$$uTest University """ self.testrec2_xm = """ SzGeCERN TeSt, JoHn Test UniVeRsity """ self.testrec2_hm = """ 003__ SzGeCERN 100__ $$aTeSt, JoHn$$uTest UniVeRsity """ def test_record_with_upper_lower_case_letters(self): """bibupload - inserting similar MARCXML records with upper/lower case""" task_set_task_param('verbose', 0) # insert test record #1: recs = bibupload.xml_marc_to_records(self.testrec1_xm) err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert') recid1_inserted_xm = print_record(recid1, 'xm') recid1_inserted_hm = print_record(recid1, 'hm') # insert test record #2: recs = bibupload.xml_marc_to_records(self.testrec2_xm) err1, recid2 = bibupload.bibupload(recs[0], opt_mode='insert') recid2_inserted_xm = print_record(recid2, 'xm') recid2_inserted_hm = print_record(recid2, 'hm') # let us compare stuff now: self.assertEqual(compare_xmbuffers(remove_tag_001_from_xmbuffer(recid1_inserted_xm), self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(remove_tag_001_from_hmbuffer(recid1_inserted_hm), self.testrec1_hm), '') self.assertEqual(compare_xmbuffers(remove_tag_001_from_xmbuffer(recid2_inserted_xm), self.testrec2_xm), '') self.assertEqual(compare_hmbuffers(remove_tag_001_from_hmbuffer(recid2_inserted_hm), self.testrec2_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(recid1) bibupload.wipe_out_record_from_all_tables(recid2) class BibUploadControlledProvenanceTest(unittest.TestCase): """Testing treatment of tags under controlled provenance in the correct mode.""" def setUp(self): """Initialize the MARCXML test record.""" self.testrec1_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Test title blabla sam blublu sim human """ self.testrec1_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 245__ $$aTest title 6531_ $$9sam$$ablabla 6531_ $$9sim$$ablublu 6531_ $$ahuman """ self.testrec1_xm_to_correct = """ 123456789 bleble sim bloblo som """ self.testrec1_corrected_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Test title blabla sam human bleble sim bloblo som """ self.testrec1_corrected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 245__ $$aTest title 6531_ $$9sam$$ablabla 6531_ $$ahuman 6531_ $$9sim$$ableble 6531_ $$9som$$abloblo """ # insert test record: task_set_task_param('verbose', 0) test_record_xm = self.testrec1_xm.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_record_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recID: self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid)) self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid)) self.testrec1_xm_to_correct = self.testrec1_xm_to_correct.replace('123456789', str(recid)) self.testrec1_corrected_xm = self.testrec1_corrected_xm.replace('123456789', str(recid)) self.testrec1_corrected_hm = self.testrec1_corrected_hm.replace('123456789', str(recid)) # test of the inserted record: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.testrec1_hm), '') def test_controlled_provenance_persistence(self): """bibupload - correct mode, tags with controlled provenance""" # correct metadata tags; will the 
protected tags be kept? task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_correct) err, recid = bibupload.bibupload(recs[0], opt_mode='correct') corrected_xm = print_record(recid, 'xm') corrected_hm = print_record(recid, 'hm') # did it work? self.assertEqual(compare_xmbuffers(corrected_xm, self.testrec1_corrected_xm), '') self.assertEqual(compare_hmbuffers(corrected_hm, self.testrec1_corrected_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(recid) class BibUploadStrongTagsTest(unittest.TestCase): """Testing treatment of strong tags and the replace mode.""" def setUp(self): """Initialize the MARCXML test record.""" self.testrec1_xm = """ 123456789 SzGeCERN Test, Jane Test Institute Test title A value Another value """ % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]} self.testrec1_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, Jane$$uTest Institute 245__ $$aTest title %(strong_tag)s__ $$aA value$$bAnother value """ % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]} self.testrec1_xm_to_replace = """ 123456789 Test, Joseph Test Academy """ self.testrec1_replaced_xm = """ 123456789 Test, Joseph Test Academy A value Another value """ % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]} self.testrec1_replaced_hm = """ 001__ 123456789 100__ $$aTest, Joseph$$uTest Academy %(strong_tag)s__ $$aA value$$bAnother value """ % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]} # insert test record: task_set_task_param('verbose', 0) test_record_xm = self.testrec1_xm.replace('123456789', '') recs = bibupload.xml_marc_to_records(test_record_xm) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recID: self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid)) self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid)) self.testrec1_xm_to_replace = self.testrec1_xm_to_replace.replace('123456789', str(recid)) self.testrec1_replaced_xm = self.testrec1_replaced_xm.replace('123456789', str(recid)) self.testrec1_replaced_hm = self.testrec1_replaced_hm.replace('123456789', str(recid)) # test of the inserted record: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, self.testrec1_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, self.testrec1_hm), '') def test_strong_tags_persistence(self): """bibupload - strong tags, persistence in replace mode""" # replace all metadata tags; will the strong tags be kept? recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_replace) err, recid = bibupload.bibupload(recs[0], opt_mode='replace') replaced_xm = print_record(recid, 'xm') replaced_hm = print_record(recid, 'hm') # did it work? self.assertEqual(compare_xmbuffers(replaced_xm, self.testrec1_replaced_xm), '') self.assertEqual(compare_hmbuffers(replaced_hm, self.testrec1_replaced_hm), '') # clean up after ourselves: bibupload.wipe_out_record_from_all_tables(recid) return class BibUploadFFTModeTest(unittest.TestCase): """ Testing treatment of fulltext file transfer import mode. 
""" def _test_bibdoc_status(self, recid, docname, status): res = run_sql('SELECT bd.status FROM bibrec_bibdoc as bb JOIN bibdoc as bd ON bb.id_bibdoc = bd.id WHERE bb.id_bibrec = %s AND bd.docname = %s', (recid, docname)) self.failUnless(res) self.assertEqual(status, res[0][0]) def test_writing_rights(self): """bibupload - FFT has writing rights""" self.failUnless(bibupload.writing_rights_p()) def test_simple_fft_insert(self): """bibupload - simple FFT insert""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/cds.gif """ % {'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif """ % {'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self.failUnless(try_url_download(testrec_expected_url)) bibupload.wipe_out_record_from_all_tables(recid) def test_exotic_format_fft_append(self): """bibupload - exotic format FFT append""" # define the test case: testfile = os.path.join(CFG_TMPDIR, 'test.ps.Z') open(testfile, 'w').write('TEST') test_to_upload = """ SzGeCERN Test, John Test University """ testrec_to_append = """ 123456789 %s """ % testfile testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/test.ps.Z """ % {'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/test.ps.Z """ % {'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/test.ps.Z" \ % {'siteurl': CFG_SITE_URL} testrec_expected_url2 = "%(siteurl)s/record/123456789/files/test?format=ps.Z" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_to_append = testrec_to_append.replace('123456789', str(recid)) testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) testrec_expected_url2 = testrec_expected_url.replace('123456789', str(recid)) recs = bibupload.xml_marc_to_records(testrec_to_append) err, recid = bibupload.bibupload(recs[0], opt_mode='append') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, 
testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self.assertEqual(urlopen(testrec_expected_url).read(), 'TEST') self.assertEqual(urlopen(testrec_expected_url2).read(), 'TEST') bibupload.wipe_out_record_from_all_tables(recid) def test_fft_check_md5_through_bibrecdoc_str(self): """bibupload - simple FFT insert, check md5 through BibRecDocs.str()""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University %s/img/head.gif """ % CFG_SITE_URL # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') original_md5 = md5(urlopen('%s/img/head.gif' % CFG_SITE_URL).read()).hexdigest() bibrec_str = str(BibRecDocs(int(recid))) md5_found = False for row in bibrec_str.split('\n'): if 'checksum' in row: if original_md5 in row: md5_found = True self.failUnless(md5_found) bibupload.wipe_out_record_from_all_tables(recid) def test_detailed_fft_insert(self): """bibupload - detailed FFT insert""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif SuperMain This is a description This is a comment CIDIESSE http://cds.cern.ch/img/cds.gif SuperMain .jpeg This is a description This is a second comment CIDIESSE """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/CIDIESSE.gif This is a description This is a comment %(siteurl)s/record/123456789/files/CIDIESSE.jpeg This is a description This is a second comment """ % {'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/CIDIESSE.gif$$yThis is a description$$zThis is a comment 8564_ $$u%(siteurl)s/record/123456789/files/CIDIESSE.jpeg$$yThis is a description$$zThis is a second comment """ % {'siteurl': CFG_SITE_URL} testrec_expected_url1 = "%(siteurl)s/record/123456789/files/CIDIESSE.gif" % {'siteurl': CFG_SITE_URL} testrec_expected_url2 = "%(siteurl)s/record/123456789/files/CIDIESSE.jpeg" % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url1 = testrec_expected_url1.replace('123456789', str(recid)) testrec_expected_url2 = testrec_expected_url1.replace('123456789', str(recid)) # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self.failUnless(try_url_download(testrec_expected_url1)) self.failUnless(try_url_download(testrec_expected_url2)) bibupload.wipe_out_record_from_all_tables(recid) def test_simple_fft_insert_with_restriction(self): """bibupload - simple FFT insert with restriction""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif thesis http://cds.cern.ch/img/cds.gif """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/cds.gif - 
%(siteurl)s/record/123456789/files/icon-cds.gif + %(siteurl)s/record/123456789/files/cds.gif?subformat=icon icon """ % {'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif - 8564_ $$q%(siteurl)s/record/123456789/files/icon-cds.gif$$xicon + 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif?subformat=icon$$xicon """ % {'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" \ % {'siteurl': CFG_SITE_URL} - testrec_expected_icon = "%(siteurl)s/record/123456789/files/icon-cds.gif" \ + testrec_expected_icon = "%(siteurl)s/record/123456789/files/cds.gif?subformat=icon" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) testrec_expected_icon = testrec_expected_icon.replace('123456789', str(recid)) # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') - open_url = urlopen(testrec_expected_url) - self.failUnless("This file is restricted" in open_url.read()) + ## FIXME: we have introduced redirections to login page + ## so test to this must be added. + self.assertRaises(HTTPError, urlopen, testrec_expected_url) + self.assertRaises(HTTPError, urlopen, testrec_expected_icon) - open_icon = urlopen(testrec_expected_icon) - restricted_icon = urlopen("%s/img/restricted.gif" % CFG_SITE_URL) - self.failUnless(open_icon.read() == restricted_icon.read()) bibupload.wipe_out_record_from_all_tables(recid) def test_simple_fft_insert_with_icon(self): """bibupload - simple FFT insert with icon""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif http://cds.cern.ch/img/cds.gif """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/cds.gif - %(siteurl)s/record/123456789/files/icon-cds.gif + %(siteurl)s/record/123456789/files/cds.gif?subformat=icon icon """ % {'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif - 8564_ $$q%(siteurl)s/record/123456789/files/icon-cds.gif$$xicon + 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif?subformat=icon$$xicon """ % {'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" \ % {'siteurl': CFG_SITE_URL} - testrec_expected_icon = "%(siteurl)s/record/123456789/files/icon-cds.gif" \ + testrec_expected_icon = "%(siteurl)s/record/123456789/files/cds.gif?subformat=icon" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = 
testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) testrec_expected_icon = testrec_expected_icon.replace('123456789', str(recid)) # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self.failUnless(try_url_download(testrec_expected_url)) self.failUnless(try_url_download(testrec_expected_icon)) bibupload.wipe_out_record_from_all_tables(recid) def test_multiple_fft_insert(self): """bibupload - multiple FFT insert""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif http://cdsweb.cern.ch/img/head.gif http://doc.cern.ch/archive/electronic/hep-th/0101/0101001.pdf %(prefix)s/var/tmp/demobibdata.xml """ % { 'prefix': CFG_PREFIX } testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/0101001.pdf %(siteurl)s/record/123456789/files/cds.gif %(siteurl)s/record/123456789/files/demobibdata.xml %(siteurl)s/record/123456789/files/head.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/0101001.pdf 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif 8564_ $$u%(siteurl)s/record/123456789/files/demobibdata.xml 8564_ $$u%(siteurl)s/record/123456789/files/head.gif """ % { 'siteurl': CFG_SITE_URL} # insert test record: testrec_expected_urls = [] for files in ('cds.gif', 'head.gif', '0101001.pdf', 'demobibdata.xml'): testrec_expected_urls.append('%(siteurl)s/record/123456789/files/%(files)s' % {'siteurl' : CFG_SITE_URL, 'files' : files}) task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_urls = [] for files in ('cds.gif', 'head.gif', '0101001.pdf', 'demobibdata.xml'): testrec_expected_urls.append('%(siteurl)s/record/%(recid)s/files/%(files)s' % {'siteurl' : CFG_SITE_URL, 'files' : files, 'recid' : recid}) # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') # FIXME: Next test has been commented out since, appearently, the # returned xml can have non predictable row order (but still correct) # Using only html marc output is fine because a value is represented # by a single row, so a row to row comparison can be employed. 
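# Illustrative sketch (hypothetical helper, not from the suite) of the
# reasoning in the comment above: in the 'hm' output every field value
# occupies exactly one row, so an order-insensitive comparison can
# simply sort the non-empty rows before comparing them.
def hm_rows_equal(buf1, buf2):
    rows1 = sorted([row.strip() for row in buf1.splitlines() if row.strip()])
    rows2 = sorted([row.strip() for row in buf2.splitlines() if row.strip()])
    return rows1 == rows2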
self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') for url in testrec_expected_urls: self.failUnless(try_url_download(url)) self._test_bibdoc_status(recid, 'head', '') self._test_bibdoc_status(recid, '0101001', '') self._test_bibdoc_status(recid, 'cds', '') self._test_bibdoc_status(recid, 'demobibdata', '') bibupload.wipe_out_record_from_all_tables(recid) def test_simple_fft_correct(self): """bibupload - simple FFT correct""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif """ test_to_correct = """ 123456789 http://cds.cern.ch/img/cds.gif """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/cds.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) # correct test record with new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='correct') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'cds', '') #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) + def test_fft_implicit_fix_marc(self): + """bibupload - FFT implicit FIX-MARC""" + test_to_upload = """ + + SzGeCERN + + Test, John + Test University + + + foo@bar.com + + + http://cds.cern.ch/img/cds.gif + + + """ + test_to_correct = """ + + 123456789 + + foo@bar.com + + + http://cds.cern.ch/img/cds.gif + + + %(siteurl)s/record/123456789/files/cds.gif + + + """ % { 'siteurl': CFG_SITE_URL} + testrec_expected_xm = """ + + 123456789 + SzGeCERN + + Test, John + Test University + + + foo@bar.com + + + http://cds.cern.ch/img/cds.gif + + + """ + testrec_expected_hm = """ + 001__ 123456789 + 003__ SzGeCERN + 100__ $$aTest, John$$uTest University + 8560_ $$ffoo@bar.com + 8564_ $$uhttp://cds.cern.ch/img/cds.gif + """ + task_set_task_param('verbose', 0) + recs = bibupload.xml_marc_to_records(test_to_upload) + err, recid = bibupload.bibupload(recs[0], opt_mode='insert') + # replace test buffers with real recid of inserted test record: + test_to_correct = test_to_correct.replace('123456789', + str(recid)) + testrec_expected_xm = testrec_expected_xm.replace('123456789', + str(recid)) + testrec_expected_hm = testrec_expected_hm.replace('123456789', + str(recid)) + # correct test record with 
implicit FIX-MARC: + task_set_task_param('verbose', 0) + recs = bibupload.xml_marc_to_records(test_to_correct) + bibupload.bibupload(recs[0], opt_mode='correct') + # compare expected results: + inserted_xm = print_record(recid, 'xm') + inserted_hm = print_record(recid, 'hm') + self.assertEqual(compare_xmbuffers(inserted_xm, + testrec_expected_xm), '') + self.assertEqual(compare_hmbuffers(inserted_hm, + testrec_expected_hm), '') + bibupload.wipe_out_record_from_all_tables(recid) + def test_fft_vs_bibedit(self): """bibupload - FFT Vs. BibEdit compatibility""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif """ test_to_replace = """ 123456789 SzGeCERN Test, John Test University http://www.google.com/ BibEdit Comment %(siteurl)s/record/123456789/files/cds.gif BibEdit Description 01 http://cern.ch/ """ % { 'siteurl': CFG_SITE_URL} testrec_expected_xm = str(test_to_replace) testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$uhttp://www.google.com/ 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif$$x01$$yBibEdit Description$$zBibEdit Comment 8564_ $$uhttp://cern.ch/ """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_replace = test_to_replace.replace('123456789', str(recid)) # correct test record with new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_replace) bibupload.bibupload(recs[0], opt_mode='replace') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'cds', '') bibrecdocs = BibRecDocs(recid) bibdoc = bibrecdocs.get_bibdoc('cds') self.assertEqual(bibdoc.get_description('.gif'), 'BibEdit Description') bibupload.wipe_out_record_from_all_tables(recid) def test_detailed_fft_correct(self): """bibupload - detailed FFT correct""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif Try Comment """ test_to_correct = """ 123456789 http://cdsweb.cern.ch/img/head.gif cds patata Next Try KEEP-OLD-VALUE """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/patata.gif Next Try Comment """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/patata.gif$$yNext Try$$zComment """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/patata.gif" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with 
real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) # correct test record with new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='correct') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'patata', '') #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) def test_no_url_fft_correct(self): """bibupload - no_url FFT correct""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif Try Comment """ test_to_correct = """ 123456789 cds patata .gif KEEP-OLD-VALUE Next Comment """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/patata.gif Try Next Comment """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/patata.gif$$yTry$$zNext Comment """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/patata.gif" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) # correct test record with new FFT: recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='correct') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'patata', '') #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) def test_new_icon_fft_append(self): """bibupload - new icon FFT append""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University """ test_to_correct = """ 123456789 cds http://cds.cern.ch/img/cds.gif """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University - %(siteurl)s/record/123456789/files/icon-cds.gif + %(siteurl)s/record/123456789/files/cds.gif?subformat=icon icon """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University - 8564_ 
$$q%(siteurl)s/record/123456789/files/icon-cds.gif$$xicon + 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif?subformat=icon$$xicon """ % { 'siteurl': CFG_SITE_URL} - testrec_expected_url = "%(siteurl)s/record/123456789/files/icon-cds.gif" \ + testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif?subformat=icon" \ % {'siteurl': CFG_SITE_URL} # insert test record: - task_set_task_param('verbose', 0) + task_set_task_param('verbose', 9) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) # correct test record with new FFT: - task_set_task_param('verbose', 0) + task_set_task_param('verbose', 9) recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='append') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'cds', '') #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) def test_multiple_fft_correct(self): """bibupload - multiple FFT correct""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif Try Comment Restricted http://cds.cern.ch/img/cds.gif .jpeg Try jpeg Comment jpeg Restricted """ test_to_correct = """ 123456789 http://cds.cern.ch/img/cds.gif patata .gif New restricted """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/patata.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/patata.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/patata.gif" \ % {'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) # correct test record with new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='correct') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') - self.failUnless(try_url_download(testrec_expected_url)) + ## FIXME: we have introduced redirection to login page. + ## so proper test should be done. 
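# Hedged sketch of the check hinted at by the FIXME above (the
# '/youraccount/login' path is an assumption, not taken from this
# patch): a restricted file should either answer with an HTTP error,
# or, once the redirection mentioned above is in place, send the
# client to the login page.
from urllib2 import urlopen, HTTPError

def looks_restricted(url):
    try:
        answer = urlopen(url)
    except HTTPError:
        return True   # plain 401/403 answer
    return '/youraccount/login' in answer.geturl()   # redirected to login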
+ self.assertRaises(StandardError, try_url_download, testrec_expected_url) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'patata', 'New restricted') #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) def test_purge_fft_correct(self): """bibupload - purge FFT correct""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University http://cds.cern.ch/img/cds.gif http://cdsweb.cern.ch/img/head.gif """ test_to_correct = """ 123456789 http://cds.cern.ch/img/cds.gif """ test_to_purge = """ 123456789 http://cds.cern.ch/img/cds.gif PURGE """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/cds.gif %(siteurl)s/record/123456789/files/head.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif 8564_ $$u%(siteurl)s/record/123456789/files/head.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" % { 'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) test_to_purge = test_to_purge.replace('123456789', str(recid)) # correct test record with new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='correct') # purge test record with new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_purge) bibupload.bibupload(recs[0], opt_mode='correct') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'cds', '') self._test_bibdoc_status(recid, 'head', '') #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) def test_revert_fft_correct(self): """bibupload - revert FFT correct""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University %s/img/iconpen.gif cds """ % CFG_SITE_URL test_to_correct = """ 123456789 %s/img/head.gif cds """ % CFG_SITE_URL test_to_revert = """ 123456789 cds REVERT 1 """ testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/cds.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/cds.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/cds.gif" % { 'siteurl': 
CFG_SITE_URL} # insert test record: - task_set_task_param('verbose', 0) + task_set_task_param('verbose', 9) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_correct = test_to_correct.replace('123456789', str(recid)) test_to_revert = test_to_revert.replace('123456789', str(recid)) # correct test record with new FFT: - task_set_task_param('verbose', 0) + task_set_task_param('verbose', 9) recs = bibupload.xml_marc_to_records(test_to_correct) bibupload.bibupload(recs[0], opt_mode='correct') # revert test record with new FFT: - task_set_task_param('verbose', 0) + task_set_task_param('verbose', 9) recs = bibupload.xml_marc_to_records(test_to_revert) bibupload.bibupload(recs[0], opt_mode='correct') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') self._test_bibdoc_status(recid, 'cds', '') expected_content_version1 = urlopen('%s/img/iconpen.gif' % CFG_SITE_URL).read() expected_content_version2 = urlopen('%s/img/head.gif' % CFG_SITE_URL).read() expected_content_version3 = expected_content_version1 content_version1 = urlopen('%s/record/%s/files/cds.gif?version=1' % (CFG_SITE_URL, recid)).read() content_version2 = urlopen('%s/record/%s/files/cds.gif?version=2' % (CFG_SITE_URL, recid)).read() content_version3 = urlopen('%s/record/%s/files/cds.gif?version=3' % (CFG_SITE_URL, recid)).read() self.assertEqual(expected_content_version1, content_version1) self.assertEqual(expected_content_version2, content_version2) self.assertEqual(expected_content_version3, content_version3) #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) def test_simple_fft_replace(self): """bibupload - simple FFT replace""" # define the test case: test_to_upload = """ SzGeCERN Test, John Test University %s/img/iconpen.gif cds """ % CFG_SITE_URL test_to_replace = """ 123456789 SzGeCERN Test, John Test University %s/img/head.gif """ % CFG_SITE_URL testrec_expected_xm = """ 123456789 SzGeCERN Test, John Test University %(siteurl)s/record/123456789/files/head.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_hm = """ 001__ 123456789 003__ SzGeCERN 100__ $$aTest, John$$uTest University 8564_ $$u%(siteurl)s/record/123456789/files/head.gif """ % { 'siteurl': CFG_SITE_URL} testrec_expected_url = "%(siteurl)s/record/123456789/files/head.gif" % { 'siteurl': CFG_SITE_URL} # insert test record: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_upload) err, recid = bibupload.bibupload(recs[0], opt_mode='insert') # replace test buffers with real recid of inserted test record: testrec_expected_xm = testrec_expected_xm.replace('123456789', str(recid)) testrec_expected_hm = testrec_expected_hm.replace('123456789', str(recid)) testrec_expected_url = testrec_expected_url.replace('123456789', str(recid)) test_to_replace = test_to_replace.replace('123456789', str(recid)) # replace test record with 
new FFT: task_set_task_param('verbose', 0) recs = bibupload.xml_marc_to_records(test_to_replace) bibupload.bibupload(recs[0], opt_mode='replace') # compare expected results: inserted_xm = print_record(recid, 'xm') inserted_hm = print_record(recid, 'hm') self.failUnless(try_url_download(testrec_expected_url)) self.assertEqual(compare_xmbuffers(inserted_xm, testrec_expected_xm), '') self.assertEqual(compare_hmbuffers(inserted_hm, testrec_expected_hm), '') expected_content_version = urlopen('%s/img/head.gif' % CFG_SITE_URL).read() content_version = urlopen('%s/record/%s/files/head.gif' % (CFG_SITE_URL, recid)).read() self.assertEqual(expected_content_version, content_version) #print "\nRecid: " + str(recid) + "\n" #print testrec_expected_hm + "\n" #print print_record(recid, 'hm') + "\n" bibupload.wipe_out_record_from_all_tables(recid) TEST_SUITE = make_test_suite(BibUploadInsertModeTest, BibUploadAppendModeTest, BibUploadCorrectModeTest, BibUploadDeleteModeTest, BibUploadReplaceModeTest, BibUploadReferencesModeTest, BibUploadRecordsWithSYSNOTest, BibUploadRecordsWithEXTOAIIDTest, BibUploadRecordsWithOAIIDTest, BibUploadFMTModeTest, BibUploadIndicatorsTest, BibUploadUpperLowerCaseTest, BibUploadControlledProvenanceTest, BibUploadStrongTagsTest, BibUploadFFTModeTest, ) if __name__ == "__main__": run_test_suite(TEST_SUITE, warn_user=True) diff --git a/modules/miscutil/demo/demobibdata.xml b/modules/miscutil/demo/demobibdata.xml index 72903639e..ca552205c 100644 --- a/modules/miscutil/demo/demobibdata.xml +++ b/modules/miscutil/demo/demobibdata.xml @@ -1,22530 +1,22838 @@ CERN-EX-0106015 Photolab ALEPH experiment: Candidate of Higgs boson production Expérience ALEPH: Candidat de la production d'un boson Higgs 14 06 2000 FILM Candidate for the associated production of the Higgs boson and Z boson. Both, the Higgs and Z boson decay into 2 jets each. The green and the yellow jets belong to the Higgs boson. They represent the fragmentation of a bottom andanti-bottom quark. The red and the blue jets stem from the decay of the Z boson into a quark anti-quark pair. Left: View of the event along the beam axis. Bottom right: Zoom around the interaction point at the centre showing detailsof the fragmentation of the bottom and anti-bottom quarks. As expected for b quarks, in each jet the decay of a long-lived B meson is visible. Top right: "World map" showing the spatial distribution of the jets in the event. Press SzGeCERN Experiments and Tracks LEP neil.calder@cern.ch http://cdsware.cern.ch/download/invenio-demo-site-files/0106015_01.jpg - http://cdsware.cern.ch/download/invenio-demo-site-files/0106015_01.gif - restricted_picture + + + http://cdsware.cern.ch/download/invenio-demo-site-files/0106015_01.gif + .gif;icon + restricted_picture 0003717PHOPHO 2000 81 2001-06-14 50 2001-08-27 CM Bldg. 2 Calder, N n 200231 PICTURE CERN-EX-0104007 Patrice Loïez The first CERN-built module of the barrel section of ATLAS's electromagnetic calorimeter Premier module du tonneau du calorimètre electromagnétique d'ATLAS 10 Apr 2001 DIGITAL Behind the module, left to right Ralf Huber, Andreas Bies and Jorgen Beck Hansen. In front of the module, left to right: Philippe Lançon and Edward Wood. Derrière le module, de gauche à droite: Ralf Huber, Andreas Bies, Jorgen Beck Hansen. Devant le module, de gauche à droite : Philippe Lançon et Edward Wood. 
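The demo record hunk above stores the icon as an extra file format named ".gif;icon" under the "restricted_picture" doctype; as the bibupload tests earlier in this patch show, such icons are then served from the parent file's URL with a subformat query argument rather than from a separate icon-*.gif file. A minimal sketch of that URL convention (helper name and inputs are hypothetical):

    def icon_url(siteurl, recid, docname, docformat='.gif'):
        # Build the subformat-based icon URL used in the expected buffers.
        return '%s/record/%s/files/%s%s?subformat=icon' % (
            siteurl, recid, docname, docformat)

    print icon_url('http://localhost', 123456789, '0106015_01')
    # -> http://localhost/record/123456789/files/0106015_01.gif?subformat=icon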
CERN EDS SzGeCERN Experiments and Tracks marie.noelle.pages.ribeiro@cern.ch http://cdsware.cern.ch/download/invenio-demo-site-files/0104007_02.jpeg http://cdsware.cern.ch/download/invenio-demo-site-files/0104007_02.gif 0003601PHOPHO 2001 81 2001-04-23 50 2001-06-18 CM 0020699 ADMBUL CERN Bulletin 18/2001 : 30 April 2001 (English) 0020700 ADMBUL CERN Bulletin 18/2001 : 30 avril 2001 (French) Bldg. 184 Fassnacht, P n 200231 PICTURE CERN-HI-6902127 European Molecular Biology Conference Jul 1969 In February, the Agreement establishing the European Molecular Biology Conference was signed at CERN. Willy Spuhler is signing for Switzerland. SzGeCERN Personalities and History of CERN http://cdsware.cern.ch/download/invenio-demo-site-files/6902127.jpeg http://cdsware.cern.ch/download/invenio-demo-site-files/6902127.gif 0002690PHOPHO 1969 81 2000-06-13 50 2000-06-13 CM 127-2-69 n 200024 PICTURE CERN-DI-9906028 J.L. Caron The Twenty Member States of CERN (with dates of accession) on 1 June 1999 Jun 1999 CERN Member States. Les Etats membres du CERN. Press SzGeCERN Diagrams and Charts http://cdsware.cern.ch/download/invenio-demo-site-files/9906028_01.jpeg http://cdsware.cern.ch/download/invenio-demo-site-files/9906028_01.gif 0001754PHOPHO 1999 81 1999-06-17 50 2000-10-30 CM n 199924 PICTURE CERN-DI-9905005 High energy cosmic rays striking atoms at the top of the atmosphere give the rise to showers of particles striking the Earth's surface Des rayons cosmiques de haute energie heurtent des atomes dans la haute atmosphere et donnent ainsi naissance a des gerbes de particules projetees sur la surface terrestre 10 May 1999 DIGITAL Press SzGeCERN Diagrams and Charts neil.calder@cern.ch http://cdsware.cern.ch/download/invenio-demo-site-files/9905005_01.jpeg http://cdsware.cern.ch/download/invenio-demo-site-files/9905005_01.gif 0001626PHOPHO 1999 81 1999-05-10 50 2000-09-12 CM Bldg. 60 Calder, N n 200231 PICTURE CERN-HI-6206002 eng At CERN in 1962 eight Nobel prizewinners 1962 In 1962, CERN hosted the 11th International Conference on High Energy Physics. Among the distinguished visitors were eight Nobel prizewinners.Left to right: Cecil F. Powell, Isidor I. Rabi, Werner Heisenberg, Edwin M. McMillan, Emile Segre, Tsung Dao Lee, Chen Ning Yang and Robert Hofstadter. En 1962, le CERN est l'hote de la onzieme Conference Internationale de Physique des Hautes Energies. Parmi les visiteurs eminents se trouvaient huit laureats du prix Nobel.De gauche a droite: Cecil F. Powell, Isidor I. Rabi, Werner Heisenberg, Edwin M. McMillan, Emile Segre, Tsung Dao Lee, Chen Ning Yang et Robert Hofstadter. 
Press SzGeCERN Personalities and History of CERN Nobel laureate http://cdsware.cern.ch/download/invenio-demo-site-files/6206002.jpg http://cdsware.cern.ch/download/invenio-demo-site-files/6206002.gif 0000736PHOPHO 1962 81 1998-07-23 50 2002-07-15 CM http://www.nobel.se/physics/laureates/1950/index.html The Nobel Prize in Physics 1950 : Cecil Frank Powell http://www.nobel.se/physics/laureates/1944/index.html The Nobel Prize in Physics 1944 : Isidor Isaac Rabi http://www.nobel.se/physics/laureates/1932/index.html The Nobel Prize in Physics 1932 : Werner Karl Heisenberg http://www.nobel.se/chemistry/laureates/1951/index.html The Nobel Prize in Chemistry 1951 : Edwin Mattison McMillan http://www.nobel.se/physics/laureates/1959/index.html The Nobel Prize in Physics 1959 : Emilio Gino Segre http://www.nobel.se/physics/laureates/1957/index.html The Nobel Prize in Physics 1957 : Chen Ning Yang and Tsung-Dao Lee http://www.nobel.se/physics/laureates/1961/index.html The Nobel Prize in Physics 1961 : Robert Hofstadter 6206002 (1962) n 199830 PICTURE CERN-GE-9806033 Tim Berners-Lee World-Wide Web inventor 28 Jun 1998 Conference "Internet, Web, What's next?" on 26 June 1998 at CERN : Tim Berners-Lee, inventor of the World-Wide Web and Director of the W3C, explains how the Web came to be and gives his views on the future. Conference "Internet, Web, What's next?" le 26 juin 1998 au CERN: Tim Berners-Lee, inventeur du World-Wide Web et directeur du W3C, explique comment le Web est ne, et donne ses opinions sur l'avenir. Press SzGeCERN Life at CERN neil.calder@cern.ch http://cdsware.cern.ch/download/invenio-demo-site-files/9806033.jpeg http://cdsware.cern.ch/download/invenio-demo-site-files/9806033.gif 0000655PHOPHO 1998 81 1998-07-03 50 2001-07-10 CM http://www.cern.ch/CERN/Announcements/1998/WebNext.html "Internet, Web, What's next?" 26 June 1998 http://Bulletin.cern.ch/9828/art2/Text_E.html CERN Bulletin no 28/98 (6 July 1998) (English) http://Bulletin.cern.ch/9828/art2/Text_F.html CERN Bulletin no 28/98 (6 juillet 1998) (French) http://www.w3.org/People/Berners-Lee/ Biography 0000990 PRSPRS Le Pays Gessien : 3 Jul 1998 0001037 PRSPRS Le Temps : 27 Jun 1998 0000809 PRSPRS La Tribune de Geneve : 27 Jun 1998 Bldg. 60 Calder, N n 199827 PICTURE astro-ph/9812226 eng Efstathiou, G P Cambridge University Constraints on $\Omega_{\Lambda}$ and $\Omega_{m}$ from Distant Type 1a Supernovae and Cosmic Microwave Background Anisotropies 14 Dec 1998 6 p We perform a combined likelihood analysis of the latest cosmic microwave background anisotropy data and distant Type 1a Supernova data of Perlmutter et al (1998a). Our analysis is restricted to cosmological models where structure forms from adiabatic initial fluctuations characterised by a power-law spectrum with negligible tensor component. Marginalizing over other parameters, our best-fit solution gives Omega_m = 0.25 (+0.18, -0.12) and Omega_Lambda = 0.63 (+0.17, -0.23) (95 % confidence errors) for the cosmic densities contributed by matter and a cosmological constant, respectively. The results therefore strongly favour a nearly spatially flat Universe with a non-zero cosmological constant.
LANL EDS SzGeCERN Astrophysics and Astronomy Lasenby, A N Hobson, M P Ellis, R S Bridle, S L George Efstathiou <gpe@ast.cam.ac.uk> http://cdsware.cern.ch/download/invenio-demo-site-files/9812226.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/9812226.fig1.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/9812226.fig3.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/9812226.fig5.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/9812226.fig6.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/9812226.fig7.ps.gz Additional 1998 11 1998-12-14 50 2001-04-07 BATCH Mon. Not. R. Astron. Soc. SLAC 4162242 CER n 200231 PREPRINT Bond, J.R. 1996, Theory and Observations of the Cosmic Background Radiation, in "Cosmology and Large Scale Structure", Les Houches Session LX, August 1993, eds. R. Schaeffer, J. Silk, M. Spiro and J. Zinn-Justin, Elsevier SciencePress, Amsterdam, p469 Bond J.R., Efstathiou G., Tegmark M., 1997 L33 Mon. Not. R. Astron. Soc. 291 1997 Mon. Not. R. Astron. Soc. 291 (1997) L33 Bond, J.R., Jaffe, A. 1997, in Proc. XXXI Rencontre de Moriond, ed. F. Bouchet, Edition Fronti eres, in press astro-ph/9610091 Bond J.R., Jaffe A.H. and Knox L.E., 1998 astro-ph/9808264 Astrophys.J. 533 (2000) 19 Burles S., Tytler D., 1998a, to appear in the Proceedings of the Second Oak Ridge Symposium on Atomic & Nuclear Astrophysics, ed. A. Mezzacappa, Institute of Physics, Bristol astro-ph/9803071 Burles S., Tytler D., 1998b, Astrophys. J.in press astro-ph/9712109 Astrophys.J. 507 (1998) 732 Caldwell, R.R., Dave, R., Steinhardt P.J., 1998 1582 Phys. Rev. Lett. 80 1998 Phys. Rev. Lett. 80 (1998) 1582 Carroll S.M., Press W.H., Turner E.L., 1992, Ann. Rev. Astr. Astrophys., 30, 499. Chaboyer B., 1998 astro-ph/9808200 Phys.Rept. 307 (1998) 23 Devlin M.J., De Oliveira-Costa A., Herbig T., Miller A.D., Netterfield C.B., Page L., Tegmark M., 1998, submitted to Astrophys. J astro-ph/9808043 Astrophys. J. 509 (1998) L69-72 Efstathiou G. 1996, Observations of Large-Scale Structure in the Universe, in "Cosmology and Large Scale Structure", Les Houches Session LX, August 1993, eds. R. Schaeffer, J. Silk, M. Spiro and J. Zinn-Justin, Elsevier SciencePress, Amsterdam, p135. Efstathiou G., Bond J.R., Mon. Not. R. Astron. Soc.in press astro-ph/9807130 Astrophys. J. 518 (1999) 2-23 Evrard G., 1998, submitted to Mon. Not. R. Astron. Soc astro-ph/9701148 Mon.Not.Roy.Astron.Soc. 292 (1997) 289 Freedman J.B., Mould J.R., Kennicutt R.C., Madore B.F., 1998 astro-ph/9801090 Astrophys. J. 480 (1997) 705 Garnavich P.M. et al. 1998 astro-ph/9806396 Astrophys.J. 509 (1998) 74-79 Goobar A., Perlmutter S., 1995 14 Astrophys. J. 450 1995 Astrophys. J. 450 (1995) 14 Hamuy M., Phillips M.M., Maza J., Suntzeff N.B., Schommer R.A., Aviles R. 1996 2391 Astrophys. J. 112 1996 Astrophys. J. 112 (1996) 2391 Hancock S., Gutierrez C.M., Davies R.D., Lasenby A.N., Rocha G., Rebolo R., Watson R.A., Tegmark M., 1997 505 Mon. Not. R. Astron. Soc. 298 1997 Mon. Not. R. Astron. Soc. 298 (1997) 505 Hancock S., Rocha G., Lasenby A.N., Gutierrez C.M., 1998 L1 Mon. Not. R. Astron. Soc. 294 1998 Mon. Not. R. Astron. Soc. 294 (1998) L1 Herbig T., De Oliveira-Costa A., Devlin M.J., Miller A.D., Page L., Tegmark M., 1998, submitted to Astrophys. J astro-ph/9808044 Astrophys.J. 509 (1998) L73-76 Lineweaver C.H., 1998. Astrophys. J.505, L69. Lineweaver, C.H., Barbosa D., 1998a 624 Astrophys. J. 446 1998 Astrophys. J. 
446 (1998) 624 Lineweaver, C.H., Barbosa D., 1998b 799 Astron. Astrophys. 329 1998 Astron. Astrophys. 329 (1998) 799 De Oliveira-Costa A., Devlin M.J., Herbig T., Miller A.D., Netterfield C.B. Page L., Tegmark M., 1998, submitted to Astrophys. J astro-ph/9808045 Astrophys. J. 509 (1998) L77-80 Ostriker J.P., Steinhardt P.J., 1995 600 Nature 377 1995 Nature 377 (1995) 600 Peebles P.J.E., 1993, Principles of Physical Cosmology, Princeton University Press, Princeton, New Jersey. Perlmutter S, et al., 1995, In Presentations at the NATO ASI in Aiguablava, Spain, LBL-38400; also published in Thermonuclear Supernova, P. Ruiz-Lapuente, R. Cana and J. Isern (eds), Dordrecht, Kluwer, 1997, p749. Perlmutter S, et al., 1997 565 Astrophys. J. 483 1997 Astrophys. J. 483 (1997) 565 Perlmutter S. et al., 1998a, Astrophys. J., in press. (P98) astro-ph/9812133 Astrophys. J. 517 (1999) 565-586 Perlmutter S. et al., 1998b, In Presentation at the January 1988 Meeting of the American Astronomical Society, Washington D.C., LBL-42230, available at www-supernova.lbl.gov; B.A.A.S., volume : 29 (1997) 1351 Perlmutter S, et al., 1998c 51 Nature 391 1998 Nature 391 (1998) 51 Ratra B., Peebles P.J.E., 1988 3406 Phys. Rev., D 37 1988 Phys. Rev. D 37 (1988) 3406 Riess A. et al. 1998, Astrophys. J., in press astro-ph/9805201 Astron. J. 116 (1998) 1009-1038 Seljak U., Zaldarriaga M. 1996 437 Astrophys. J. 469 1996 Astrophys. J. 469 (1996) 437 Seljak U. & Zaldarriaga M., 1998 astro-ph/9811123 Phys. Rev. D60 (1999) 043504 Tegmark M., 1997 3806 Phys. Rev. Lett. 79 1997 Phys. Rev. Lett. 79 (1997) 3806 Tegmark M. 1998, submitted to Astrophys. J astro-ph/9809201 Astrophys. J. 514 (1999) L69-L72 Tegmark, M., Eisenstein D.J., Hu W., Kron R.G., 1998 astro-ph/9805117 Wambsganss J., Cen R., Ostriker J.P., 1998 29 Astrophys. J. 494 1998 Astrophys. J. 494 (1998) 29 Webster M., Bridle S.L., Hobson M.P., Lasenby A.N., Lahav O., Rocha, G., 1998, Astrophys. J., in press astro-ph/9802109 White M., 1998, Astrophys. J., in press astro-ph/9802295 Astrophys. J. 506 (1998) 495 Zaldarriaga, M., Spergel D.N., Seljak U., 1997 1 Astrophys. J. 488 1997 Astrophys. J. 488 (1997) 1 eng PRE-25553 RL-82-024 Ellis, J University of Oxford Grand unification with large supersymmetry breaking Mar 1982 18 p SzGeCERN General Theoretical Physics Ibanez, L E Ross, G G 1982 11 Oxford Univ. Univ. Auton. Madrid Rutherford Lab. 1990-01-28 50 2002-01-04 BATCH h 1982n PREPRINT hep-ex/0201013 eng CERN-EP-2001-094 Heister, A Aachen, Tech. Hochsch. Search for R-Parity Violating Production of Single Sneutrinos in $e^{+}e^{-}$ Collisions at $\sqrt{s}$ = 189-209 GeV Geneva CERN 17 Dec 2001 22 p ALEPH Papers A search for single sneutrino production under the assumption that $R$-parity is violated via a single dominant $LL\bar{E}$ coupling is presented. This search considers the process ${\rm e} \gamma\;{\smash{\mathop{\rightarrow}}}\;\tilde{\nu}\ell$ and is performed using the data collected by the ALEPH detector at centre-of-mass energies from 189\,GeV up to 209\,GeV corresponding to an integrated luminosity of 637.1\,$\mathrm{pb}^{-1}$. The numbers of observed candidate events are in agreement with Standard Model expectations and 95\% confidence level upper limits on five of the $LL\bar{E}$ couplings are given as a function of the assumed sneutrino mass.
CERN EDS 20011220SLAC giva LANL EDS SzGeCERN Particle Physics - Experimental Results Schael, S Barate, R Bruneliere, R De Bonis, I Decamp, D Goy, C Jezequel, S Lees, J P Martin, F Merle, E Minard, M N Pietrzyk, B Trocme, B Boix, G Bravo, S Casado, M P Chmeissani, M Crespo, J M Fernandez, E Fernandez-Bosman, M Garrido, L Grauges, E Lopez, J Martinez, M Merino, G Miquel, R Mir, L M Pacheco, A Paneque, D Ruiz, H Colaleo, A Creanza, D De Filippis, N De Palma, M Iaselli, G Maggi, G Maggi, M Nuzzo, S Ranieri, A Raso, G Ruggieri, F Selvaggi, G Silvestris, L Tempesta, P Tricomi, A Zito, G Huang, X Lin, J Ouyang, Q Wang, T Xie, Y Xu, R Xue, S Zhang, J Zhang, L Zhao, W Abbaneo, D Azzurri, P Barklow, T Buchmuller, O Cattaneo, M Cerutti, F Clerbaux, B Drevermann, H Forty, R W Frank, M Gianotti, F Greening, T C Hansen, J B Harvey, J Hutchcroft, D E Janot, P Jost, B Kado, M Maley, P Mato, P Moutoussi, A Ranjard, F Rolandi, L Schlatter, D Sguazzoni, G Tejessy, W Teubert, F Valassi, A Videau, I Ward, J J Badaud, F Dessagne, S Falvard, A Fayolle, D Gay, P Jousset, J Michel, B Monteil, S Pallin, D Pascolo, J M Perret, P Hansen, J D Hansen, J R Hansen, P H Nilsson, B S Waananen, A Kyriakis, A Markou, C Simopoulou, E Vayaki, A Zachariadou, K Blondel, A Brient, J C Machefert, F P Rouge, A Swynghedauw, M Tanaka, R Videau, H L Ciulli, V Focardi, E Parrini, G Antonelli, A Antonelli, M Bencivenni, G Bologna, G Bossi, F Campana, P Capon, G Chiarella, V Laurelli, P Mannocchi, G Murtas, F Murtas, G P Passalacqua, L Pepe-Altarelli, M Spagnolo, P Kennedy, J Lynch, J G Negus, P O'Shea, V Smith, D Thompson, A S Wasserbaech, S R Cavanaugh, R Dhamotharan, S Geweniger, C Hanke, P Hepp, V Kluge, E E Leibenguth, G Putzer, A Stenzel, H Tittel, K Werner, S Wunsch, M Beuselinck, R Binnie, D M Cameron, W Davies, G Dornan, P J Girone, M Hill, R D Marinelli, N Nowell, J Przysiezniak, H Rutherford, S A Sedgbeer, J K Thompson, J C White, R Ghete, V M Girtler, P Kneringer, E Kuhn, D Rudolph, G Bouhova-Thacker, E Bowdery, C K Clarke, D P Ellis, G Finch, A J Foster, F Hughes, G Jones, R W L Pearson, M R Robertson, N A Smizanska, M Lemaître, V Blumenschein, U Holldorfer, F Jakobs, K Kayser, F Kleinknecht, K Muller, A S Quast, G Renk, B Sander, H G Schmeling, S Wachsmuth, H Zeitnitz, C Ziegler, T Bonissent, A Carr, J Coyle, P Curtil, C Ealet, A Fouchez, D Leroy, O Kachelhoffer, T Payre, P Rousseau, D Tilquin, A Ragusa, F David, A Dietl, H Ganis, G Huttmann, K Lutjens, G Mannert, C Manner, W Moser, H G Settles, R Wolf, G Boucrot, J Callot, O Davier, M Duflot, L Grivaz, J F Heusse, P Jacholkowska, A Loomis, C Serin, L Veillet, J J De Vivie de Regie, J B Yuan, C Bagliesi, G Boccali, T Foà, L Giammanco, A Giassi, A Ligabue, F Messineo, A Palla, F Sanguinetti, G Sciaba, A Tenchini, R Venturi, A Verdini, P G Awunor, O Blair, G A Coles, J Cowan, G García-Bellido, A Green, M G Jones, L T Medcalf, T Misiejuk, A Strong, J A Teixeira-Dias, P Clifft, R W Edgecock, T R Norton, P R Tomalin, I R Bloch-Devaux, B Boumediene, D Colas, P Fabbro, B Lancon, E Lemaire, M C Locci, E Perez, P Rander, J Renardy, J F Rosowsky, A Seager, P Trabelsi, A Tuchming, B Vallage, B Konstantinidis, N P Litke, A M Taylor, G Booth, C N Cartwright, S Combley, F Hodgson, P N Lehto, M H Thompson, L F Affholderbach, K Bohrer, A Brandt, S Grupen, C Hess, J Ngac, A Prange, G Sieler, U Borean, C Giannini, G He, H Putz, J Rothberg, J E Armstrong, S R Berkelman, K Cranmer, K Ferguson, D P S Gao, Y Gonzalez, S Hayes, O J Hu, H Jin, S Kile, J McNamara, P A Nielsen, J Pan, Y B Von 
Wimmersperg-Toller, J H Wiedenmann, W Wu, J Wu, S L Wu, X Zobernig, G Dissertori, G ALEPH Collaboration valerie.brunner@cern.ch http://cdsware.cern.ch/download/invenio-demo-site-files/ep-2001-094.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/ep-2001-094.ps.gz 2002 ALEPH 11 EP CERN LEP 2001-12-19 50 2002-02-19 BATCH CERN Eur. Phys. J., C SLAC 4823672 oai:cds.cern.ch:CERN-EP-2001-094 CER n 200231 PREPRINT [1] For reviews, see for example: H.P. Nilles 1 Phys. Rep. 110 1984 Phys. Rep. 110 (1984) 1 H.E. Haber and G. L. Kane 75 Phys. Rep. 117 1985 Phys. Rep. 117 (1985) 75 [2] G. Farrar and P. Fayet 575 Phys. Lett., B 76 1978 Phys. Lett. B 76 (1978) 575 [3] S. Weinberg 287 Phys. Rev., B 26 1982 Phys. Rev. B 26 (1982) 287 N. Sakai and T. Yanagida 83 Nucl. Phys., B 197 1982 Nucl. Phys. B 197 (1982) 83 S. Dimopoulos, S. Raby and F. Wilczek 133 Phys. Lett., B 212 1982 Phys. Lett. B 212 (1982) 133 [4] B.C. Allanach, H. Dreiner, P. Morawitz and M.D. Williams, "Single Sneutrino/Slepton Production at LEP2 and the NLC" 307 Phys. Lett., B 420 1998 Phys. Lett. B 420 (1998) 307 [5] ALEPH Collaboration, "Search for R-Parity Violating Decays of Supersymmetric Particles in e+e- Collisions at Centre-of-Mass Energies between s = 189­202 GeV" 415 Eur. Phys. J., C 19 2001 Eur. Phys. J. C 19 (2001) 415 [6] ALEPH Collaboration, "ALEPH: a detector for electron-positron annihilations at LEP", Nucl. Instrum. and Methods. A : 294 (1990) 121 [7] S. Cantini, Yu. L. Dokshitzer, M. Olsson, G. Turnock and B.R. Webber, `New clustering algorithm for multijet cross sections in e+e- annihilation" 432 Phys. Lett., B 269 1991 Phys. Lett. B 269 (1991) 432 [8] ALEPH Collaboration, "Performance of the ALEPH detector at LEP", Nucl. Instrum. and Methods. A : 360 (1995) 481 Nucl. Instrum. and Methods. 360 (1995) 481 [9] S. Katsanevas and P. Morawitz, "SUSYGEN 2.2 - A Monte Carlo Event Generator for MSSM Sparticle Production at e+e- Colliders" 227 Comput. Phys. Commun. 112 1998 Comput. Phys. Commun. 112 (1998) 227 [10] E. Barberio, B. van Eijk and Z. W¸as 115 Comput. Phys. Commun. 66 1991 Comput. Phys. Commun. 66 (1991) 115 [11] S. Jadach and Z. W¸as, R. Decker and J.H. Kühn, "The decay library TAUOLA" 361 Comput. Phys. Commun. 76 1993 Comput. Phys. Commun. 76 (1993) 361 [12] T. Sjöstrand, " High-Energy Physics Event Generation with PYTHIA 5.7 and JETSET 7.4" 74 Comput. Phys. Commun. 82 1994 Comput. Phys. Commun. 82 (1994) 74 [13] S. Jadach et al 276 Comput. Phys. Commun. 66 1991 Comput. Phys. Commun. 66 (1991) 276 11 [14] M. Skrzypek, S. Jadach, W. Placzek and Z. Was 216 Comput. Phys. Commun. 94 1996 Comput. Phys. Commun. 94 (1996) 216 [15] S. Jadach et al 298 Phys. Lett., B 390 1997 Phys. Lett. B 390 (1997) 298 [16] J.A.M. Vermaseren, in Proceedings of the IVth International Workshop on Gamma Gamma Interactions, Eds. G. Cochard and P. Kessler, Springer Verlag, 1980 [17] J. -F. Grivaz and F. Le Diberder, "Complementary analyses and acceptance optimization in new particle searches", LAL preprint # 92-37 (1992) [18] ALEPH Collaboration, "Search for Supersymmetry with a dominant R-Parity Violating LL ¯ E Coupling in e+e- Collisions at Centre-of-Mass Energies of 130 GeV to 172 GeV" 433 Eur. Phys. J., C 4 1998 Eur. Phys. J. C 4 (1998) 433 [19] For reviews see for example: H. Dreiner, "An Introduction to Explicit R-parity Violation" hep-ph/9707435 published in Perspectives on Supersymmetry, ed. G.L. Kane, World Scientific, Singapore (1998); G. Bhattacharyya 83 Nucl. Phys. B, Proc. Suppl. 52 1997 Nucl. Phys. B Proc. Suppl. 
52 (1997) 83 12 astro-ph/0101431 eng Gray, M E Cambridge University Infrared constraints on the dark mass concentration observed in the cluster Abell 1942 24 Jan 2001 8 p We present a deep H-band image of the region in the vicinity of the cluster Abell 1942 containing the puzzling dark matter concentration detected in an optical weak lensing study by Erben et al. (2000). We demonstrate that our limiting magnitude, H=22, would be sufficient to detect clusters of appropriate mass out to redshifts comparable with the mean redshift of the background sources. Despite this, our infrared image reveals no obvious overdensity of sources at the location of the lensing mass peak, nor an excess of sources in the I-H vs. H colour-magnitude diagram. We use this to further constrain the luminosity and mass-to-light ratio of the putative dark clump as a function of its redshift. We find that for spatially-flat cosmologies, background lensing clusters with reasonable mass-to-light ratios lying in the redshift range 0<z<1 are strongly excluded, leaving open the possibility that the mass concentration is a new type of truly dark object. LANL EDS SzGeCERN Astrophysics and Astronomy Ellis, R S Lewis, J R McMahon, R G Firth, A E Meghan Gray <meg@ast.cam.ac.uk> http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.ps.gz http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig1.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig2.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig3.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig4.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig5a.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig5b.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig6.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0101431.fig7.ps.gz Additional 2001 11 Caltech IoA, Cambridge 2001-01-25 00 2001-11-02 BATCH Gray, Meghan E. Ellis, Richard S. Lewis, James R. Mahon, Richard G. Mc Firth, Andrew E. Mon. Not. R. Astron. Soc. n 200104 PREPRINT hep-ph/0105155 eng CERN-TH-2001-131 Mangano, M L CERN Physics at the front-end of a neutrino factory : a quantitative appraisal Geneva CERN 16 May 2001 1 p We present a quantitative appraisal of the physics potential for neutrino experiments at the front-end of a muon storage ring. We estimate the foreseeable accuracy in the determination of several interesting observables, and explore the consequences of these measurements. We discuss the extraction of individual quark and antiquark densities from polarized and unpolarized deep-inelastic scattering. In particular we study the implications for the understanding of the nucleon spin structure. We assess the determination of alpha_s from scaling violation of structure functions, and from sum rules, and the determination of sin^2(theta_W) from elastic nu-e and deep-inelastic nu-p scattering. We then consider the production of charmed hadrons, and the measurement of their absolute branching ratios. We study the polarization of Lambda baryons produced in the current and target fragmentation regions. Finally, we discuss the sensitivity to physics beyond the Standard Model.
LANL EDS SzGeCERN Particle Physics - Phenomenology Alekhin, S I Anselmino, M Ball, R D Boglione, M D'Alesio, U Davidson, S De Lellis, G Ellis, J Forte, S Gambino, P Gehrmann, T Kataev, A L Kotzinian, A Kulagin, S A Lehmann-Dronke, B Migliozzi, P Murgia, F Ridolfi, G Michelangelo MANGANO <Michelangelo.Mangano@cern.ch> http://cdsware.cern.ch/download/invenio-demo-site-files/0105155.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0105155.ps.gz 2001 11 TH CERN nuDIS Working group of the ECFA-CERN Neutrino-Factory Study Group 2001-05-17 50 2001-05-25 MH SLAC 4628020 CER n 200231 PREPRINT [1] S. Geer 6989 Phys. Rev., D 57 1998 Phys. Rev. D 57 (1998) 6989 hep-ph/9712290 Phys. Rev. D 57 (1998) 6989-6997 039903 Phys. Rev., D 59 1999 Phys. Rev. D 59 (1999) 039903 ] [2] The Muon Collider Collab., µ+µ- Collider: a feasibility study, Report BNL-52503, Fermilab-Conf-96/092, LBNL-38946 (1996); B. Autin, A. Blondel and J. Ellis (eds.), Prospective study of muon storage rings at CERN, Report CERN 99-02, ECFA 99-197 (Geneva, 1999) [3] I. Bigi et al., The potential for neutrino physics at muon colliders and dedicated high current muon storage rings, Report BNL-67404 [4] C. Albright et al hep-ex/0008064 [5] R.D. Ball, D.A. Harris and K.S. McFarland hep-ph/0009223 submitted to the Proceedings of the Nufact '00 Workshop, June 2000, Monterey [6] H.L. Lai et al 1280 Phys. Rev., D 55 1997 Phys. Rev. D 55 (1997) 1280 hep-ph/9606399 Phys. Rev. D 55 (1997) 1280-1296 [7] V. Barone, C. Pascaud and F. Zomer 243 Eur. Phys. J., C 12 2000 Eur. Phys. J. C 12 (2000) 243 hep-ph/9907512 Eur. Phys. J. C 12 (2000) 243-262 [8] S. I. Alekhin 094022 Phys. Rev., D 63 2001 Phys. Rev. D 63 (2001) 094022 hep-ph/0011002 Phys. Rev. D 63 (2001) 094022 65 [9] G. Ridolfi 278 Nucl. Phys., A 666 2000 Nucl. Phys. A 666 (2000) 278 R.D. Ball and H.A.M. Tallini 1327 J. Phys., G 25 1999 J. Phys. G 25 (1999) 1327 S. Forte hep-ph/9409416 and hep-ph/9610238 [10] S. Forte, M.L. Mangano and G. Ridolfi hep-ph/0101192 Nucl. Phys. B 602 (2001) 585-621 to appear in Nucl. Phys., B [11] J. Blümlein and N. Kochelev 296 Phys. Lett., B 381 1996 Phys. Lett. B 381 (1996) 296 and 285 Nucl. Phys., B 498 1997 Nucl. Phys. B 498 (1997) 285 [12] D.A. Dicus 1637 Phys. Rev., D 5 1972 Phys. Rev. D 5 (1972) 1637 [13] M. Anselmino, P. Gambino and J. Kalinowski 267 Z. Phys., C 64 1994 Z. Phys. C 64 (1994) 267 M. Maul et al 443 Z. Phys., A 356 1997 Z. Phys. A 356 (1997) 443 J. Blümlein and N. Kochelev 285 Nucl. Phys., B 498 1997 Nucl. Phys. B 498 (1997) 285 V. Ravishankar 309 Nucl. Phys., B 374 1992 Nucl. Phys. B 374 (1992) 309 [14] B. Ehrnsperger and A. Schäfer 619 Phys. Lett., B 348 1995 Phys. Lett. B 348 (1995) 619 J. Lichtenstadt and H.J. Lipkin 119 Phys. Lett., B 353 1995 Phys. Lett. B 353 (1995) 119 J. Dai et al 273 Phys. Rev., D 53 1996 Phys. Rev. D 53 (1996) 273 P.G. Ratcliffe 383 Phys. Lett., B 365 1996 Phys. Lett. B 365 (1996) 383 N.W. Park, J. Schechter and H. Weigel 420 Phys. Lett., B 228 1989 Phys. Lett. B 228 (1989) 420 [15] A.O. Bazarko et al 189 Z. Phys., C 65 1989 Z. Phys. C 65 (1989) 189 [16] R. Mertig and W.L. van Neerven 637 Z. Phys., C 70 1996 Z. Phys. C 70 (1996) 637 W. Vogelsang 2023 Phys. Rev., D 54 1996 Phys. Rev. D 54 (1996) 2023 [17] D. de Florian and R. Sassot 6052 Phys. Rev., D 51 1995 Phys. Rev. D 51 (1995) 6052 [18] R.D. Ball, S. Forte and G. Ridolfi 255 Phys. Lett., B 378 1996 Phys. Lett. B 378 (1996) 255 [19] G. Altarelli, S. Forte and G. Ridolfi 277 Nucl. Phys., B 534 1998 Nucl. Phys. B 534 (1998) 277 and 138 Nucl. Phys. B, Proc. 
Suppl. 74 1999 Nucl. Phys. B Proc. Suppl. 74 (1999) 138 [20] G. Altarelli, R.D. Ball, S. Forte and G. Ridolfi 1145 Acta Phys. Pol., B 29 1998 Acta Phys. Pol. B 29 (1998) 1145 hep-ph/9803237 Acta Phys. Pol. B 29 (1998) 1145-1173 [21] H.L. Lai et al. (CTEQ Collab.) 375 Eur. Phys. J., C 12 2000 Eur. Phys. J. C 12 (2000) 375 hep-ph/9903282 Eur. Phys. J. C 12 (2000) 375-392 [22] G. Altarelli, R.D. Ball, S. Forte and G. Ridolfi 337 Nucl. Phys., B 496 1997 Nucl. Phys. B 496 (1997) 337 and 1145 Acta Phys. Pol., B 29 1998 Acta Phys. Pol. B 29 (1998) 1145 [23] G. Altarelli and G.G. Ross 391 Phys. Lett., B 212 1988 Phys. Lett. B 212 (1988) 391 A. V. Efremov and O. V. Teryaev, JINR-E2-88-287, in Proceedings of Symposium on Hadron Interactions-Theory and Phenomenology, Bechyne, June 26- July 1, 1988; ed. by J. Fischer et al (Czech. Acad. ScienceInst. Phys., 1988) p.432; R.D. Carlitz, J.C. Collins and A.H. Mueller 229 Phys. Lett., B 214 1988 Phys. Lett. B 214 (1988) 229 G. Altarelli and B. Lampe 315 Z. Phys. C 47 1990 Z. Phys. C 47 (1990) 315 W. Vogelsang 275 Z. Phys., C 50 1991 Z. Phys. C 50 (1991) 275 [24] G.M. Shore and G. Veneziano 75 Phys. Lett., B 244 1990 Phys. Lett. B 244 (1990) 75 and 23 Nucl. Phys., B 381 1992 Nucl. Phys. B 381 (1992) 23 see also G. M. Shore hep-ph/9812355 [25] S. Forte 189 Phys. Lett., B 224 1989 Phys. Lett. B 224 (1989) 189 and 1 Nucl. Phys., B 331 1990 Nucl. Phys. B 331 (1990) 1 S. Forte and E.V. Shuryak 153 Nucl. Phys., B 357 1991 Nucl. Phys. B 357 (1991) 153 [26] S.J. Brodsky and B.-Q. Ma 317 Phys. Lett., B 381 1996 Phys. Lett. B 381 (1996) 317 66 [27] S.J. Brodsky, J. Ellis and M. Karliner 309 Phys. Lett., B 206 1988 Phys. Lett. B 206 (1988) 309 J. Ellis and M. Karliner hep-ph/9601280 [28] M. Glück et al hep-ph/0011215 Phys.Rev. D63 (2001) 094005 [29] D. Adams et al. (Spin Muon Collab.) 23 Nucl. Instrum. Methods Phys. Res., A 437 1999 Nucl. Instrum. Methods Phys. Res. A 437 (1999) 23 [30] B. Adeva et al. (SMC Collab.) 112001 Phys. Rev., D 58 1998 Phys. Rev. D 58 (1998) 112001 P. L. Anthony et al. (E155 Collab.) 19 Phys. Lett., B 493 2000 Phys. Lett. B 493 (2000) 19 [31] R.M. Barnett 1163 Phys. Rev. Lett. 36 1976 Phys. Rev. Lett. 36 (1976) 1163 [32] M.A. Aivazis, J.C. Collins, F.I. Olness and W. Tung 3102 Phys. Rev., D 50 1994 Phys. Rev. D 50 (1994) 3102 [33] T. Gehrmann and W.J. Stirling 6100 Phys. Rev., D 53 1996 Phys. Rev. D 53 (1996) 6100 [34] M. Glück, E. Reya, M. Stratmann and W. Vogelsang 4775 Phys. Rev., D 53 1996 Phys. Rev. D 53 (1996) 4775 [35] D.J. Gross and C.H. Llewellyn Smith 337 Nucl. Phys., B 14 1969 Nucl. Phys. B 14 (1969) 337 [36] R. D. Ball and S. Forte 365 Phys. Lett., B 358 1995 Phys. Lett. B 358 (1995) 365 hep-ph/9506233 Phys.Lett. B358 (1995) 365-378 and hep-ph/9607289 [37] J. Santiago and F.J. Yndurain 45 Nucl. Phys., B 563 1999 Nucl. Phys. B 563 (1999) 45 hep-ph/9904344 Nucl.Phys. B563 (1999) 45-62 [38] V.S. Fadin and L.N. Lipatov 127 Phys. Lett., B 429 1998 Phys. Lett. B 429 (1998) 127 M. Ciafaloni, D. Colferai and G. Salam 114036 Phys. Rev., D 60 1999 Phys. Rev. D 60 (1999) 114036 G. Altarelli, R.D. Ball and S. Forte hep-ph/0011270 Nucl.Phys. B599 (2001) 383-423 [39] W.G. Seligman et al 1213 Phys. Rev. Lett. 79 1997 Phys. Rev. Lett. 79 (1997) 1213 [40] A. L. Kataev, G. Parente and A.V. Sidorov 405 Nucl. Phys., B 573 2000 Nucl. Phys. B 573 (2000) 405 hep-ph/9905310 Nucl.Phys. B573 (2000) 405-433 [41] A.L. Kataev, G. Parente and A.V. Sidorov, preprint CERN-TH/2000-343 hep-ph/0012014 and work in progress [42] S.I. Alekhin and A.L. Kataev 402 Phys. 
Lett., B 452 1999 Phys. Lett. B 452 (1999) 402 hep-ph/9812348 Phys.Lett. B452 (1999) 402-408 [43] S. Bethke R27 J. Phys., G 26 2000 J. Phys. G 26 (2000) R27 hep-ex/0004021 J.Phys. G26 (2000) R27 [44] I. Hinchliffe and A.V. Manohar 643 Annu. Rev. Nucl. Part. Sci. 50 2000 Annu. Rev. Nucl. Part. Sci. 50 (2000) 643 hep-ph/0004186 Ann.Rev.Nucl.Part.Sci. 50 (2000) 643-678 [45] H. Georgi and H.D. Politzer 1829 Phys. Rev., D 14 1976 Phys. Rev. D 14 (1976) 1829 [46] E.B. Zijlstra and W.L. van Neerven 377 Phys. Lett., B 297 1992 Phys. Lett. B 297 (1992) 377 [47] W.L. van Neerven and A. Vogt 263 Nucl. Phys., B 568 2000 Nucl. Phys. B 568 (2000) 263 hep-ph/9907472 Nucl.Phys. B568 (2000) 263-286 and hep-ph/0103123 Nucl.Phys. B603 (2001) 42-68 [48] W.L. van Neerven and A. Vogt 111 Phys. Lett., B 490 2000 Phys. Lett. B 490 (2000) 111 hep-ph/0007362 Phys.Lett. B490 (2000) 111-118 [49] S.A. Larin, T. van Ritbergen and J.A. Vermaseren 41 Nucl. Phys., B 427 1994 Nucl. Phys. B 427 (1994) 41 [50] S.A. Larin, P. Nogueira, T. van Ritbergen and J.A. Vermaseren 338 Nucl. Phys., B 492 1997 Nucl. Phys. B 492 (1997) 338 hep-ph/9605317 Nucl.Phys. B492 (1997) 338-378 [51] A. Retey and J.A. Vermaseren, preprint TTP00-13, NIKHEF-2000-018 hep-ph/0007294 Nucl.Phys. B604 (2001) 281-311 [52] J.A. Gracey 141 Phys. Lett., B 322 1994 Phys. Lett. B 322 (1994) 141 hep-ph/9401214 Phys.Lett. B322 (1994) 141-146 67 [53] J. Blümlein and A. Vogt 149 Phys. Lett., B 370 1996 Phys. Lett. B 370 (1996) 149 hep-ph/9510410 Phys.Lett. B370 (1996) 149-155 [54] S. Catani et al., preprint CERN-TH/2000-131 hep-ph/0005025 in Standard model physics (and more) at the LHC, eds. G. Altarelli and M. Mangano, Report CERN 2000-004 (Geneva, 2000) [55] A.L. Kataev, A.V. Kotikov, G. Parente and A.V. Sidorov 374 Phys. Lett. B 417 1998 Phys. Lett. B 417 (1998) 374 hep-ph/9706534 Phys.Lett. B417 (1998) 374-384 [56] M. Beneke 1 Phys. Rep. 317 1999 Phys. Rep. 317 (1999) 1 hep-ph/9807443 Phys.Rept. 317 (1999) 1-142 [57] M. Beneke and V.M. Braun hep-ph/0010208 [58] M. Dasgupta and B.R. Webber 273 Phys. Lett., B 382 1996 Phys. Lett. B 382 (1996) 273 hep-ph/9604388 [59] M. Maul, E. Stein, A. Schafer and L. Mankiewicz 100 Phys. Lett., B 401 1997 Phys. Lett. B 401 (1997) 100 hep-ph/9612300 [60] A.V. Sidorov et al. (IHEP­JINR Neutrino Detector Collab.) 405 Eur. Phys. J., C 10 1999 Eur. Phys. J. C 10 (1999) 405 hep-ex/9905038 [61] S.I. Alekhin et al. (IHEP-JINR Neutrino Detector Collab), preprint IHEP-01-18 (2001) hep-ex/0104013 [62] C. Adloff et al. (H1 Collab.) hep-ex/0012052 [63] A.D. Martin, R.G. Roberts, W.J. Stirling and R.S. Thorne 117 Eur. Phys. J., C 18 2000 Eur. Phys. J. C 18 (2000) 117 hep-ph/0007099 [64] E.B. Zijlstra and W.L. van Neerven 525 Nucl. Phys., B 383 1992 Nucl. Phys. B 383 (1992) 525 [65] S.G. Gorishny and S.A. Larin 109 Phys. Lett., B 172 1986 Phys. Lett. B 172 (1986) 109 S.A. Larin and J.A.M. Vermaseren 345 Phys. Lett., B 259 1991 Phys. Lett. B 259 (1991) 345 [66] A.L. Kataev and A.V. Sidorov, preprint CERN-TH/7235-94 hep-ph/9405254 in Proceedings of Rencontre de Moriond - Hadronic session of `QCD and high energy hadronic interactions', M´eribel-les-Allues, 1994, ed. J. Tr an Thanh V an (Editions Fronti eres, Gif-sur-Yvette, 1995), p. 189 [67] J.H. Kim et al 3595 Phys. Rev. Lett. 81 1998 Phys. Rev. Lett. 81 (1998) 3595 hep-ex/9808015 [68] J. Chyla and A.L. Kataev 385 Phys. Lett., B 297 1992 Phys. Lett. B 297 (1992) 385 hep-ph/9209213 [69] A.L. Kataev and A.V. Sidorov 179 Phys. Lett., B 331 1994 Phys. Lett. B 331 (1994) 179 hep-ph/9402342 [70] J. 
Blümlein and W. L. van Neerven 417 Phys. Lett., B 450 1999 Phys. Lett. B 450 (1999) 417 hep-ph/9811351 [71] A.L. Kataev and V.V. Starshenko 235 Mod. Phys. Lett., A 10 1995 Mod. Phys. Lett. A 10 (1995) 235 hep-ph/9502348 M.A. Samuel, J. Ellis and M. Karliner 4380 Phys. Rev. Lett. 74 1995 Phys. Rev. Lett. 74 (1995) 4380 hep-ph/9503411 [72] W. Bernreuther and W. Wetzel 228 Nucl. Phys., B 197 1982 Nucl. Phys. B 197 (1982) 228 [Erratum 758 Nucl. Phys., B 513 1998 Nucl. Phys. B 513 (1998) 758 ]; S.A. Larin, T. van Ritbergen and J.A. Vermaseren 278 Nucl. Phys., B 438 1995 Nucl. Phys. B 438 (1995) 278 hep-ph/9411260 K.G. Chetyrkin, B.A. Kniehl and M. Steinhauser 2184 Phys. Rev. Lett. 79 1997 Phys. Rev. Lett. 79 (1997) 2184 hep-ph/9706430 [73] E.V. Shuryak and A.I. Vainshtein 451 Nucl. Phys. B 199 1982 Nucl. Phys. B 199 (1982) 451 [74] M.A. Shifman, A.I. Vainshtein and V.I. Zakharov 385 Nucl. Phys., B 147 1979 Nucl. Phys. B 147 (1979) 385 [75] V.M. Braun and A.V. Kolesnichenko 723 Nucl. Phys., B 283 1987 Nucl. Phys. B 283 (1987) 723 68 [76] G.G. Ross and R.G. Roberts 425 Phys. Lett., B 322 1994 Phys. Lett. B 322 (1994) 425 hep-ph/9312237 [77] J. Balla, M.V. Polyakov and C. Weiss 327 Nucl. Phys., B 510 1998 Nucl. Phys. B 510 (1998) 327 hep-ph/9707515 [78] R.G. Oldeman (CHORUS Collab.) 96 Nucl. Phys. B, Proc. Suppl. 79 1999 Nucl. Phys. B, Proc. Suppl. 79 (1999) 96 R.G. Oldeman, PhD Thesis, Amsterdam University, June 2000 (unpublished) [79] U.K. Yang et al. (CCFR­NuTeV Collab.) hep-ex/0010001 [80] J.D. Bjorken 1767 Phys. Rev. 163 1967 Phys. Rev. 163 (1967) 1767 [81] W.A. Bardeen, A.J. Buras, D.W. Duke and T. Muta 3998 Phys. Rev., D 18 1978 Phys. Rev. D 18 (1978) 3998 G. Altarelli, R.K. Ellis and G. Martinelli 521 Nucl. Phys., B 143 1978 Nucl. Phys. B 143 (1978) 521 [82] K.G. Chetyrkin, S.G. Gorishny, S.A. Larin and F.V. Tkachov 230 Phys. Lett., B 137 1984 Phys. Lett. B 137 (1984) 230 [83] S.A. Larin, F.V. Tkachov and J.A. Vermaseren 862 Phys. Rev. Lett. 66 1991 Phys. Rev. Lett. 66 (1991) 862 [84] M. Arneodo 301 Phys. Rep. 240 1994 Phys. Rep. 240 (1994) 301 [85] G. Piller and W. Weise 1 Phys. Rep. 330 2000 Phys. Rep. 330 (2000) 1 hep-ph/9908230 [86] P. Amaudruz et al 3 Nucl. Phys., B 441 1995 Nucl. Phys. B 441 (1995) 3 M. Arneodo et al 12 Nucl. Phys., B 441 1995 Nucl. Phys. B 441 (1995) 12 [87] A.C. Benvenuti et al. (BCDMS Collab.) 483 Phys. Lett., B 189 1987 Phys. Lett. B 189 (1987) 483 [88] J. Gomez et al 4348 Phys. Rev., D 49 1994 Phys. Rev. D 49 (1994) 4348 [89] M.R. Adams et al. (E665 Collab.) 403 Z. Phys., C 67 1995 Z. Phys. C 67 (1995) 403 hep-ex/9505006 [90] S.L. Adler 963 Phys. Rev., B 135 1964 Phys. Rev. B 135 (1964) 963 [91] J.S. Bell 57 Phys. Rev. Lett. 13 1964 Phys. Rev. Lett. 13 (1964) 57 [92] C.A. Piketti and L. Stodolsky 571 Nucl. Phys., B 15 1970 Nucl. Phys. B 15 (1970) 571 [93] B.Z. Kopeliovich and P. Marage 1513 Int. J. Mod. Phys., A 8 1993 Int. J. Mod. Phys. A 8 (1993) 1513 [94] P.P. Allport et al. (BEBC WA59 Collab.) 417 Phys. Lett., B 232 1989 Phys. Lett. B 232 (1989) 417 [95] C. Boros, J.T. Londergan and A.W. Thomas 114030 Phys. Rev., D 58 1998 Phys. Rev. D 58 (1998) 114030 hep-ph/9804410 [96] U.K. Yang et al. (CCFR­NuTeV Collab.) hep-ex/0009041 [97] M.A. Aivazis, F.I. Olness and W. Tung 2339 Phys. Rev. Lett. 65 1990 Phys. Rev. Lett. 65 (1990) 2339 V. Barone, M. Genovese, N.N. Nikolaev, E. Predazzi and B. Zakharov 279 Phys. Lett., B 268 1991 Phys. Lett. B 268 (1991) 279 and 83 Z. Phys., C 70 1996 Z. Phys. C 70 (1996) 83 hep-ph/9505343 [98] R.S. Thorne and R.G. Roberts 303 Phys. 
Lett., B 421 1998 Phys. Lett. B 421 (1998) 303 hep-ph/9711223 [99] A.D. Martin, R.G. Roberts, W.J. Stirling and R.S. Thorne 463 Eur. Phys. J., C 4 1998 Eur. Phys. J. C 4 (1998) 463 hep-ph/9803445 [100] L.L. Frankfurt, M.I. Strikman and S. Liuti 1725 Phys. Rev. Lett. 65 1990 Phys. Rev. Lett. 65 (1990) 1725 [101] R. Kobayashi, S. Kumano and M. Miyama 465 Phys. Lett., B 354 1995 Phys. Lett. B 354 (1995) 465 hep-ph/9501313 [102] K.J. Eskola, V.J. Kolhinen and P.V. Ruuskanen 351 Nucl. Phys., B 535 1998 Nucl. Phys. B 535 (1998) 351 hep-ph/9802350 69 [103] S.A. Kulagin hep-ph/9812532 [104] P.V. Landshoff, J.C. Polkinghorne and R.D. Short 225 Nucl. Phys., B 28 1971 Nucl. Phys. B 28 (1971) 225 [105] S.A. Kulagin, G. Piller and W. Weise 1154 Phys. Rev., C 50 1994 Phys. Rev. C 50 (1994) 1154 nucl-th/9402015 [106] S.V. Akulinichev, S.A. Kulagin and G.M. Vagradov 485 Phys. Lett., B 158 1985 Phys. Lett. B 158 (1985) 485 [107] S.A. Kulagin 653 Nucl. Phys., A 500 1989 Nucl. Phys. A 500 (1989) 653 [108] G.B. West, Ann. Phys.NY : 74 (1972) 464 [109] S.A. Kulagin and A.V. Sidorov 261 Eur. Phys. J., A 9 2000 Eur. Phys. J. A 9 (2000) 261 hep-ph/0009150 [110] A.C. Benvenuti et al. (BCDMS Collab.) 29 Z. Phys., C 63 1994 Z. Phys. C 63 (1994) 29 [111] M. Vakili et al. (CCFR Collab.) 052003 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 052003 hep-ex/9905052 [112] S.A. Kulagin 435 Nucl. Phys., A 640 1998 Nucl. Phys. A 640 (1998) 435 nucl-th/9801039 [113] I.R. Afnan, F. Bissey, J. Gomez, A.T. Katramatou, W. Melnitchouk, G.G. Petratos and A.W. Thomas nucl-th/0006003 [114] V. Guzey et al hep-ph/0102133 [115] S. Sarantakos, A. Sirlin and W.J. Marciano 84 Nucl. Phys., B 217 1983 Nucl. Phys. B 217 (1983) 84 D.Y. Bardin and V.A. Dokuchaeva 975 Sov. J. Nucl. Phys. 43 1986 Sov. J. Nucl. Phys. 43 (1986) 975 D.Y. Bardin and V.A. Dokuchaeva 839 Nucl. Phys., B 287 1987 Nucl. Phys. B 287 (1987) 839 [116] G. Degrassi et al. Phys. Lett., B350 (95) 75; G. Degrassi and P. Gambino 3 Nucl. Phys., B 567 2000 Nucl. Phys. B 567 (2000) 3 [117] J.N. Bahcall, M. Kamionkowski and A. Sirlin 6146 Phys. Rev., D 51 1995 Phys. Rev. D 51 (1995) 6146 astro-ph/9502003 [118] See F. Jegerlehner hep-ph/9901386 and references therein [119] K.S. McFarland et al. (NuTeV Collab.) hep-ex/9806013 in Proceedings 33rd Rencontres de Moriond on Electroweak Interactions and Unified Theories, Les Arcs, 1998 [120] M. E. Peskin and T. Takeuchi 381 Phys. Rev., D 46 1992 Phys. Rev. D 46 (1992) 381 W. J. Marciano and J. L. Rosner 2963 Phys. Rev. Lett. 65 1990 Phys. Rev. Lett. 65 (1990) 2963 [Erratum 2963 Phys. Rev. Lett. 68 1990 Phys. Rev. Lett. 68 (1990) 2963 ] [121] G. Altarelli and R. Barbieri 161 Phys. Lett., B 253 1991 Phys. Lett. B 253 (1991) 161 D. C. Kennedy and P. Langacker 2967 Phys. Rev. Lett. 65 1990 Phys. Rev. Lett. 65 (1990) 2967 [Erratum 2967 Phys. Rev. Lett. 66 1990 Phys. Rev. Lett. 66 (1990) 2967 ] [122] D.E. Groom et al, Particle Data Group 1 Eur. Phys. J. 15 2000 Eur. Phys. J. 15 (2000) 1 [123] P. Migliozzi et al 217 Phys. Lett., B 462 1999 Phys. Lett. B 462 (1999) 217 [124] J. Finjord and F. Ravndal 61 Phys. Lett., B 58 1975 Phys. Lett. B 58 (1975) 61 [125] R.E. Shrock and B.W. Lee 2539 Phys. Rev., D 13 1976 Phys. Rev. D 13 (1976) 2539 [126] C. Avilez et al 149 Phys. Lett., B 66 1977 Phys. Lett. B 66 (1977) 149 [127] C. Avilez and T. Kobayashi 3448 Phys. Rev., D 19 1979 Phys. Rev. D 19 (1979) 3448 [128] C. Avilez et al 709 Phys. Rev., D 17 1978 Phys. Rev. D 17 (1978) 709 70 [129] A. Amer et al 48 Phys. Lett., B 81 1979 Phys. Lett. 
B 81 (1979) 48 [130] S.G. Kovalenko 934 Sov. J. Nucl. Phys. 52 1990 Sov. J. Nucl. Phys. 52 (1990) 934 [131] G.T. Jones et al. (WA21 Collab.) 593 Z. Phys., C 36 1987 Z. Phys. C 36 (1987) 593 [132] V.V. Ammosov et al 247 JETP Lett. 58 1993 JETP Lett. 58 (1993) 247 [133] D. Son et al 2129 Phys. Rev., D 28 1983 Phys. Rev. D 28 (1983) 2129 [134] N. Ushida et al. (E531 Collab.) 375 Phys. Lett., B 206 1988 Phys. Lett. B 206 (1988) 375 [135] N. Armenise et al 409 Phys. Lett., B 104 1981 Phys. Lett. B 104 (1981) 409 [136] G. De Lellis, P. Migliozzi and P. Zucchelli 7 Phys. Lett., B 507 2001 Phys. Lett. B 507 (2001) 7 hep-ph/0104066 [137] G. Corcella et al hep-ph/0011363 [138] T. Sjöstrand, report LU-TP-95-20 hep-ph/9508391 [139] G. Ingelman et al 108 Comput. Phys. Commun. 101 1997 Comput. Phys. Commun. 101 (1997) 108 [140] T. Bolton hep-ex/9708014 [141] P. Annis et al. (CHORUS Collab.) 458 Phys. Lett., B 435 1998 Phys. Lett. B 435 (1998) 458 [142] J. Conrad et al 1341 Rev. Mod. Phys. 70 1998 Rev. Mod. Phys. 70 (1998) 1341 [143] T. Adams et al. (NuTeV Collab.) 092001 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 092001 [144] A.E. Asratian et al. (BBCN Collab.) 55 Z. Phys., C 58 1993 Z. Phys. C 58 (1993) 55 [145] J.D. Richman and P.R. Burchat 893 Rev. Mod. Phys. 67 1995 Rev. Mod. Phys. 67 (1995) 893 [146] J. Collins, L. Frankfurt and M. Strikman 2982 Phys. Rev., D 56 1997 Phys. Rev. D 56 (1997) 2982 [147] A.V. Radyushkin 5524 Phys. Rev., D 56 1997 Phys. Rev. D 56 (1997) 5524 [148] S.J. Brodsky, L. Frankfurt, J.F. Gunion, A.H. Mueller and M. Strikman 3134 Phys. Rev., D 50 1994 Phys. Rev. D 50 (1994) 3134 A. V. Radyushkin 333 Phys. Lett., B 385 1996 Phys. Lett. B 385 (1996) 333 L. Mankiewicz, G. Piller and T. Weigl 119 Eur. Phys. J., C 5 1998 Eur. Phys. J. C 5 (1998) 119 and 017501 Phys. Rev., D 59 1999 Phys. Rev. D 59 (1999) 017501 M. Vanderhaeghen, P.A.M. Guichon and M. Guidal 5064 Phys. Rev. Lett. 80 1998 Phys. Rev. Lett. 80 (1998) 5064 [149] B. Lehmann-Dronke, P.V. Pobylitsa, M.V. Polyakov, A. Schäfer and K. Goeke 147 Phys. Lett., B 475 2000 Phys. Lett. B 475 (2000) 147 B. Lehmann-Dronke, M.V. Polyakov, A. Schäfer and K. Goeke 114001 Phys. Rev., D 63 2001 Phys. Rev. D 63 (2001) 114001 hep-ph/0012108 [150] M. Wirbel, B. Stech and M. Bauer 637 Z. Phys., C 29 1985 Z. Phys. C 29 (1985) 637 M. Bauer and M. Wirbel 671 Z. Phys. 42 1989 Z. Phys. 42 (1989) 671 [151] H-n. Li and B. Meli´c 695 Eur. Phys. J., C 11 1999 Eur. Phys. J. C 11 (1999) 695 [152] A. Abada et al 268 Nucl. Phys. B, Proc. Suppl. 83 2000 Nucl. Phys. B, Proc. Suppl. 83 (2000) 268 D. Becirevic et al hep-lat/0002025 A. Ali Khan et al hep-lat/0010009 A. S. Kronfeld hep-ph/0010074 L. Lellouch and C.J.D. Lin (UKQCD Collab.) hep-ph/0011086 71 [153] A.V. Radyushkin 014030 Phys. Rev., D 59 1999 Phys. Rev. D 59 (1999) 014030 [154] A.D. Martin, R.G. Roberts and W.J. Stirling 155 Phys. Lett., B 354 1995 Phys. Lett. B 354 (1995) 155 [155] J.T. Jones et al. (WA21 Collab.) 23 Z. Phys., C 28 1987 Z. Phys. C 28 (1987) 23 [156] S. Willocq et al. (WA59 Collab.) 207 Z. Phys., C 53 1992 Z. Phys. C 53 (1992) 207 [157] D. DeProspo et al. (E632 Collab.) 6691 Phys. Rev., D 50 1994 Phys. Rev. D 50 (1994) 6691 [158] P. Astier et al. (NOMAD Collab.) 3 Nucl. Phys., B 588 2000 Nucl. Phys. B 588 (2000) 3 [159] L. Trentadue and G. Veneziano 201 Phys. Lett., B 323 1994 Phys. Lett. B 323 (1994) 201 [160] M. Anselmino, M. Boglione, J. Hansson, and F. Murgia 828 Phys. Rev., D 54 1996 Phys. Rev. D 54 (1996) 828 [161] R.L. Jaffe 6581 Phys. Rev., D 54 1996 Phys. Rev. 
D 54 (1996) 6581 [162] J. Ellis, D.E. Kharzeev and A. Kotzinian 467 Z. Phys., C 69 1996 Z. Phys. C 69 (1996) 467 [163] D. de Florian, M. Stratmann, and W. Vogelsang 5811 Phys. Rev., D 57 1998 Phys. Rev. D 57 (1998) 5811 [164] A. Kotzinian, A. Bravar and D. von Harrach 329 Eur. Phys. J., C 2 1998 Eur. Phys. J. C 2 (1998) 329 [165] A. Kotzinian hep-ph/9709259 [166] S.L. Belostotski 526 Nucl. Phys. B, Proc. Suppl. 79 1999 Nucl. Phys. B, Proc. Suppl. 79 (1999) 526 [167] D. Boer, R. Jakob, and P.J. Mulders 471 Nucl. Phys., B 564 2000 Nucl. Phys. B 564 (2000) 471 [168] C. Boros, J.T. Londergan and A.W. Thomas 014007 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 014007 and D : 62 (2000) 014021 [169] D. Ashery and H.J. Lipkin 263 Phys. Lett., B 469 1999 Phys. Lett. B 469 (1999) 263 [170] B-Q. Ma, I. Schmidt, J. Soffer, and J-Y. Yang 657 Eur. Phys. J., C 16 2000 Eur. Phys. J. C 16 (2000) 657 114009 Phys. Rev., D 62 2000 Phys. Rev. D 62 (2000) 114009 [171] M. Anselmino, M. Boglione, and F. Murgia 253 Phys. Lett., B 481 2000 Phys. Lett. B 481 (2000) 253 [172] M. Anselmino, D. Boer, U. D'Alesio, and F. Murgia 054029 Phys. Rev., D 63 2001 Phys. Rev. D 63 (2001) 054029 [173] D. Indumathi, H.S. Mani and A. Rastogi 094014 Phys. Rev., D 58 1998 Phys. Rev. D 58 (1998) 094014 [174] M. Burkardt and R.L. Jaffe 2537 Phys. Rev. Lett. 70 1993 Phys. Rev. Lett. 70 (1993) 2537 [175] I.I. Bigi 43 Nuovo Cimento 41 1977 Nuovo Cimento 41 (1977) 43 and 581 [176] W. Melnitchouk and A.W. Thomas 311 Z. Phys., A 353 1996 Z. Phys. A 353 (1996) 311 [177] J. Ellis, M. Karliner, D.E. Kharzeev and M.G. Sapozhnikov 256 Nucl. Phys., A 673 2000 Nucl. Phys. A 673 (2000) 256 [178] R. Carlitz and M. Kislinger 336 Phys. Rev., D 2 1970 Phys. Rev. D 2 (1970) 336 [179] D. Naumov hep-ph/0101355 [180] P. Migliozzi et al 19 Phys. Lett., B 494 2000 Phys. Lett. B 494 (2000) 19 [181] A. Alton et al hep-ex/0008068 72 [182] Y. Grossman 141 Phys. Lett., B 359 1995 Phys. Lett. B 359 (1995) 141 [183] P. Langacker, M. Luo and A. Mann 87 Rev. Mod. Phys. 64 1992 Rev. Mod. Phys. 64 (1992) 87 [184] F. Cuypers and S. Davidson 503 Eur. Phys. J., C 2 1998 Eur. Phys. J. C 2 (1998) 503 S. Davidson, D. Bailey and B.A. Campbell 613 Z. Phys., C 61 1994 Z. Phys. C 61 (1994) 613 [185] A. Leike 143 Phys. Rep. 317 1999 Phys. Rep. 317 (1999) 143 [186] A. Datta, R. Gandhi, B. Mukhopadhyaya and P. Mehta hep-ph/0011375 [187] G. Giudice et al., Report of the Stopped-Muon Working Group, to appear. 73 eng BNL-40718 FERMILAB-Pub-87-222-T Nason, P Brookhaven Nat. Lab. The total cross section for the production of heavy quarks in hadronic collisions Upton, IL Brookhaven Nat. Lab. 23 Dec 1987 42 p SzGeCERN Particle Physics - Phenomenology Dawson, S Ellis, R K 11 1987 1990-01-29 50 2002-01-04 BATCH SLAC 1773607 h 198804n PREPRINT eng CERN-PRE-82-006 Ellis, J CERN From the standard model to grand unification Geneva CERN 1982 mult. p SzGeCERN General Theoretical Physics 1982 11 TH 1990-01-28 50 2001-09-15 BATCH 820332 oai:cds.cern.ch:CERN-PRE-82-006 cern:theory h 1982n PREPRINT astro-ph/0104076 eng Dev, A Delhi University Cosmic equation of state, Gravitational Lensing Statistics and Merging of Galaxies 4 Apr 2001 28 p In this paper we investigate observational constraints on the cosmic equation of state of dark energy ($p = w \rho$) using gravitational lensing statistics. We carry out likelihood analysis of the lens surveys to constrain the cosmological parameters $\Omega_{m}$ and $w$.
We start by constraining $\Omega_{m}$ and $w$ in the no-evolution model of galaxies where the comoving number density of galaxies is constant. We extend our study to evolutionary models of galaxies - Volmerange $&$ Guiderdoni Model and Fast-Merging Model (of Broadhurst, Ellis $&$ Glazebrook). For the no-evolution model we get $w \leq -0.24$ and $\Omega_{m}\leq 0.48$ at $1\sigma$ (68% confidence level). For the Volmerange $&$ Guiderdoni Model we have $w \leq -0.2$ and $\Omega_{m} \leq 0.58$ at $1 \sigma$, and for the Fast Merging Model we get $w \leq -0.02$ and $\Omega_{m} \leq 0.93$ at $1\sigma$. For the case of constant $\Lambda$ ($w=-1$), all the models permit $\Omega_{m} = 0.3$ with 68% CL. We observe that the constraints on $w$ and $\Omega_{m}$ (and on $\Omega_{m}$ in the case of $w = -1$) obtained in the case of evolutionary models are weaker than those obtained in the case of the no-evolution model. LANL EDS SzGeCERN Astrophysics and Astronomy Jain, D Panchapakesan, N Mahajan, S Bhatia, V B Deepak Jain <deepak@physics.du.ac.in> http://cdsware.cern.ch/download/invenio-demo-site-files/0104076.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0104076.ps.gz 2001 10 Delhi University 2001-04-05 00 2001-04-10 BATCH Dev, Abha Jain, Deepak CER n 200231 PREPRINT [1] S. Perlmutter et al 565 Astrophys. J. 517 1999 Astrophys. J. 517 (1999) 565 [2] S. Perlmutter et al., Phy. Rev. Lett.: 83 (1999) 670 [3] A. G. Riess et al 1009 Astron. J. 116 1998 Astron. J. 116 (1998) 1009 [4] P. de Bernardis et al 955 Nature 404 2000 Nature 404 (2000) 955 [5] P. J. Ostriker & P. J. Steinhardt 600 Nature 377 1995 Nature 377 (1995) 600 [6] V. Sahni & Alexei Starobinsky, IJMP, D : 9 (2000) 373 [7] L. F. Bloomfield Torres & I. Waga 712 Mon. Not. R. Astron. Soc. 279 1996 Mon. Not. R. Astron. Soc. 279 (1996) 712 [8] V. Silveira & I. Waga 4890 Phys. Rev., D 50 1994 Phys. Rev. D 50 (1994) 4890 [9] V.Silveira & I. Waga 4625 Phys. Rev., D 56 1997 Phys. Rev. D 56 (1997) 4625 [10] I. Waga & Ana P. M. R. Miceli, Phy. Rev. D : 59 (1999) 103507 [11] M. S. Turner & M. White, Phy. Rev. D : 56 (1997) 4439 [12] D. Huterer & M. S. Turner, Phy. Rev. D : 60 (1999) 081301 [13] T. Chiba, N. Sugiyama & T. Nakamura, Mon. Not. R. Astron. Soc.: 289 (1997) L5 [14] T. Chiba, N. Sugiyama & T. Nakamura, Mon. Not. R. Astron. Soc.: 301 (1998) 72 [15] P. J. E. Peebles 439 Astrophys. J. 284 1984 Astrophys. J. 284 (1984) 439 [16] B. Ratra & P. J. E. Peebles, Phy. Rev. D : 37 (1988) 3406 [17] R. R. Caldwell, R. Dave & P. J. Steinhardt, Phy. Rev. Lett.: 80 (1998) 1582 [18] G. Efstathiou astro-ph/9904356 (1999) [19] J. A. S. Lima & J. S. Alcaniz 893 Mon. Not. R. Astron. Soc. 317 2000 Mon. Not. R. Astron. Soc. 317 (2000) 893 [20] Wang et al 17 Astrophys. J. 530 2000 Astrophys. J. 530 (2000) 17 [21] H. W. Rix, D. Maoz, E. Turner & M. Fukugita 49 Astrophys. J. 435 1994 Astrophys. J. 435 (1994) 49 [22] S. Mao & C. S. Kochanek 569 Mon. Not. R. Astron. Soc. 268 1994 Mon. Not. R. Astron. Soc. 268 (1994) 569 [23] D. Jain, N. Panchapakesan, S. Mahajan & V. B. Bhatia, MPLA : 15 (2000) 41 [24] T. Broadhurst, R. Ellis & K. Glazebrook 55 Nature 355 1992 Nature 355 (1992) 55 [BEG] [25] B. Rocca-Volmerange & B. Guiderdoni, Mon. Not. R. Astron. Soc.: 247 (1990) 166 [26] A. Toomre, in The Evolution of Galaxies and Stellar Populations eds: B. M. Tinsley & R. B. Larson (Yale Univ. Observatory), p-401 (1977) [27] F. Schwezier 109 Astron. J. 111 1996 Astron. J. 111 (1996) 109 [28] O. J. Eggen, D. Lynden-Bell & A. R. Sandage 748 Astrophys. J. 136 1962 Astrophys.
J. 136 (1962) 748 [29] R. B. Partridge & P. J. E. Peebles 868 Astrophys. J. 147 1967 Astrophys. J. 147 (1967) 868 [30] S. P. Driver et al L23 Astrophys. J. 449 1995 Astrophys. J. 449 (1995) L23 [31] J. M. Burkey et al L13 Astrophys. J. 429 1994 Astrophys. J. 429 (1994) L13 [32] S. M. Faber et al 668 Astrophys. J. 204 1976 Astrophys. J. 204 (1976) 668 [33] R. G. Carlberg et al 540 Astrophys. J. 435 1994 Astrophys. J. 435 (1994) 540 [34] S. E. Zepf 377 Nature 390 1997 Nature 390 (1997) 377 [35] K. Glazebrook et al 157 Mon. Not. R. Astron. Soc. 273 1995 Mon. Not. R. Astron. Soc. 273 (1995) 157 [36] S. J. Lilly et al 108 Astrophys. J. 455 1995 Astrophys. J. 455 (1995) 108 [37] R. S. Ellis et al 235 Mon. Not. R. Astron. Soc. 280 1996 Mon. Not. R. Astron. Soc. 280 (1996) 235 [38] R. S. Ellis, Ann. Rev 389 Astron. Astrophys. 35 1997 Astron. Astrophys. 35 (1997) 389 [39] B. Guiderdoni & B. Rocca-Volmerange 435 Astron. Astrophys. 252 1991 Astron. Astrophys. 252 (1991) 435 [40] S. E. Zepf, & D. C. Koo 34 Astrophys. J. 337 1989 Astrophys. J. 337 (1989) 34 [41] H. K. C. Yee & E. Ellingson 37 Astrophys. J. 445 1995 Astrophys. J. 445 (1995) 37 [42] S. Cole et al 781 Mon. Not. R. Astron. Soc. 271 1994 Mon. Not. R. Astron. Soc. 271 (1994) 781 [43] C. M. Baugh, S. Cole & C. S. Frenk L27 Mon. Not. R. Astron. Soc. 282 1996 Mon. Not. R. Astron. Soc. 282 (1996) L27 [44] C. M. Baugh, S. Cole & C. S. Frenk 1361 Mon. Not. R. Astron. Soc. 283 1996 Mon. Not. R. Astron. Soc. 283 (1996) 1361 [45] C. M. Baugh et al 504 Astrophys. J. 498 1998 Astrophys. J. 498 (1998) 504 [46] P. Schechter 297 Astrophys. J. 203 1976 Astrophys. J. 203 (1976) 297 [47] W. H. Press & P. Schechter 487 Astrophys. J. 187 1974 Astrophys. J. 187 (1974) 487 [48] J. E. Gunn & J. R. Gott 1 Astrophys. J. 176 1972 Astrophys. J. 176 (1972) 1 [49] J. Loveday, B. A. Peterson, G. Efstathiou & S. J. Maddox 338 Astrophys. J. 390 1994 Astrophys. J. 390 (1994) 338 [50] E. L. Turner, J. P. ostriker & J. R. Gott III 1 Astrophys. J. 284 1984 Astrophys. J. 284 (1984) 1 [51] E. L. Turner L43 Astrophys. J. 365 1990 Astrophys. J. 365 (1990) L43 [52] M. Fukugita & E. L. Turner 99 Mon. Not. R. Astron. Soc. 253 1991 Mon. Not. R. Astron. Soc. 253 (1991) 99 [53] M. Fukugita, T. Futamase, M. Kasai & E. L. Turner, As-trophys. J.: 393 (1992) 3 [54] C. S. Kochanek 12 Astrophys. J. 419 1993 Astrophys. J. 419 (1993) 12 [55] C. S. Kochanek 638 Astrophys. J. 466 1996 Astrophys. J. 466 (1996) 638 [56] F. D. A. Hartwich & D. Schade, Ann. Rev. Astron. Astro-phys.: 28 (1990) 437 [57] Yu-Chung N. Cheng & L. M. Krauss 697 Int. J. Mod. Phys., A 15 2000 Int. J. Mod. Phys. A 15 (2000) 697 [58] J. N. Bahcall et al 56 Astrophys. J. 387 1992 Astrophys. J. 387 (1992) 56 [59] P. C. Hewett et al., Astron. J.109, 1498(LBQS) (1995) [60] D. Maoz et al., Astrophys. J.409, 28(Snapshot) (1993) [61] D. Crampton, R. D. McClure & J. M. Fletcher 23 Astrophys. J. 392 1992 Astrophys. J. 392 (1992) 23 [62] H. K. C. Yee, A. V. Filippenko & D. Tang 7 Astron. J. 105 1993 Astron. J. 105 (1993) 7 [63] J. Surdej et al 2064 Astron. J. 105 1993 Astron. J. 105 (1993) 2064 [64] D. Jain, N. Panchapakesan, S. Mahajan & V. B. Bhatia astro-ph/9807129 (1998) [65] D. Jain, N. Panchapakesan, S. Mahajan & V. B. Bhatia, IJMP, A : 13 (1998) 4227 [66] M. lampton, B. Margon & S. Bowyer 177 Astrophys. J. 208 1976 Astrophys. J. 208 (1976) 177 eng DOE-ER-40048-24-P-4 Abbott, R B Washington U. Seattle Cosmological perturbations in Kaluza-Klein models Washington, DC US. Dept. Energy. Office Adm. Serv. 
Nov 1985 26 p SzGeCERN General Theoretical Physics Bednarz, B F Ellis, S D 1985 11 1990-01-29 50 2002-01-04 BATCH h 198608n PREPRINT eng CERN-PPE-92-085 HEPHY-PUB-568 Albajar, C CERN Multifractal analysis of minimum bias events in \Sqrt s = 630 GeV $\overline{p}$p collisions Geneva CERN 1 Jun 1992 27 p SzGeCERN Particle Physics - Experimental Results Allkofer, O C Apsimon, R J Bartha, S Bezaguet, A Bohrer, A Buschbeck, B Cennini, P Cittolin, S Clayton, E Coughlan, J A Dau, D Della Negra, M Demoulin, M Dibon, H Dowell, J D Eggert, K Eisenhandler, E F Ellis, N Faissner, H Fensome, I F Ferrando, A Garvey, J Geiser, A Givernaud, A Gonidec, A Jank, W Jorat, G Josa-Mutuberria, I Kalmus, P I P Karimaki, V Kenyon, I R Kinnunen, R Krammer, M Lammel, S Landon, M Levegrun, S Lipa, P Markou, C Markytan, M Maurin, G McMahon, S Meyer, T Moers, T Morsch, A Moulin, A Naumann, L Neumeister, N Norton, A Pancheri, G Pauss, F Pietarinen, E Pimia, M Placci, A Porte, J P Priem, R Prosi, R Radermacher, E Rauschkolb, M Reithler, H Revol, J P Robinson, D Rubbia, C Salicio, J M Samyn, D Schinzel, D Schleichert, R Seez, C Shah, T P Sphicas, P Sumorok, K Szoncso, F Tan, C H Taurok, A Taylor, L Tether, S Teykal, H F Thompson, G Terrente-Lujan, E Tuchscherer, H Tuominiemi, J Virdee, T S von Schlippe, W Vuillemin, V Wacker, K Wagner, H Walzel, G Weselka, D Wulz, C E AACHEN - BIRMINGHAM - CERN - HELSINKI - KIEL - IMP. COLL. LONDON - QUEEN MARY COLL. LONDON - MADRID CIEMAT - MIT - RUTHERFORD APPLETON LAB. - VIENNA Collaboration 1992 13 UA1 PPE P00003707 CERN SPS CERN 1992-06-16 50 2001-04-12 BATCH 37-46 Z. Phys., C 56 1992 SLAC 2576562 oai:cds.cern.ch:CERN-PPE-92-085 cern:experiment n 199226 a1992 ARTICLE eng CERN-TH-4036 Ellis, J CERN Non-compact supergravity solves problems Geneva CERN Oct 1984 15 p SzGeCERN General Theoretical Physics Kahler manifolds gravitinos axions constraints noscale Enqvist, K Nanopoulos, D V 1985 13 TH CERN 1990-01-29 50 2001-09-15 BATCH 357-362 Phys. Lett., B 151 1985 oai:cds.cern.ch:CERN-TH-4036 cern:theory h 198451 a1985 ARTICLE eng STAN-CS-81-898-MF Whang, K Stanford University Separability as a physical database design methodology Stanford, CA Stanford Univ. Comput. Sci. Dept. Oct 1981 60 p Ordered for J Blake/DD SzGeCERN Computing and Computers Wiederhold, G Sagalowicz, D 1981 19 Stanford Univ. 1990-01-28 50 2002-01-04 BATCH n 198238n REPORT eng JYFL-RR-82-7 Arje, J University of Jyvaskyla Charge creation and reset mechanisms in an ion guide isotope separator (IGIS) Jyvaskyla Finland Univ. Dept. Phys. Jul 1982 18 p SzGeCERN Detectors and Experimental Techniques 1982 19 Jyväsklä Univ. 1990-01-28 50 2002-01-04 BATCH n 198238n REPORT 0898710022 eng 519.2 Lindley, Dennis Victor University College London Bayesian statistics a review Philadelphia, PA SIAM 1972 88 p CBMS-NSF Reg. Conf. Ser. Appl. Math. 2 Society for Industrial and Applied Mathematics. Philadelphia 1972 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK 0844621951 eng 621.396.615 621.385.3 Hamilton, Donald R MIT Klystrons and microwave triodes New York, NY McGraw-Hill 1948 547 p M.I.T. Radiat. Lab. 7 Knipp, Julian K Kuper, J B Horner 1948 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK eng 621.313 621.382.333.33 Draper, Alec Electrical machines 2nd ed London Longmans 1967 404 p Electrical engineering series 1967 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK 1563964554 eng 539.1.078 539.143.44 621.384.8 Quadrupole mass spectrometry and its applications Amsterdam North-Holland 1976 ed. 
Dawson, Peter H 368 p 1976 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK 2225350574 fre 518.5:62.01 Dasse, Michel Analyse informatique t.1 Les preliminaires Paris Masson 1972 Informatique 1972 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK 2225350574 fre 518.5:62.01 Dasse, Michel Analyse informatique t.2 L'accomplissement Paris Masson 1972 Informatique 1972 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK 0023506709 eng 519.2 Harshbarger, Thad R Introductory statistics a decision map 2nd ed New York, NY Macmillan 1977 597 p 1977 21 1990-01-27 00 2002-04-12 BATCH m 198604 BOOK eng 519.2 Fry, Thornton C Bell Teleph Labs Probability and its engineering uses Princeton, NJ Van Nostrand 1928 490 p Bell Teleph Lab. Ser. 1928 21 1990-01-27 00 2002-04-12 BATCH m 198606 BOOK 0720421039 eng 517.11 Kleene, Stephen Cole University of Wisconsin Introduction to metamathematics Amsterdam North-Holland 1952 (repr.1964.) 560 p Bibl. Matematica 1 1952 21 1990-01-27 00 2002-04-12 BATCH m 198606 BOOK eng 621.38 Hughes, Robert James Introduction to electronics London English Univ. Press 1962 432 p 65/0938, Blair, W/PE, pp Pipe, Peter 1962 21 1990-01-27 50 2002-04-12 BATCH m 198606 BOOK eng 519.2 518.5:519.2 Burford, Roger L Indiana University Statistics a computer approach Columbus, OH Merrill 1968 814 p 1968 21 1990-01-27 00 2002-04-12 BATCH m 198606 BOOK 0471155039 eng 539.1.075 Chiang, Hai Hung Basic nuclear electronics New York, NY Wiley 1969 354 p 1969 21 1990-01-27 00 2002-04-12 BATCH m 198606 BOOK eng 621-5 Dransfield, Peter Engineering systems and automatic control Englewood Cliffs, N.J. Prentice-Hall 1968 432 p 1968 21 1990-01-27 00 2002-04-12 BATCH m 198606 BOOK 0387940758 eng 537.52 Electrical breakdown in gases London Macmillan 1973 ed. Rees, J A 303 p 1973 21 1990-01-27 00 2002-04-12 BATCH m 198606 BOOK eng Tavanapong, W University of Central Florida A High-performance Video Browsing System Orlando, FL Central Florida Univ. 1999 dir. Hua, K A 172 p No fulltext Not held by the library Ph.D. : Univ. Central Florida : 1999 Recent advances in multimedia processing technologies, internetworking technologies, and the World Wide Web phenomenon have resulted in a vast creation and use of digital videos in all kinds of applications ranging fromentertainment, business solutions, to education. Designing efficient techniques for searching and retrieving videos over the networks becomes increasingly more important as future applications will include a huge volume of multimediacontent. One practical approach to search for a video segment is as follows. Step 1: Apply an initial search to determine the set of candidate videos. Step 2: Browse the candidates to identify the relevant videos. Step 3: Searchwithin the relevant videos for interesting video segments. In practice, a user might have to iterate through these steps multiple times in order to locate the desired video segments. Independently, database researchers have beeninvestigating techniques for the initial search in Step 1. Multimedia researchers have proposed several techniques for video browsing in Step 2. Computer communications researchers have been investigating video delivery techniques. Iidentify that searching for video data is an interactive process which involves the transmission of video data. Developing techniques for each step independently could result in a system with less performance. In this dissertation, Ipresent a unified approach taking into accounts all fundamental characteristics of multimedia data. 
I evaluated the proposed techniques through both simulation and system implementation. The resulting system is less expensive and offers better performance. The simulation results demonstrate that the proposed technique can offer video browsing and search operations with little delay and with minimum storage overhead at the server. Client machines can handle their search operations without involving the server, making the design more scalable, which is vital for large systems deployed over the Internet. The implemented system shows that the visual quality of the browsing and search operations is excellent. PROQUEST200009 SzGeCERN Computing and Computers THESIS notheld 1999 14 2000-09-22 00 2002-02-22 BATCH PROQUEST 9923724 PROQUEST DAI-B60/03p1177Sep1999 n 200034 THESIS eng Teshome, D California State Univ Neural Networks For Speech Recognition Of A Phonetic Language Long Beach, CA Calif. State Univ. 1999 55 p No fulltext Not held by the library Ms : California State Univ. : 1999 The goal of this thesis is to explore the possibility of a viable alternative/replacement to the Amharic typewriter. Amharic is the national language of Ethiopia. It is one of the oldest languages in the world. Actually, the root-language of Amharic, called Geez, is a descendant of Sabean, which is the direct ancestor of all Semitic languages including English. A phonetic language with 276 phonemes/characters, Amharic has posed quite a challenge to those who, like the author of this thesis, have attempted to design an easy-to-use word processor that interfaces with the conventional keyboard. With current Amharic word processing software, each character requires an average of three keystrokes, thus making typing Amharic literature quite a task. This thesis researches the feasibility of developing a PC-based speech recognition system to recognize the spoken phonemes of the Amharic language. Artificial Neural Networks are used for the recognition of spoken alphabets that form Amharic words. A neural network with feed-forward architecture is trained with a series of alphabets and is evaluated on its ability to recognize subsequent test data. The neural network used in this project is a static classification network; that is, it focuses on the frequency domain of speech while making no attempt to process temporal information. The network training procedure uses the generalized Delta Rule. The recognition system developed in this project is an Isolated Speech Recognition System. The approach taken is to recognize the spoken word character by character. This approach is expected to work well due to the phonetic nature of Amharic. PROQUEST200009 SzGeCERN Computing and Computers THESIS notheld 1999 14 2000-09-22 00 2002-02-22 BATCH PROQUEST 1397120 PROQUEST MAI38/02p448Apr2000 n 200034 THESIS eng Topcuoglu, H R Syracuse Univ. Scheduling Task Graphs In Heterogeneous Computing Environments Syracuse, NY Syracuse Univ. 1999 dir. Hariri, S 126 p No fulltext Not held by the library Ph.D. : Syracuse Univ. : 1999 Efficient application scheduling is critical for achieving high performance in heterogeneous computing environments. An application is represented by a directed acyclic graph (DAG) whose nodes represent tasks and whose edges represent communication messages and precedence constraints among the tasks. The general task-scheduling problem maps the tasks of an application on processors and orders their execution so that task precedence requirements are satisfied and a minimum schedule length is obtained.
The task-scheduling problem has been shown to be NP-complete in general cases as well as in several restricted cases. Although a large number of scheduling heuristics are presented in the literature, most of them target homogeneous processors. Existing algorithms for heterogeneous processors are not generally efficient because of their high complexity and the quality of their results. This thesis studies the scheduling of DAG-structured application tasks on heterogeneous domains. We develop two novel low-complexity and efficient scheduling algorithms for a bounded number of heterogeneous processors, the Heterogeneous Earliest-Finish-Time (HEFT) algorithm and the Critical-Path-on-a-Processor (CPOP) algorithm. The experimental work presented in this thesis shows that these algorithms significantly surpass previous approaches in terms of performance (schedule length ratio, speed-up, and frequency of best results) and cost (running time and time complexity). Our experimental work includes randomly generated graphs and graphs deduced from real applications. As part of the comparison study, a parametric graph generator is introduced to generate graphs with various characteristics. We also present a further optimization of the HEFT algorithm by introducing alternative methods for the task prioritizing and processor selection phases. A novel processor selection policy based on the earliest finish time of the critical child task improves the performance of the HEFT algorithm. Several strategies for selecting the critical child task of a given task are presented. This thesis addresses embedding the task scheduling algorithms into an application-development environment for distributed resources. An analytical model is introduced for setting the computation costs of tasks and communication costs of edges of a graph. As part of the design framework of our application development environment, a novel, two-phase, distributed scheduling algorithm is presented for scheduling an application over wide-area distributed resources. PROQUEST200009 SzGeCERN Computing and Computers THESIS notheld 1999 14 2000-09-22 00 2002-02-08 BATCH PROQUEST 9946509 PROQUEST DAI-B60/09p4718Mar2000 n 200034 THESIS spa Trespalacios-Mantilla, J H Puerto Rico Univ. Software De Apoyo Educativo Al Concepto De Funcion En Precalculo I (spanish Text) Rio Piedras Puerto Rico Univ. 1999 dir. Monroy, H 64 p No fulltext Not held by the library Ms : Univ. Puerto Rico : 1999 This thesis reports on the evaluation of the use of an educational software package, designed to improve students' learning of the concept of mathematical function. The students in the study were registered in Precalculus I at the University of Puerto Rico, Mayaguez Campus. The educational software allows the practice of changing the representation of a function among tabular, analytic, and graphical representations. To carry out the evaluation, 59 students were selected and divided into two groups: control and experimental. Both groups received the 'traditional' classroom lectures on the topic. The experimental group, in addition, was allowed to practice with the educational software. To measure their performance and the effect of the educational software, two tests were given: a pre-test and a post-test. The results of this study show that the experimental group improved significantly more than the control group, thus demonstrating the validity of the educational software in the learning of the concept of mathematical function.
PROQUEST200009 SzGeCERN Computing and Computers THESIS notheld 1999 14 2000-09-22 00 2002-02-08 BATCH PROQUEST 1395476 PROQUEST MAI37/06p1890Dec1999 n 200034 THESIS 0612382052 fre Troudi, N Laval Univ. Systeme Multiagent Pour Les Environnements Riches En Informations (french Text) Laval Laval Univ. 1999 dir. Chaib-Draa, B 101 p No fulltext Not held by the library Msc : Universite Laval : 1999 The growth of the Web is spectacular: the number of Web pages waiting to be consulted is now estimated at more than 50 million. A simple calculation shows that, even spending only one minute per page, it would take about 95 years to consult all of these pages. Using a search strategy is therefore vital. In this context, numerous search tools have been proposed. These tools, often called search engines, have today proven incapable of providing help to users. The main reasons for this are the following: (1) the open nature of the Internet: no central supervision governs the development of the Internet, since anyone who wishes to use it and/or to offer information is free to do so; (2) the dynamic nature of information: information that is not available today may be available tomorrow, and vice versa; (3) the heterogeneous nature of information: information is offered in many formats and in many ways, which complicates automatic information retrieval. Given this situation, it seems important to look for new solutions to help the user in his search for information. (Abstract shortened by UMI.) PROQUEST200009 SzGeCERN Computing and Computers THESIS notheld 1999 14 2000-09-22 00 2002-02-08 BATCH PROQUEST MQ38205 PROQUEST MAI37/06p1890Dec1999 n 200034 THESIS eng LBL-22304 Manes, J L Calif. Univ. Berkeley Anomalies in quantum field theory and differential geometry Berkeley, CA Lawrence Berkeley Nat. Lab. Apr 1986 76 p Thesis : Calif. Univ. Berkeley SzGeCERN General Theoretical Physics bibliography REPORT THESIS 1986 14 1990-01-29 50 2002-03-22 BATCH SLAC 1594192 h 198650n THESIS eng LBL-21916 Ingermanson, R Calif. Univ. Berkeley Accelerating the loop expansion Berkeley, CA Lawrence Berkeley Nat. Lab. Jul 1986 96 p Thesis : Calif. Univ. Berkeley SzGeCERN General Theoretical Physics bibliography REPORT THESIS 1986 14 1990-01-29 50 2002-03-22 BATCH SLAC 1594184 h 198650n THESIS eng LBL-28106 Bertsche, K J Calif. Univ. Berkeley A small low energy cyclotron for radioisotope measurements Berkeley, CA Lawrence Berkeley Nat. Lab. Nov 1989 155 p Thesis : Calif. Univ. Berkeley SzGeCERN Accelerators and Storage Rings bibliography REPORT THESIS 14 1989 1990-02-28 50 2002-03-22 BATCH h 199010n THESIS gr-qc/0204045 eng Khalatnikov, I M L D Landau Institute for Theoretical Physics of Russian Academy of Sciences Comment about quasiisotropic solution of Einstein equations near cosmological singularity 12 Apr 2002 7 p We generalize, for the case of arbitrary hydrodynamical matter, the quasiisotropic solution of the Einstein equations near a cosmological singularity, found by Lifshitz and Khalatnikov in 1960 for the case of a radiation-dominated universe. It is shown that this solution always exists, but the dependence of the terms in the quasiisotropic expansion acquires a more complicated form.
LANL EDS SzGeCERN General Relativity and Cosmology Kamenshchik, A Y Alexander Kamenshchik <sasha.kamenshchik@centrovolta.it> http://cdsware.cern.ch/download/invenio-demo-site-files/0204045.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204045.ps.gz CER n 200231 2002 11 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] Lifshitz E M and Khalatnikov I M 1960 149 Zh. Eksp. Teor. Fiz. 39 1960 Zh. Eksp. Teor. Fiz. 39 (1960) 149 [2] Lifshitz E M and Khalatnikov I M 1964 Sov. Phys. Uspekhi 6 495 [3] Landau L D and Lifshitz E M 1979 The Classical Theory of Fields (Pergamon Press) [4] Starobinsky A A 1986 Stochastic De Sitter (inflationary stage) in the early universe in Field Theory, Quantum Gravity and Strings, (Eds. H.J. De Vega and N. Sanchez, Springer-Verlag, Berlin) 107; Linde A D 1990 Particle Physics and Inflationary Cosmology (Harwood Academic Publishers, New York) [5] Banks T and Fischler W 2001 M theory observables for cosmological space-times hep-th/0102077 An Holographic Cosmology hep-th/0111142 [6] Perlmutter S J et al 1999 565 Astrophys. J. 517 1999 Astrophys. J. 517 (1999) 565 Riess A et al 1998 1009 Astron. J. 116 1998 Astron. J. 116 (1998) 1009 [7] Sahni V and Starobinsky A A 2000 373 Int. J. Mod. Phys., D 9 2000 Int. J. Mod. Phys. D 9 (2000) 373 gr-qc/0204046 eng Bento, M C CERN Supergravity Inflation on the Brane 12 Apr 2002 5 p We study N=1 Supergravity inflation in the context of the braneworld scenario. Particular attention is paid to the problem of the onset of inflation at sub-Planckian field values and the ensued inflationary observables. We find that the so-called $\eta$-problem encountered in supergravity inspired inflationary models can be solved in the context of the braneworld scenario, for some range of the parameters involved. Furthermore, we obtain an upper bound on the scale of the fifth dimension, $M_5 \lesssim 10^{-3} M_P$, in case the inflationary potential is quadratic in the inflaton field, $\phi$. If the inflationary potential is cubic in $\phi$, consistency with observational data requires that $M_5 \simeq 9.2 \times 10^{-4} M_P$. LANL EDS SzGeCERN General Relativity and Cosmology Bertolami, O Sen, A A Maria da Conceicao Bento <bento@sirius.ist.utl.pt> http://cdsware.cern.ch/download/invenio-demo-site-files/0204046.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204046.ps.gz 2002 11 2002-04-15 00 2002-04-15 BATCH n 200216 PREPRINT hep-th/0204098 eng Alhaidari, A D King Fahd University Reply to 'Comment on "Solution of the Relativistic Dirac-Morse Problem"' 11 Apr 2002 This combines a reply to the Comment [hep-th/0203067 v1] by A. N. Vaidya and R. de L. Rodrigues with an erratum to our Letter [Phys. Rev. Lett. 87, 210405 (2001)] LANL EDS SzGeCERN Particle Physics - Theory A. D. Alhaidari <haidari@kfupm.edu.sa> http://cdsware.cern.ch/download/invenio-demo-site-files/0204098.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204098.ps.gz 2002 11 2002-04-15 00 2002-04-15 BATCH n 200216 PREPRINT hep-th/0204099 eng CU-TP-1043 Easther, R Columbia Univ. Cosmological String Gas on Orbifolds Irvington-on-Hudson, NY Columbia Univ. Dept. Phys. 12 Apr 2002 14 p It has long been known that strings wound around incontractible cycles can play a vital role in cosmology. In particular, in a spacetime with toroidal spatial hypersurfaces, the dynamics of the winding modes may help yield three large spatial dimensions. However, toroidal compactifications are phenomenologically unrealistic.
In this paper we therefore take a first step toward extending these cosmological considerations to $D$-dimensional toroidal orbifolds. We use numerical simulation to study the timescales over which "pseudo-wound" strings unwind on these orbifolds with trivial fundamental group. We show that pseudo-wound strings can persist for many ``Hubble times'' in some of these spaces, suggesting that they may affect the dynamics in the same way as genuinely wound strings. We also outline some possible extensions that include higher-dimensional wrapped branes. LANL EDS SzGeCERN Particle Physics - Theory Greene, B R Jackson, M G M. G. Jackson <markj@phys.columbia.edu> http://cdsware.cern.ch/download/invenio-demo-site-files/0204099.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204099.ps.gz CER n 200231 2002 11 Easther, Richard Greene, Brian R. Jackson, Mark G. 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] R. Brandenberger and C. Vafa 391 Nucl. Phys., B 316 1989 Nucl. Phys. B 316 (1989) 391 [2] A. A. Tseytlin and C. Vafa 443 Nucl. Phys., B 372 1992 Nucl. Phys. B 372 (1992) 443 hep-th/9109048 [3] M. Sakellariadou 319 Nucl. Phys., B 468 1996 Nucl. Phys. B 468 (1996) 319 hep-th/9511075 [4] A. G. Smith and A. Vilenkin 990 Phys. Rev., D 36 1987 Phys. Rev. D 36 (1987) 990 [5] R. Brandenberger, D. A. Easson and D. Kimberly 421 Nucl. Phys., B 623 2002 Nucl. Phys. B 623 (2002) 421 hep-th/0109165 [6] B. R. Greene, A. D. Shapere, C. Vafa, and S. T. Yau 1 Nucl. Phys., B 337 1990 Nucl. Phys. B 337 (1990) 1 [7] L. Dixon, J. Harvey, C. Vafa and E. Witten 678 Nucl. Phys., B 261 1985 Nucl. Phys. B 261 (1985) 678 L. Dixon, J. Harvey, C. Vafa and E. Witten 285 Nucl. Phys., B 274 1986 Nucl. Phys. B 274 (1986) 285 [8] M. Sakellariadou and A. Vilenkin 885 Phys. Rev., D 37 1988 Phys. Rev. D 37 (1988) 885 [9] J. J. Atick and E. Witten 291 Nucl. Phys., B 310 1988 Nucl. Phys. B 310 (1988) 291 [10] D. Mitchell and N. Turok 1577 Phys. Rev. Lett. 58 1987 Phys. Rev. Lett. 58 (1987) 1577 Imperial College report, 1987 (unpublished) [11] S. Alexander, R. Brandenberger, and D. Easson 103509 Phys. Rev., D 62 2000 Phys. Rev. D 62 (2000) 103509 hep-th/0005212 [12] D. Easson hep-th/0110225 hep-ph/0204132 eng NUC-MINN-02-3-T Shovkovy, I A Minnesota Univ. Thermal conductivity of dense quark matter and cooling of stars Minneapolis, MN Minnesota Univ. 11 Apr 2002 9 p The thermal conductivity of the color-flavor locked phase of dense quark matter is calculated. The dominant contribution to the conductivity comes from photons and Nambu-Goldstone bosons associated with breaking of baryon number which are trapped in the quark core. Because of their very large mean free path the conductivity is also very large. The cooling of the quark core arises mostly from the heat flux across the surface of direct contact with the nuclear matter. As the thermal conductivity of the neighboring layer is also high, the whole interior of the star should be nearly isothermal. Our results imply that the cooling time of compact stars with color-flavor locked quark cores is similar to that of ordinary neutron stars. LANL EDS SzGeCERN Particle Physics - Phenomenology Ellis, P J Igor Shovkovy <shovkovy@physics.umn.edu> http://cdsware.cern.ch/download/invenio-demo-site-files/0204132.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204132.ps.gz CER n 200231 2002 11 Shovkovy, Igor A. Ellis, Paul J. 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] J.C. Collins and M.J. Perry 1353 Phys. Rev. Lett. 34 1975 Phys. Rev. Lett. 34 (1975) 1353 [2] B. C. Barrois 390 Nucl.
Phys., B 129 1977 Nucl. Phys. B 129 (1977) 390 S. C. Frautschi, in "Hadronic matter at extreme energy density", edited by N. Cabibbo and L. Sertorio (Plenum Press, 1980); D. Bailin and A. Love 325 Phys. Rep. 107 1984 Phys. Rep. 107 (1984) 325 [3] M. G. Alford, K. Rajagopal and F. Wilczek 247 Phys. Lett., B 422 1998 Phys. Lett. B 422 (1998) 247 R. Rapp, T. Schäfer, E. V. Shuryak and M. Velkovsky 53 Phys. Rev. Lett. 81 1998 Phys. Rev. Lett. 81 (1998) 53 [4] D. T. Son 094019 Phys. Rev., D 59 1999 Phys. Rev. D 59 (1999) 094019 R. D. Pisarski and D. H. Rischke 37 Phys. Rev. Lett. 83 1999 Phys. Rev. Lett. 83 (1999) 37 [5] T. Schafer and F. Wilczek 114033 Phys. Rev., D 60 1999 Phys. Rev. D 60 (1999) 114033 D. K. Hong, V. A. Miransky, I. A. Shovkovy and L. C. R. Wijewardhana 056001 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 056001 erratum 059903 Phys. Rev., D 62 2000 Phys. Rev. D 62 (2000) 059903 R. D. Pisarski and D. H. Rischke 051501 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 051501 [6] S. D. Hsu and M. Schwetz 211 Nucl. Phys., B 572 2000 Nucl. Phys. B 572 (2000) 211 W. E. Brown, J. T. Liu and H. C. Ren 114012 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 114012 [7] I. A. Shovkovy and L. C. R. Wijewardhana 189 Phys. Lett., B 470 1999 Phys. Lett. B 470 (1999) 189 T. Schäfer 269 Nucl. Phys., B 575 2000 Nucl. Phys. B 575 (2000) 269 [8] K. Rajagopal and F. Wilczek, arXiv hep-ph/0011333 M. G. Alford 131 Annu. Rev. Nucl. Part. Sci. 51 2001 Annu. Rev. Nucl. Part. Sci. 51 (2001) 131 [9] M. Alford and K. Rajagopal, arXiv hep-ph/0204001 [10] M. Alford, K. Rajagopal and F. Wilczek 443 Nucl. Phys., B 537 1999 Nucl. Phys. B 537 (1999) 443 [11] R. Casalbuoni and R. Gatto 111 Phys. Lett., B 464 1999 Phys. Lett. B 464 (1999) 111 [12] D. T. Son and M. A. Stephanov 074012 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 074012 erratum 059902 Phys. Rev., D 62 2000 Phys. Rev. D 62 (2000) 059902 [13] P. F. Bedaque and T. Schäfer 802 Nucl. Phys., A 697 2002 Nucl. Phys. A 697 (2002) 802 [14] V. A. Miransky and I. A. Shovkovy 111601 Phys. Rev. Lett. 88 2002 Phys. Rev. Lett. 88 (2002) 111601 [arXiv hep-ph/0108178 T. Schafer, D. T. Son, M. A. Stephanov, D. Toublan and J. J. Verbaarschot 67 Phys. Lett., B 522 2001 Phys. Lett. B 522 (2001) 67 [arXiv hep-ph/0108210 [15] D. T. Son, arXiv hep-ph/0108260 [16] P. Jaikumar, M. Prakash and T. Schäfer, arXiv astro-ph/0203088 [17] E. J. Ferrer, V. P. Gusynin and V. de la Incera, arXiv: cond-mat/0203217 [18] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series and Products (Academic, New York, 1965) 3.252.9 [19] J. J. Freeman and A. C. Anderson 5684 Phys. Rev., B 34 1986 Phys. Rev. B 34 (1986) 5684 [20] C. Kittel, Introduction to Solid State Phys. (John Wiley & Sons, Inc., 1960) p. 139 [21] V. P. Gusynin and I. A. Shovkovy 577 Nucl. Phys., A 700 2002 Nucl. Phys. A 700 (2002) 577 [22] I. M. Khalatnikov, An introduction to the theory of superfluidity, (Addison-Wesley Pub. Co., 1989) [23] P. A. Sturrock, Plasma Physics, (Cambridge University Press, 1994) [24] J. M. Lattimer, K. A. Van Riper, M. Prakash and M. Prakash 802 Astrophys. J. 425 1994 Astrophys. J. 425 (1994) 802 [25] M. G. Alford, K. Rajagopal, S. Reddy and F. Wilczek 074017 Phys. Rev., D 64 2001 Phys. Rev. D 64 (2001) 074017 [26] G. W. Carter and S. Reddy 103002 Phys. Rev., D 62 2000 Phys. Rev. D 62 (2000) 103002 [27] A. W. Steiner, M. Prakash and J. M. Lattimer 10 Phys. Lett., B 509 2001 Phys. Lett. B 509 (2001) 10 [arXiv astro-ph/0101566 [28] S. Reddy, M. Sadzikowski and M.
Tachibana, arXiv nucl-th/0203011 [29] M. Prakash, J. M. Lattimer, J. A. Pons, A. W. Steiner and S. Reddy 364 Lect. Notes Phys. 578 2001 Lect. Notes Phys. 578 (2001) 364 [30] S. L. Shapiro and S. A. Teukolsky, Black holes, white dwarfs, and neutron stars: the physics of compact objects, (John Wiley & Sons, 1983) [31] D. Blaschke, H. Grigorian and D. N. Voskresensky, Astron. Astrophys. 368 (2001) 561 [32] D. Page, M. Prakash, J. M. Lattimer and A. W. Steiner 2048 Phys. Rev. Lett. 85 2000 Phys. Rev. Lett. 85 (2000) 2048 [33] K. Rajagopal and F. Wilczek 3492 Phys. Rev. Lett. 86 2001 Phys. Rev. Lett. 86 (2001) 3492 [34] J. I. Kapusta, Finite-temperature field theory, (Cambridge University Press, 1989) [35] S. M. Johns, P. J. Ellis and J. M. Lattimer 1020 Astrophys. J. 473 1996 Astrophys. J. 473 (1996) 1020 hep-ph/0204133 eng Gomez, M E CFIF Lepton-Flavour Violation in SUSY with and without R-parity 12 Apr 2002 11 p We study whether the individual violation of the lepton numbers L_{e,mu,tau} in the charged sector can lead to measurable rates for BR(mu->e gamma) and BR(tau->mu gamma). We consider three different scenarios; the first one corresponds to the Minimal Supersymmetric Standard Model with non-universal soft terms. In the other two cases the violation of flavor in the leptonic charged sector is associated with the neutrino problem in models with a see-saw mechanism and with R-parity violation, respectively. LANL EDS SzGeCERN Particle Physics - Phenomenology Carvalho, D F Mario E. Gomez <mgomez@gtae3.ist.utl.pt> http://cdsware.cern.ch/download/invenio-demo-site-files/0204133.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204133.ps.gz CER n 200231 2002 11 2002-04-15 00 2002-04-15 BATCH TALK GIVEN BY M E G AT THE CORFU SUMMER INSTITUTE ON ELEMENTARY PARTICLE PHYSICS CORFU 2001 11 PAGES 5 FIGURES PREPRINT [1] Y. Fukuda et al., Super-Kamiokande collaboration 9 Phys. Lett., B 433 1998 Phys. Lett. B 433 (1998) 9 33 Phys. Lett., B 436 1998 Phys. Lett. B 436 (1998) 33 1562 Phys. Rev. Lett. 81 1998 Phys. Rev. Lett. 81 (1998) 1562 [2] M. Apollonio et al., Chooz collaboration 397 Phys. Lett. B 420 1998 Phys. Lett. B 420 (1998) 397 [3] H. N. Brown et al. [Muon g-2 Collaboration] 2227 Phys. Rev. Lett. 86 2001 Phys. Rev. Lett. 86 (2001) 2227 hep-ex/0102017 [4] Review of Particle Physics, D. E. Groom et al 1 Eur. Phys. J., C 15 2000 Eur. Phys. J. C 15 (2000) 1 [5] D. F. Carvalho, M. E. Gomez and S. Khalil 001 J. High Energy Phys. 0107 2001 J. High Energy Phys. 0107 (2001) 001 hep-ph/0104292 [6] D. F. Carvalho, J. R. Ellis, M. Gomez and S. Lola 323 Phys. Lett., B 515 2001 Phys. Lett. B 515 (2001) 323 [7] D. F. Carvalho, M. E. Gomez and J. C. Romao hep-ph/0202054 (to appear in Phys. Rev., D [8] J. R. Ellis, M. E. Gomez, G. K. Leontaris, S. Lola and D. V. Nanopoulos 319 Eur. Phys. J., C 14 2000 Eur. Phys. J. C 14 (2000) 319 [9] A. Belyaev et al 715 Eur. Phys. J., C 22 2002 Eur. Phys. J. C 22 (2002) 715 A. Belyaev et al. [Kaon Physics Working Group Collaboration] hep-ph/0107046 [10] J. Hisano, T. Moroi, K. Tobe and M. Yamaguchi 2442 Phys. Rev., D 53 1996 Phys. Rev. D 53 (1996) 2442 J. Hisano and D. Nomura 116005 Phys. Rev., D 59 1999 Phys. Rev. D 59 (1999) 116005 [11] R. Barbieri and L.J. Hall 212 Phys. Lett., B 338 1994 Phys. Lett. B 338 (1994) 212 R. Barbieri et al 219 Nucl. Phys., B 445 1995 Nucl. Phys. B 445 (1995) 219 Nima Arkani-Hamed, Hsin-Chia Cheng and L.J. Hall 413 Phys. Rev., D 53 1996 Phys. Rev. D 53 (1996) 413 P. Ciafaloni, A. Romanino and A. Strumia 3 Nucl. Phys., B 458 1996 Nucl.
Phys. B 458 (1996) 3 M. E. Gomez and H. Goldberg 5244 Phys. Rev., D 53 1996 Phys. Rev. D 53 (1996) 5244 J. Hisano, D. Nomura, Y. Okada, Y. Shimizu and M. Tanaka 116010 Phys. Rev., D 58 1998 Phys. Rev. D 58 (1998) 116010 [12] M. E. Gomez, G. K. Leontaris, S. Lola and J. D. Vergados 116009 Phys. Rev., D 59 1999 Phys. Rev. D 59 (1999) 116009 G.K. Leontaris and N.D. Tracas 90 Phys. Lett., B 431 1998 Phys. Lett. B 431 (1998) 90 W. Buchmuller, D. Delepine and F. Vissani 171 Phys. Lett., B 459 1999 Phys. Lett. B 459 (1999) 171 W. Buchmuller, D. Delepine and L. T. Handoko 445 Nucl. Phys., B 576 2000 Nucl. Phys. B 576 (2000) 445 Q. Shafi and Z. Tavartkiladze 145 Phys. Lett., B 473 2000 Phys. Lett. B 473 (2000) 145 J. L. Feng, Y. Nir and Y. Shadmi 113005 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 113005 [13] A. Brignole, L. E. Ibáñez and C. Muñoz 125 Nucl. Phys., B 422 1994 Nucl. Phys. B 422 (1994) 125 [Erratum 747 Nucl. Phys., B 436 1994 Nucl. Phys. B 436 (1994) 747 ] [14] L. Ibáñez and G.G. Ross 100 Phys. Lett., B 332 1994 Phys. Lett. B 332 (1994) 100 G.K. Leontaris, S. Lola and G.G. Ross 25 Nucl. Phys., B 454 1995 Nucl. Phys. B 454 (1995) 25 S. Lola and G.G. Ross 81 Nucl. Phys., B 553 1999 Nucl. Phys. B 553 (1999) 81 [15] M. Gell-Mann, P. Ramond and R. Slansky, Proceedings of the Stony Brook Supergravity Workshop, New York, 1979, eds. P. Van Nieuwenhuizen and D. Freedman (North-Holland, Amsterdam) [16] J. A. Casas and A. Ibarra 171 Nucl. Phys., B 618 2001 Nucl. Phys. B 618 (2001) 171 S. Lavignac, I. Masina and C. A. Savoy 269 Phys. Lett., B 520 2001 Phys. Lett. B 520 (2001) 269 and hep-ph/0202086 [17] J. R. Ellis, D. V. Nanopoulos and K. A. Olive 65 Phys. Lett., B 508 2001 Phys. Lett. B 508 (2001) 65 [18] J. C. Romao, M. A. Diaz, M. Hirsch, W. Porod and J. W. Valle 071703 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 071703 113008 Phys. Rev., D 62 2000 Phys. Rev. D 62 (2000) 113008 [19] M. E. Gomez and K. Tamvakis 057701 Phys. Rev., D 58 1998 Phys. Rev. D 58 (1998) 057701 [20] M. Hirsch, W. Porod, J. W. F. Valle and J. C. Romão hep-ph/0202149 [21] L. M. Barkov et al., Research Proposal to PSI, 1999 http://www.icepp.s.u-tokyo.ac.jp/meg http://www.icepp.s.u-tokyo.ac.jp/meg [22] The homepage of the PRISM project http://www-prism.kek.jp/ http://www-prism.kek.jp/ Y. Kuno, Lepton Flavor Violation Experiments at KEK/JAERI Joint Project of High Intensity Proton Machine, in Proceedings of Workshop of "LOWNU/NOON 2000", Tokyo, December 4-8, 2000 [23] W. Porod, M. Hirsch, J. Romão and J. W. Valle 115004 Phys. Rev., D 63 2001 Phys. Rev. D 63 (2001) 115004 hep-ph/0204134 eng Dzuba, V A University of New South Wales Precise calculation of parity nonconservation in cesium and test of the standard model 12 Apr 2002 24 p We have calculated the 6s-7s parity nonconserving (PNC) E1 transition amplitude, E_{PNC}, in cesium. We have used an improved all-order technique in the calculation of the correlations and have included all significant contributions to E_{PNC}. Our final value $E_{PNC} = 0.904\,(1 \pm 0.5\%) \times 10^{-11}\,iea_{B}(-Q_{W}/N)$ has half the uncertainty claimed in old calculations used for the interpretation of Cs PNC experiments. The resulting nuclear weak charge $Q_{W}$ for Cs deviates by about 2 standard deviations from the value predicted by the standard model. LANL EDS SzGeCERN Particle Physics - Phenomenology Flambaum, V V Ginges, J S M "Jacinda S.M.
GINGES" <ginges@phys.unsw.edu.au> http://cdsware.cern.ch/download/invenio-demo-site-files/0204134.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204134.ps.gz CER n 200231 2002 11 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] I.B. Khriplovich, Parity Nonconservation in Atomic Phenomena (Gordon and Breach, Philadelphia, 1991) [2] M.-A. Bouchiat and C. Bouchiat 1351 Rep. Prog. Phys. 60 1997 Rep. Prog. Phys. 60 (1997) 1351 [3] C.S. Wood et al 1759 Science 275 1997 Science 275 (1997) 1759 [4] V.A. Dzuba, V.V. Flambaum, and O.P. Sushkov 147 Phys. Lett., A 141 1989 Phys. Lett. A 141 (1989) 147 [5] S.A. Blundell, W.R. Johnson, and J. Sapirstein 1411 Phys. Rev. Lett. 65 1990 Phys. Rev. Lett. 65 (1990) 1411 S.A. Blundell, J. Sapirstein, and W.R. Johnson 1602 Phys. Rev., D 45 1992 Phys. Rev. D 45 (1992) 1602 [6] R.J. Rafac, and C.E. Tanner, Phys. Rev., A58 1087 (1998); R.J. Rafac, C.E. Tanner, A.E. Livingston, and H.G. Berry, Phys. Rev., A60 3648 (1999) [7] S.C. Bennett, J.L. Roberts, and C.E. Wieman R16 Phys. Rev., A 59 1999 Phys. Rev. A 59 (1999) R16 [8] S.C. Bennett and C.E. Wieman 2484 Phys. Rev. Lett. 82 1999 Phys. Rev. Lett. 82 (1999) 2484 82, 4153(E) (1999); 83, 889(E) (1999) [9] R. Casalbuoni, S. De Curtis, D. Dominici, and R. Gatto 135 Phys. Lett., B 460 1999 Phys. Lett. B 460 (1999) 135 [10] J. L. Rosner 016006 Phys. Rev., D 61 1999 Phys. Rev. D 61 (1999) 016006 [11] J. Erler and P. Langacker 212 Phys. Rev. Lett. 84 2000 Phys. Rev. Lett. 84 (2000) 212 [12] A. Derevianko 1618 Phys. Rev. Lett. 85 2000 Phys. Rev. Lett. 85 (2000) 1618 [13] V.A. Dzuba, C. Harabati, W.R. Johnson, and M.S. Safronova 044103 Phys. Rev., A 63 2001 Phys. Rev. A 63 (2001) 044103 [14] M.G. Kozlov, S.G. Porsev, and I.I. Tupitsyn 3260 Phys. Rev. Lett. 86 2001 Phys. Rev. Lett. 86 (2001) 3260 [15] W.J. Marciano and A. Sirlin 552 Phys. Rev., D 27 1983 Phys. Rev. D 27 (1983) 552 W.J. Marciano and J.L. Rosner 2963 Phys. Rev. Lett. 65 1990 Phys. Rev. Lett. 65 (1990) 2963 [16] B.W. Lynn and P.G.H. Sandars 1469 J. Phys., B 27 1994 J. Phys. B 27 (1994) 1469 I. Bednyakov et al 012103 Phys. Rev., A 61 1999 Phys. Rev. A 61 (1999) 012103 [17] A.I. Milstein and O.P. Sushkov, e-print hep-ph/0109257 [18] W.R. Johnson, I. Bednyakov, and G. Soff 233001 Phys. Rev. Lett. 87 2001 Phys. Rev. Lett. 87 (2001) 233001 [19] A. Derevianko 012106 Phys. Rev., A 65 2002 Phys. Rev. A 65 (2002) 012106 [20] V.A. Dzuba and V.V. Flambaum 052101 Phys. Rev., A 62 2000 Phys. Rev. A 62 (2000) 052101 [21] V.A. Dzuba, V.V. Flambaum, and O.P. Sushkov R4357 Phys. Rev., A 56 1997 Phys. Rev. A 56 (1997) R4357 [22] D.E. Groom et al., Euro. Phys. J. C : 15 (2000) 1 [23] V.A. Dzuba, V.V. Flambaum, P.G. Silvestrov, and O.P. Sushkov 1399 J. Phys., B 20 1987 J. Phys. B 20 (1987) 1399 [24] V.A. Dzuba, V.V. Flambaum, and O.P. Sushkov 493 Phys. Lett., A 140 1989 Phys. Lett. A 140 (1989) 493 [25] V.A. Dzuba, V.V. Flambaum, A.Ya. Kraftmakher, and O.P. Sushkov 373 Phys. Lett., A 142 1989 Phys. Lett. A 142 (1989) 373 [26] G. Fricke et al 177 At. Data Nucl. Data Tables 60 1995 At. Data Nucl. Data Tables 60 (1995) 177 [27] A. Trzci´nska et al 082501 Phys. Rev. Lett. 87 2001 Phys. Rev. Lett. 87 (2001) 082501 [28] V.B. Berestetskii, E.M. Lifshitz, and L.P. Pitaevskii, Relativistic Quantum Theory (Pergamon Press, Oxford, 1982) [29] P.J. Mohr and Y.-K. Kim 2727 Phys. Rev., A 45 1992 Phys. Rev. A 45 (1992) 2727 P.J. Mohr 4421 Phys. Rev., A 46 1992 Phys. Rev. A 46 (1992) 4421 [30] L.W. Fullerton and G.A. Rinker, Jr 1283 Phys. Rev., A 13 1976 Phys. Rev. 
A 13 (1976) 1283 [31] E.H. Wichmann and N.M. Kroll 343 Phys. Rev. 101 1956 Phys. Rev. 101 (1956) 343 [32] A.I. Milstein and V.M. Strakhovenko 1247 Zh. Eksp. Teor. Fiz. 84 1983 Zh. Eksp. Teor. Fiz. 84 (1983) 1247 [33] V.V. Flambaum and V.G. Zelevinsky 3108 Phys. Rev. Lett. 83 1999 Phys. Rev. Lett. 83 (1999) 3108 [34] C.E. Moore, Natl. Stand. Ref. Data Ser. (U.S., Natl. Bur. Stand.), 3 (1971) [35] R.J. Rafac, C.E. Tanner, A.E. Livingston, and H.G. Berry 3648 Phys. Rev., A 60 1999 Phys. Rev. A 60 (1999) 3648 [36] M.-A. Bouchiat, J. Guéna, and L. Pottier, J. Phys. (France) Lett. 45 (1984) L523 [37] E. Arimondo, M. Inguscio, and P. Violino 31 Rev. Mod. Phys. 49 1977 Rev. Mod. Phys. 49 (1977) 31 [38] S.L. Gilbert, R.N. Watts, and C.E. Wieman 581 Phys. Rev., A 27 1983 Phys. Rev. A 27 (1983) 581 [39] R.J. Rafac and C.E. Tanner 1027 Phys. Rev., A 56 1997 Phys. Rev. A 56 (1997) 1027 [40] M.-A. Bouchiat and J. Guéna, J. Phys. (France) 49 (1988) 2037 [41] D. Cho et al 1007 Phys. Rev., A 55 1997 Phys. Rev. A 55 (1997) 1007 [42] A.A. Vasilyev, I.M. Savukov, M.S. Safronova, and H.G. Berry, e-print physics/0112071 hep-ph/0204135 eng Bertin, V Universite Blaise Pascal Neutrino Indirect Detection of Neutralino Dark Matter in the CMSSM 12 Apr 2002 16 p We study potential signals of neutralino dark matter indirect detection by neutrino telescopes in a wide range of CMSSM parameters. We also compare with direct detection potential signals, taking into account in both cases present and future experiment sensitivities. Only models with neutralino annihilation into gauge bosons can satisfy cosmological constraints and current neutrino indirect detection sensitivities. For both direct and indirect detection, only next-generation experiments will be able to really test this kind of model. LANL EDS SzGeCERN Particle Physics - Phenomenology Nezri, E Orloff, J Jean Orloff <orloff@in2p3.fr> http://cdsware.cern.ch/download/invenio-demo-site-files/0204135.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204135.ps.gz 2002 11 2002-04-15 00 2002-04-15 BATCH n 200216 PREPRINT hep-ph/0204136 eng FISIST-14-2001-CFIF IPPP-01-58 DCPT-01-114 Branco, G C FCIF Supersymmetry and a rationale for small CP violating phases 12 Apr 2002 28 p We analyse the CP problem in the context of a supersymmetric extension of the standard model with universal strength of Yukawa couplings. A salient feature of these models is that the CP phases are constrained to be very small by the hierarchy of the quark masses, and the pattern of CKM mixing angles. This leads to a small amount of CP violation from the usual KM mechanism and a significant contribution from supersymmetry is required. Due to the large generation mixing in some of the supersymmetric interactions, the electric dipole moments impose severe constraints on the parameter space, forcing the trilinear couplings to be factorizable in matrix form. We find that the LL mass insertions give the dominant gluino contribution to saturate epsilon_K. The chargino contributions to epsilon'/epsilon are significant and can accommodate the experimental results. In this framework, the standard model gives a negligible contribution to the CP asymmetry in B-meson decay, a_{J/\psi K_s}. However, due to supersymmetric contributions to B_d-\bar{B}_d mixing, the recent large value of a_{J/\psi K_s} can be accommodated.
LANL EDS SzGeCERN Particle Physics - Phenomenology Gomez, M E Khalil, S Teixeira, A M Shaaban Khalil <shaaban.khalil@durham.ac.uk> http://cdsware.cern.ch/download/invenio-demo-site-files/0204136.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204136.ps.gz CER n 200231 2002 11 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] A.G. Cohen, D.B. Kaplan and A.E. Nelson 27 Annu. Rev. Nucl. Part. Sci. 43 1993 Annu. Rev. Nucl. Part. Sci. 43 (1993) 27 M.B. Gavela, P. Hernandez, J. Orloff, O. Pène and C. Quimbay 345 Nucl. Phys., B 430 1994 Nucl. Phys. B 430 (1994) 345 382 Nucl. Phys., B 430 1994 Nucl. Phys. B 430 (1994) 382 A.D. Dolgov hep-ph/9707419 V.A. Rubakov and M.E. Shaposhnikov, Usp. Fiz. Nauk 166 (1996) 493 [ 461 Phys. Usp. 39 1996 Phys. Usp. 39 (1996) 461 ] [2] S. Abel, S. Khalil and O. Lebedev 151 Nucl. Phys., B 606 2001 Nucl. Phys. B 606 (2001) 151 [3] S. Pokorski, J. Rosiek and C. A. Savoy 81 Nucl. Phys., B 570 2000 Nucl. Phys. B 570 (2000) 81 [4] Recent Developments in Gauge Theories, Proceedings of Nato Advanced Study Institute (Cargèse, 1979), edited by G. 't Hooft et al., Plenum, New York (1980) [5] G. C. Branco, J. I. Silva-Marcos and M. N. Rebelo 446 Phys. Lett., B 237 1990 Phys. Lett. B 237 (1990) 446 G. C. Branco, D. Emmanuel-Costa and J. I. Silva-Marcos 107 Phys. Rev., D 56 1997 Phys. Rev. D 56 (1997) 107 [6] P. M. Fishbane and P. Q. Hung 2743 Phys. Rev., D 57 1998 Phys. Rev. D 57 (1998) 2743 [7] P. Q. Hung and M. Seco hep-ph/0111013 [8] G. C. Branco and J. I. Silva-Marcos 166 Phys. Lett., B 359 1995 Phys. Lett. B 359 (1995) 166 [9] M. V. Romalis, W. C. Griffith and E. N. Fortson 2505 Phys. Rev. Lett. 86 2001 Phys. Rev. Lett. 86 (2001) 2505 J. P. Jacobs et al 3782 Phys. Rev. Lett. 71 1993 Phys. Rev. Lett. 71 (1993) 3782 [10] BABAR Collaboration, B. Aubert et al 091801 Phys. Rev. Lett. 87 2001 Phys. Rev. Lett. 87 (2001) 091801 [11] BELLE Collaboration, K. Abe et al 091802 Phys. Rev. Lett. 87 2001 Phys. Rev. Lett. 87 (2001) 091802 [12] G. Eyal and Y. Nir 21 Nucl. Phys., B 528 1998 Nucl. Phys. B 528 (1998) 21 and references therein [13] G. C. Branco, F. Cagarrinho and F. Krüger 224 Phys. Lett., B 459 1999 Phys. Lett. B 459 (1999) 224 [14] H. Fritzsch and J. Plankl 584 Phys. Rev., D 49 1994 Phys. Rev. D 49 (1994) 584 H. Fritzsch and P. Minkowski 393 Nuovo Cimento 30 1975 Nuovo Cimento 30 (1975) 393 H. Fritzsch and D. Jackson 365 Phys. Lett., B 66 1977 Phys. Lett. B 66 (1977) 365 P. Kaus and S. Meshkov 1863 Phys. Rev., D 42 1990 Phys. Rev. D 42 (1990) 1863 [15] H. Fusaoka and Y. Koide 3986 Phys. Rev., D 57 1998 Phys. Rev. D 57 (1998) 3986 [16] See, for example, V. Barger, M. S. Berger and P. Ohmann 1093 Phys. Rev., D 47 1993 Phys. Rev. D 47 (1993) 1093 4908 Phys. Rev., D 49 1994 Phys. Rev. D 49 (1994) 4908 [17] G. C. Branco and J. I. Silva-Marcos 390 Phys. Lett., B 331 1994 Phys. Lett. B 331 (1994) 390 [18] Particle Data Group 1 Eur. Phys. J., C 15 2000 Eur. Phys. J. C 15 (2000) 1 [19] C. Jarlskog 1039 Phys. Rev. Lett. 55 1985 Phys. Rev. Lett. 55 (1985) 1039 491 Z. Phys., C 29 1985 Z. Phys. C 29 (1985) 491 [20] M. Dugan, B. Grinstein and L. J. Hall 413 Nucl. Phys., B 255 1985 Nucl. Phys. B 255 (1985) 413 [21] D. A. Demir, A. Masiero and O. Vives 230 Phys. Lett., B 479 2000 Phys. Lett. B 479 (2000) 230 S. M. Barr and S. Khalil 035005 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 035005 [22] S. A. Abel and J. M. Frère 1632 Phys. Rev., D 55 1997 Phys. Rev. D 55 (1997) 1632 S. Khalil, T. Kobayashi and A. Masiero 075003 Phys. Rev., D 60 1999 Phys. Rev. D 60 (1999) 075003 S.
Khalil and T. Kobayashi 341 Phys. Lett., B 460 1999 Phys. Lett. B 460 (1999) 341 [23] S. Khalil, T. Kobayashi and O. Vives 275 Nucl. Phys., B 580 2000 Nucl. Phys. B 580 (2000) 275 T. Kobayashi and O. Vives 323 Phys. Lett., B 406 2001 Phys. Lett. B 406 (2001) 323 [24] S. Abel, D. Bailin, S. Khalil and O. Lebedev 241 Phys. Lett., B 504 2001 Phys. Lett. B 504 (2001) 241 [25] A. Masiero, M. Piai, A. Romanino and L. Silvestrini 075005 Phys. Rev., D 64 2001 Phys. Rev. D 64 (2001) 075005 and references therein [26] P. G. Harris et al 904 Phys. Rev. Lett. 82 1999 Phys. Rev. Lett. 82 (1999) 904 [27] M. Ciuchini et al 008 J. High Energy Phys. 10 1998 J. High Energy Phys. 10 (1998) 008 [28] S. Khalil and O. Lebedev 387 Phys. Lett., B 515 2001 Phys. Lett. B 515 (2001) 387 [29] A. J. Buras hep-ph/0101336 [30] See, for example, G. C. Branco, L. Lavoura and J. P. Silva, CP Violation, International Series of Monographs on Physics (103), Oxford University Press, Clarendon (1999) [31] F. Gabbiani, E. Gabrielli, A. Masiero and L. Silvestrini 321 Nucl. Phys., B 477 1996 Nucl. Phys. B 477 (1996) 321 [32] V. Fanti et al 335 Phys. Lett., B 465 1999 Phys. Lett. B 465 (1999) 335 [33] T. Gershon (NA48) hep-ex/0101034 [34] A. J. Buras, M. Jamin and M. E. Lautenbacher 209 Nucl. Phys., B 408 1993 Nucl. Phys. B 408 (1993) 209 [35] S. Bertolini, M. Fabbrichesi and E. Gabrielli 136 Phys. Lett., B 327 1994 Phys. Lett. B 327 (1994) 136 [36] G. Colangelo and G. Isidori 009 J. High Energy Phys. 09 1998 J. High Energy Phys. 09 (1998) 009 A. Buras, G. Colangelo, G. Isidori, A. Romanino and L. Silvestrini 3 Nucl. Phys., B 566 2000 Nucl. Phys. B 566 (2000) 3 [37] OPAL Collaboration, K. Ackerstaff et al 379 Eur. Phys. J., C 5 1998 Eur. Phys. J. C 5 (1998) 379 CDF Collaboration, T. Affolder et al 072005 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 072005 CDF Collaboration, C. A. Blocker, Proceedings of 3rd Workshop on Physics and Detectors for DAPHNE (DAPHNE 99), Frascati, Italy, 16-19 Nov 1999; ALEPH Collaboration, R. Barate et al 259 Phys. Lett., B 492 2000 Phys. Lett. B 492 (2000) 259 [38] S. Bertolini, F. Borzumati, A. Masiero and G. Ridolfi 591 Nucl. Phys., B 353 1991 Nucl. Phys. B 353 (1991) 591 [39] CLEO Collaboration, S. Ahmed et al, CLEO-CONF-99-10 hep-ex/9908022 [40] E. Gabrielli, S. Khalil and E. Torrente-Lujan 3 Nucl. Phys., B 594 2001 Nucl. Phys. B 594 (2001) 3 hep-ph/0204137 eng DO-TH-02-05 Paschos, E A Univ. Dortmund Leptogenesis with Majorana neutrinos Dortmund Dortmund Univ. Inst. Phys. 12 Apr 2002 6 p I review the origin of the lepton asymmetry which is converted to a baryon excess at the electroweak scale. This scenario becomes more attractive if we can relate it to other physical phenomena. For this reason I elaborate on the conditions of the early universe which lead to a sizable lepton asymmetry. Then I describe methods and models which relate the low energy parameters of neutrinos to the high energy (cosmological) CP-violation and to neutrinoless double beta-decay. LANL EDS SzGeCERN Particle Physics - Phenomenology Emmanuel A. Paschos <paschos@hal1.physik.uni-dortmund.de> http://cdsware.cern.ch/download/invenio-demo-site-files/0204137.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204137.ps.gz CER n 200231 11 2002 2002-04-15 00 2002-04-15 BATCH CONTRIBUTED TO 1ST WORKSHOP ON NEUTRINO - NUCLEUS INTERACTIONS IN THE FEW GEV REGION (NUINT01) TSUKUBA JAPAN 13-16 DEC 2001 6 PAGES 6 FIGURES PREPRINT 1. Fukugita and Yanagida 45 Phys. Lett., B 174 1986 Phys. Lett. B 174 (1986) 45 2. M. Flanz, E.A.
Paschos and U. Sarkar 248 Phys. Lett., B 345 1995 Phys. Lett. B 345 (1995) 248 3. M. Luty 445 Phys. Rev., D 45 1992 Phys. Rev. D 45 (1992) 445 4. M. Flanz, E.A. Paschos, U. Sarkar and J. Weiss 693 Phys. Lett., B 389 1996 Phys. Lett. B 389 (1996) 693 M. Flanz and E.A. Paschos 113009 Phys. Rev., D 58 1998 Phys. Rev. D 58 (1998) 113009 5. A. Pilaftsis 5431 Phys. Rev., D 56 1997 Phys. Rev. D 56 (1997) 5431 6. W. Buchmüller and M. Plümacher 354 Phys. Lett., B 431 1998 Phys. Lett. B 431 (1998) 354 7. L. Covi, E. Roulet and F. Vissani 169 Phys. Lett., B 384 1996 Phys. Lett. B 384 (1996) 169 8. E.K. Akhmedov, V.A. Rubakov and A.Y. Smirnov 1359 Phys. Rev. Lett. 81 1998 Phys. Rev. Lett. 81 (1998) 1359 9. S.Y. Khlebnikov and M.E. Shaposhnikov 885 Nucl. Phys., B 308 1988 Nucl. Phys. B 308 (1988) 885 and references therein 10. J. Ellis, S. Lola and D.V. Nanopoulos 87 Phys. Lett., B 452 1999 Phys. Lett. B 452 (1999) 87 11. G. Lazarides and N. Vlachos 482 Phys. Lett., B 459 1999 Phys. Lett. B 459 (1999) 482 12. M.S. Berger and B. Brahmachari 073009 Phys. Rev., D 60 1999 Phys. Rev. D 60 (1999) 073009 13. K. Kang, S.K. Kang and U. Sarkar 391 Phys. Lett., B 486 2000 Phys. Lett. B 486 (2000) 391 14. H. Goldberg 389 Phys. Lett., B 474 2000 Phys. Lett. B 474 (2000) 389 15. M. Hirsch and S.F. King 113005 Phys. Rev., D 64 2001 Phys. Rev. D 64 (2001) 113005 16. H. Nielsen and Y. Takanishi 241 Phys. Lett., B 507 2001 Phys. Lett. B 507 (2001) 241 17. W. Buchmüller and D. Wyler 291 Phys. Lett., B 521 2001 Phys. Lett. B 521 (2001) 291 18. Falcone and Tramontano 1 Phys. Lett., B 506 2001 Phys. Lett. B 506 (2001) 1 F. Buccella et al 241 Phys. Lett., B 524 2002 Phys. Lett. B 524 (2002) 241 19. G.C. Branco et al., Nucl. Phys., B617 (2001) 475 20. A.S. Joshipura, E.A. Paschos and W. Rodejohann 227 Nucl. Phys., B 611 2001 Nucl. Phys. B 611 (2001) 227 and 29 J. High Energy Phys. 0108 2001 J. High Energy Phys. 0108 (2001) 29 21. mentioned by H.V. Klapdor-Kleingrothaus, in hep-ph/0103062 hep-ph/0204138 eng DO-TH-02-06 Paschos, E A Univ. Dortmund Neutrino Interactions at Low and Medium Energies Dortmund Dortmund Univ. Inst. Phys. 12 Apr 2002 9 p We discuss the calculations for several neutrino-induced reactions from low energies to the GeV region. Special attention is paid to nuclear corrections when the targets are medium or heavy nuclei. Finally, we present several ratios of neutral to charged current reactions whose values on isoscalar targets can be estimated accurately. The ratios are useful for investigating neutrino oscillations in Long Baseline experiments. LANL EDS SzGeCERN Particle Physics - Phenomenology Emmanuel A. Paschos <paschos@hal1.physik.uni-dortmund.de> http://cdsware.cern.ch/download/invenio-demo-site-files/0204138.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204138.ps.gz CER n 200231 11 2002 2002-04-15 00 2002-04-15 BATCH CONTRIBUTED TO 1ST WORKSHOP ON NEUTRINO - NUCLEUS INTERACTIONS IN THE FEW GEV REGION (NUINT01) TSUKUBA JAPAN 13-16 DEC 2001 9 PAGES 15 FIGURES PREPRINT 1. E. A. Paschos, L. Pasquali and J. Y. Yu 263 Nucl. Phys., B 588 2000 Nucl. Phys. B 588 (2000) 263 2. E. A. Paschos and J. Y. Yu 033002 Phys. Rev., D 65 2002 Phys. Rev. D 65 (2002) 033002 3. C. Albright and C. Jarlskog 467 Nucl. Phys., B 84 1975 Nucl. Phys. B 84 (1975) 467 4. N. J. Baker et al 617 Phys. Rev., D 25 1982 Phys. Rev. D 25 (1982) 617 5. M. Hirai, S. Kumano and M. Miyama 034003 Phys. Rev., D 64 2001 Phys. Rev. D 64 (2001) 034003 6. K. J. Eskola, V. J. Kolhinen and P. V. Ruuskanen 351 Nucl.
Phys., B 535 1998 Nucl. Phys. B 535 (1998) 351 K. J. Eskola, V. J. Kolhinen, P. V. Ruuskanen and C. A. Salgado 645 Nucl. Phys., A 661 1999 Nucl. Phys. A 661 (1999) 645 7. See Figure 1 in Ref. [2] 8. P. A. Schreiner and F. V. von Hippel 333 Nucl. Phys., B 58 1973 Nucl. Phys. B 58 (1973) 333 9. S. L. Adler, S. Nussinov and E. A. Paschos 2125 Phys. Rev., D 9 1974 Phys. Rev. D 9 (1974) 2125 10. S. L. Adler 2644 Phys. Rev., D 12 1975 Phys. Rev. D 12 (1975) 2644 11. P. Musset and J. P. Vialle 1 Phys. Rep. 39 1978 Phys. Rep. 39 (1978) 1 12. H. Kluttig, J. G. Morfin and W. Van Dominick 446 Phys. Lett., B 71 1977 Phys. Lett. B 71 (1977) 446 13. R. Merenyi et al 743 Phys. Rev., D 45 1982 Phys. Rev. D 45 (1982) 743 14. S. K. Singh, M. T. Vicente-Vacas and E. Oset 23 Phys. Lett., B 416 1998 Phys. Lett. B 416 (1998) 23 15. E. A. Paschos and L. Wolfenstein 91 Phys. Rev., D 7 1973 Phys. Rev. D 7 (1973) 91 see equation (15) 16. E. A. Paschos, Precise Ratios for Neutrino-Nucleon and Neutrino-Nucleus Interactions, Dortmund preprint DO-TH 02/02; hep-ph/0204090 17. G. Gounaris, E. A. Paschos and P. Porfyriadis 63 Phys. Lett., B 525 2002 Phys. Lett. B 525 (2002) 63 18. J. Bouchez and I. Giomataris, CEA/Saclay internal note, DAPNIA/01-07, June 2001 hep-ph/0204139 eng Van Beveren, E University of Coimbra Remarks on the f_0(400-1200) scalar meson as the dynamically generated chiral partner of the pion 12 Apr 2002 15 p The quark-level linear sigma model is revisited, in particular concerning the identification of the f_0(400-1200) (or \sigma(600)) scalar meson as the chiral partner of the pion. We demonstrate the predictive power of the linear sigma model through the pi-pi and pi-N s-wave scattering lengths, as well as several electromagnetic, weak, and strong decays of pseudoscalar and vector mesons. The ease with which the data for these observables are reproduced in the linear sigma model lends credit to the necessity to include the sigma as a fundamental q\bar{q} degree of freedom, to be contrasted with approaches like chiral perturbation theory or the confining NJL model of Shakin and Wang. LANL EDS SzGeCERN Particle Physics - Phenomenology Kleefeld, F Rupp, G Scadron, M D George Rupp <george@ajax.ist.utl.pt> http://cdsware.cern.ch/download/invenio-demo-site-files/0204139.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204139.ps.gz CER n 200231 2002 11 Beveren, Eef van Kleefeld, Frieder Rupp, George Scadron, Michael D. 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] D. E. Groom et al. [Particle Data Group Collaboration] 1 Eur. Phys. J., C 15 2000 Eur. Phys. J. C 15 (2000) 1 [2] N. Isgur and J. Speth 2332 Phys. Rev. Lett. 77 1996 Phys. Rev. Lett. 77 (1996) 2332 [3] N. A. Törnqvist and M. Roos 1575 Phys. Rev. Lett. 76 1996 Phys. Rev. Lett. 76 (1996) 1575 hep-ph/9511210 [4] M. Harada, F. Sannino, and J. Schechter 1603 Phys. Rev. Lett. 78 1997 Phys. Rev. Lett. 78 (1997) 1603 hep-ph/9609428 [5] E. van Beveren, T. A. Rijken, K. Metzger, C. Dullemond, G. Rupp, and J. E. Ribeiro 615 Z. Phys., C 30 1986 Z. Phys. C 30 (1986) 615 Eef van Beveren and George Rupp 469 Eur. Phys. J., C 10 1999 Eur. Phys. J. C 10 (1999) 469 hep-ph/9806246 [6] M. Boglione and M. R. Pennington hep-ph/0203149 [7] M. Gell-Mann and M. Lévy 705 Nuovo Cimento 16 1960 Nuovo Cimento 16 (1960) 705 also see V. de Alfaro, S. Fubini, G. Furlan, and C. Rossetti, in Currents in Hadron Physics, North-Holland Publ., Amsterdam, Chap. 5 (1973) [8] R. Delbourgo and M. D. Scadron 251 Mod. Phys. Lett.,
A 10 (1995) 251 hep-ph/9910242 657 Int. J. Mod. Phys., A 13 1998 Int. J. Mod. Phys. A 13 (1998) 657 hep-ph/9807504 [9] Y. Nambu and G. Jona-Lasinio 345 Phys. Rev. 122 1961 Phys. Rev. 122 (1961) 345 [10] C. M. Shakin and Huangsheng Wang 094020 Phys. Rev., D 64 2001 Phys. Rev. D 64 (2001) 094020 [11] C. M. Shakin and Huangsheng Wang 014019 Phys. Rev., D 63 2000 Phys. Rev. D 63 (2000) 014019 [12] G. Rupp, E. van Beveren, and M. D. Scadron 078501 Phys. Rev., D 65 2002 Phys. Rev. D 65 (2002) 078501 hep-ph/0104087 [13] Eef van Beveren, George Rupp, and Michael D. Scadron 300 Phys. Lett., B 495 2000 Phys. Lett. B 495 (2000) 300 [Erratum 365 Phys. Lett., B 509 2000 Phys. Lett. B 509 (2000) 365 ] hep-ph/0009265 Frieder Kleefeld, Eef van Beveren, George Rupp, and Michael D. Scadron hep-ph/0109158 [14] M. D. Scadron 239 Phys. Rev., D 26 1982 Phys. Rev. D 26 (1982) 239 2076 Phys. Rev., D 29 1984 Phys. Rev. D 29 (1984) 2076 669 Mod. Phys. Lett., A 7 1992 Mod. Phys. Lett. A 7 (1992) 669 [15] M. Lévy 23 Nuovo Cimento, A 52 1967 Nuovo Cimento, A 52 (1967) 23 S. Gasiorowicz and D. A. Geffen 531 Rev. Mod. Phys. 41 1969 Rev. Mod. Phys. 41 (1969) 531 J. Schechter and Y. Ueda 2874 Phys. Rev., D 3 1971 Phys. Rev. D 3 (1971) 2874 [Erratum 987 Phys. Rev., D 8 1973 Phys. Rev. D 8 (1973) 987 ] [16] T. Eguchi 2755 Phys. Rev., D 14 1976 Phys. Rev. D 14 (1976) 2755 T. Eguchi 611 Phys. Rev., D 17 1978 Phys. Rev. D 17 (1978) 611 [17] The once-subtracted dispersion-relation result hep-ph/0204140 eng CERN-TH-2002-069 RM3-TH-02-4 Aglietti, U CERN A new model-independent way of extracting |V_ub/V_cb| Geneva CERN 12 Apr 2002 20 p The ratio between the photon spectrum in B -> X_s gamma and the differential semileptonic rate with respect to the hadronic variable M_X/E_X is a short-distance quantity calculable in perturbation theory and independent of the Fermi motion of the b quark in the B meson. We present an NLO analysis of this ratio and show how it can be used to determine |V_ub/V_cb| independently of any model for the shape function. We also discuss how this relation can be used to test the validity of the shape-function theory on the data. LANL EDS SzGeCERN Particle Physics - Phenomenology Ciuchini, M Gambino, P Paolo Gambino <paolo.gambino@cern.ch> http://cdsware.cern.ch/download/invenio-demo-site-files/0204140.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204140.ps.gz CER n 200231 2002 11 TH CERN 2002-04-15 00 2002-04-15 BATCH Aglietti, Ugo Ciuchini, Marco Gambino, Paolo PREPRINT [1] I. I. Bigi, M. A. Shifman, N. G. Uraltsev and A. I. Vainshtein 496 Phys. Rev. Lett. 71 1993 Phys. Rev. Lett. 71 (1993) 496 [arXiv hep-ph/9304225 and 2467 Int. J. Mod. Phys., A 9 1994 Int. J. Mod. Phys. A 9 (1994) 2467 [arXiv hep-ph/9312359 [2] M. Neubert 4623 Phys. Rev., D 49 1994 Phys. Rev. D 49 (1994) 4623 [arXiv hep-ph/9312311 [3] R. Akhoury and I. Z. Rothstein 2349 Phys. Rev., D 54 1996 Phys. Rev. D 54 (1996) 2349 [arXiv hep-ph/9512303 A. K. Leibovich and I. Z. Rothstein 074006 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 074006 [arXiv hep-ph/9907391 A. K. Leibovich, I. Low and I. Z. Rothstein 053006 Phys. Rev., D 61 2000 Phys. Rev. D 61 (2000) 053006 [arXiv hep-ph/9909404 A. K. Leibovich, I. Low and I. Z. Rothstein 86 Phys. Lett., B 486 2000 Phys. Lett. B 486 (2000) 86 [arXiv hep-ph/0005124 M. Neubert 88 Phys. Lett., B 513 2001 Phys. Lett. B 513 (2001) 88 [arXiv hep-ph/0104280 A. K. Leibovich, I. Low and I. Z. Rothstein 83 Phys. Lett., B 513 2001 Phys. Lett. B 513 (2001) 83 [arXiv hep-ph/0105066 [4] V. D. Barger, C. S. Kim and R. J.
Phillips 629 Phys. Lett., B 251 1990 Phys. Lett. B 251 (1990) 629 A. F. Falk, Z. Ligeti and M. B. Wise 225 Phys. Lett., B 406 1997 Phys. Lett. B 406 (1997) 225 [arXiv hep-ph/9705235 I. I. Bigi, R. D. Dikeman and N. Uraltsev 453 Eur. Phys. J., C 4 1998 Eur. Phys. J. C 4 (1998) 453 [arXiv hep-ph/9706520 [5] R. Barate et al. (ALEPH Coll.) 555 Eur. Phys. J., C 6 1999 Eur. Phys. J. C 6 (1999) 555 M. Acciarri et al. (L3 Coll.), Phys. Lett., B436 (1998); P. Abreu et al. (DELPHI Coll.) 14 Phys. Lett., B 478 2000 Phys. Lett. B 478 (2000) 14 G. Abbiendi et al. (OPAL Coll.) 399 Eur. Phys. J., C 21 2001 Eur. Phys. J. C 21 (2001) 399 [6] A. Bornheim [CLEO Coll.], arXiv hep-ex/0202019 [7] C. W. Bauer, Z. Ligeti and M. E. Luke 395 Phys. Lett., B 479 2000 Phys. Lett. B 479 (2000) 395 [arXiv hep-ph/0002161 [8] C. W. Bauer, Z. Ligeti and M. E. Luke 113004 Phys. Rev., D 64 2001 Phys. Rev. D 64 (2001) 113004 [arXiv hep-ph/0107074 [9] U. Aglietti, arXiv hep-ph/0010251 [10] U. Aglietti 308 Phys. Lett., B 515 2001 Phys. Lett. B 515 (2001) 308 [arXiv hep-ph/0103002 [11] U. Aglietti 293 Nucl. Phys., B 610 2001 Nucl. Phys. B 610 (2001) 293 [arXiv hep-ph/0104020 [12] A. Ali and E. Pietarinen 519 Nucl. Phys., B 154 1979 Nucl. Phys. B 154 (1979) 519 [13] G. Altarelli, N. Cabibbo, G. Corbò, L. Maiani and G. Martinelli 365 Nucl. Phys., B 208 1982 Nucl. Phys. B 208 (1982) 365 [14] R. L. Jaffe and L. Randall 79 Nucl. Phys., B 412 1994 Nucl. Phys. B 412 (1994) 79 [arXiv hep-ph/9306201 [15] M. Neubert 3392 Phys. Rev., D 49 1994 Phys. Rev. D 49 (1994) 3392 [arXiv hep-ph/9311325 [16] U. Aglietti, M. Ciuchini, G. Corbò, E. Franco, G. Martinelli and L. Silvestrini 411 Phys. Lett., B 432 1998 Phys. Lett. B 432 (1998) 411 [arXiv hep-ph/9804416 [17] S. Catani, L. Trentadue, G. Turnock and B. R. Webber 3 Nucl. Phys., B 407 1993 Nucl. Phys. B 407 (1993) 3 [18] V. Lubicz 116 Nucl. Phys. B, Proc. Suppl. 94 2001 Nucl. Phys. B, Proc. Suppl. 94 (2001) 116 [arXiv hep-lat/0012003 [19] A. J. Buras, M. Jamin, M. E. Lautenbacher and P. H. Weisz 37 Nucl. Phys., B 400 1993 Nucl. Phys. B 400 (1993) 37 [arXiv hep-ph/9211304 M. Ciuchini, E. Franco, G. Martinelli and L. Reina 403 Nucl. Phys., B 415 1994 Nucl. Phys. B 415 (1994) 403 [arXiv hep-ph/9304257 [20] K. Chetyrkin, M. Misiak and M. Munz 206 Phys. Lett., B 400 1997 Phys. Lett. B 400 (1997) 206 [Erratum 414 Phys. Lett., B 425 1997 Phys. Lett. B 425 (1997) 414 ] [arXiv hep-ph/9612313 and refs. therein [21] P. Gambino and M. Misiak 338 Nucl. Phys., B 611 2001 Nucl. Phys. B 611 (2001) 338 [arXiv hep-ph/0104034 [22] M.B. Voloshin 275 Phys. Lett., B 397 1997 Phys. Lett. B 397 (1997) 275 A. Khodjamirian et al 167 Phys. Lett., B 402 1997 Phys. Lett. B 402 (1997) 167 Z. Ligeti, L. Randall and M.B. Wise 178 Phys. Lett., B 402 1997 Phys. Lett. B 402 (1997) 178 A.K. Grant, A.G. Morgan, S. Nussinov and R.D. Peccei 3151 Phys. Rev., D 56 1997 Phys. Rev. D 56 (1997) 3151 G. Buchalla, G. Isidori and S.J. Rey 594 Nucl. Phys., B 511 1998 Nucl. Phys. B 511 (1998) 594 [23] P. Gambino and U. Haisch 020 J. High Energy Phys. 0110 2001 J. High Energy Phys. 0110 (2001) 020 [arXiv hep-ph/0109058 and 001 J. High Energy Phys. 0009 2000 J. High Energy Phys. 0009 (2000) 001 [arXiv hep-ph/0007259 [24] F. De Fazio and M. Neubert 017 J. High Energy Phys. 9906 1999 J. High Energy Phys. 9906 (1999) 017 [arXiv hep-ph/9905351 [25] U. Aglietti, arXiv hep-ph/0105168 to appear in the Proceedings of "XIII Convegno sulla Fisica al LEP (LEPTRE 2001)", Rome (Italy), 18-20 April 2001 [26] T. van Ritbergen 353 Phys.
Lett., B 454 1999 Phys. Lett. B 454 (1999) 353 [27] C. W. Bauer, M. E. Luke and T. Mannel, arXiv hep-ph/0102089 [28] The Particle Data Group 1 Eur. Phys. J., C 15 2000 Eur. Phys. J. C 15 (2000) 1 [29] M. Ciuchini et al 013 J. High Energy Phys. 0107 2001 J. High Energy Phys. 0107 (2001) 013 [arXiv hep-ph/0012308 [30] S. Chen et al., CLEO Coll 251807 Phys. Rev. Lett. 87 2001 Phys. Rev. Lett. 87 (2001) 251807 [31] N. Pott 938 Phys. Rev. D 54 1996 Phys. Rev., D 54 (1996) 938 [arXiv hep-ph/9512252 [32] C. Greub, T. Hurth and D. Wyler 3350 Phys. Rev. D 54 1996 Phys. Rev., D 54 (1996) 3350 [arXiv hep-ph/9603404 A. J. Buras, A. Czarnecki, M. Misiak and J. Urban 488 Nucl. Phys. B 611 2001 Nucl. Phys., B 611 (2001) 488 [arXiv hep-ph/0105160 [33] A. J. Buras, A. Czarnecki, M. Misiak and J. Urban, arXiv hep-ph/0203135 hep-ph/0204141 eng Appelquist, T Yale University Neutrino Masses in Theories with Dynamical Electroweak Symmetry Breaking 12 Apr 2002 4 p We address the problem of accounting for light neutrino masses in theories with dynamical electroweak symmetry breaking. As a possible solution, we embed (extended) technicolor in a theory in which a $|\Delta L|=2$ neutrino condensate forms at a scale $\Lambda_N \gtrsim 10^{11}$ GeV, and produces acceptably small (Majorana) neutrino masses. We present an explicit model illustrating this mechanism. LANL EDS SzGeCERN Particle Physics - Phenomenology Shrock, R Robert Shrock <shrock@insti.physics.sunysb.edu> http://cdsware.cern.ch/download/invenio-demo-site-files/0204141.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204141.ps.gz CER n 200231 2002 11 Appelquist, Thomas Shrock, Robert 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] S. Fukuda et al 5651 Phys. Rev. Lett. 86 2001 Phys. Rev. Lett. 86 (2001) 5651 S. Fukuda et al. ibid, 5656 (2001) (SuperK) and Q.R. Ahmad et al 071301 Phys. Rev. Lett. 87 2001 Phys. Rev. Lett. 87 (2001) 071301 (SNO). Other data is from the Homestake, Kamiokande, GALLEX, and SAGE experiments [2] Y. Fukuda et al 9 Phys. Lett., B 433 1998 Phys. Lett. B 433 (1998) 9 1562 Phys. Rev. Lett. 81 1998 Phys. Rev. Lett. 81 (1998) 1562 2644 Phys. Rev. Lett. 82 1999 Phys. Rev. Lett. 82 (1999) 2644 185 Phys. Lett., B 467 1999 Phys. Lett. B 467 (1999) 185 (SuperK) and data from Kamiokande, IMB, Soudan-2, and MACRO experiments. The data, which is consistent with results from K2K, indicates that $|m(\nu_3)^2 - m(\nu_2)^2| \simeq |m(\nu_3)^2 - m(\nu_1)^2| \simeq 2.5 \times 10^{-3}\ {\rm eV}^2$. With a hierarchical mass assumption, one infers $m(\nu_3) \simeq \sqrt{\Delta m^2_{32}} \simeq 0.05$ eV [3] S. Weinberg 1277 Phys. Rev., D 19 1979 Phys. Rev. D 19 (1979) 1277 L. Susskind 2619 Phys. Rev., D 20 1979 Phys. Rev. D 20 (1979) 2619 E. Eichten and K. Lane 125 Phys. Lett., B 90 1980 Phys. Lett. B 90 (1980) 125 [4] P. Sikivie, L. Susskind, M. Voloshin, and V. Zakharov 189 Nucl. Phys., B 173 1980 Nucl. Phys. B 173 (1980) 189 [5] B. Holdom 301 Phys. Lett., B 150 1985 Phys. Lett. B 150 (1985) 301 K. Yamawaki, M. Bando, and K. Matumoto 1335 Phys. Rev. Lett. 56 1986 Phys. Rev. Lett. 56 (1986) 1335 T. Appelquist, D. Karabali, and L. Wijewardhana 957 Phys. Rev. Lett. 57 1986 Phys. Rev. Lett. 57 (1986) 957 T. Appelquist and L.C.R. Wijewardhana 774 Phys. Rev., D 35 1987 Phys. Rev. D 35 (1987) 774 568 Phys. Rev., D 36 1987 Phys. Rev. D 36 (1987) 568 [6] B. Holdom 1637 Phys. Rev., D 23 1981 Phys. Rev. D 23 (1981) 1637 169 Phys. Lett., B 246 1990 Phys. Lett. B 246 (1990) 169 [7] T. Appelquist and J. Terning 139 Phys. Lett., B 315 1993 Phys. Lett. B 315 (1993) 139 T. Appelquist, J. Terning, and L. Wijewardhana 1214 Phys. Rev. Lett.
77 (1996) 1214; Phys. Rev. Lett. 79 (1997) 2767; T. Appelquist, N. Evans, S. Selipsky, Phys. Lett. B 374 (1996) 145; T. Appelquist and S. Selipsky, Phys. Lett. B 400 (1997) 364 [8] T. Appelquist, J. Terning, Phys. Rev. D 50 (1994) 2116 [9] Recent reviews include R. Chivukula, hep-ph/0011264; K. Lane, hep-ph/0202255; C. Hill, E. Simmons, hep-ph/0203079 [10] M. Gell-Mann, P. Ramond, R. Slansky, in Supergravity (North Holland, Amsterdam, 1979), p. 315; T. Yanagida, in proceedings of the Workshop on Unified Theory and Baryon Number in the Universe, KEK, 1979 [11] Although we require our model to yield a small S, a re-analysis of precision electroweak data is called for in view of the value of $\sin^2\theta_W$ reported in G. Zeller et al., Phys. Rev. Lett. 88 (2002) 091802 [12] For a vectorial SU(N) theory with $N_f$ fermions in the fundamental representation, an IRFP occurs if $N_f > N_{f,min,IR}$, where, perturbatively, $N_{f,min,IR} \simeq 34N^3/(13N^2-3)$. At this IRFP, using the criticality condition [13], the theory is expected to exist in a confining phase with spontaneous chiral symmetry breaking if $N_{f,min,IR} < N_f < N_{f,con}$, where $N_{f,con} \simeq (2/5)N(50N^2-33)/(5N^2-3)$, and in a conformal phase if $N_{f,con} < N_f < 11N/2$. For N = 2 we have $N_{f,min,IR} \simeq 5$ and $N_{f,con} \simeq 8$, respectively. For attempts at lattice measurements, see R. Mawhinney, Nucl. Phys. B, Proc. Suppl. 83 (2000) 57 [13] In the approximation of a single-gauge-boson exchange, the critical coupling for the condensation of fermion representations $R_1 \times R_2 \to R_c$ is $(3/2\pi)\,\alpha_c\,\Delta C_2 = 1$, where $\Delta C_2 = C_2(R_1) + C_2(R_2) - C_2(R_c)$ and $C_2(R)$ is the quadratic Casimir invariant. Instanton contributions are also important [7] [14] J. Gasser, H. Leutwyler, Phys. Rep. 87 (1982) 77; H. Leutwyler, in Nucl. Phys. B, Proc. Suppl. 94 (2001) 108 [15] A. Ali Khan et al., Phys. Rev. Lett. 85 (2000) 4674; M. Wingate et al., Int. J. Mod. Phys. A 16S1B (2001) 585 [16] Here $\eta_a = \exp[\int_{f_F}^{\Lambda_{ETC,a}} (d\mu/\mu)\,\gamma(\alpha(\mu))]$, and in walking TC theories the anomalous dimension $\gamma \simeq 1$, so $\eta_a \simeq \Lambda_{ETC,a}/f_F$ [17] By convention, we write SM-singlet neutrinos as right-handed fields $\nu_{j,R}$. These are assigned lepton number 1. Thus, in writing $SU(4)_{PS} \supset SU(3)_c \times U(1)$, the U(1) is not $U(1)_{B-L}$ since some neutrinos in the model are $SU(4)_{PS}$-singlet states [18] Z. Maki, M. Nakagawa, S. Sakata, Prog. Theor. Phys. 28 (1962) 870 (2 × 2 matrix); B. W. Lee, S. Pakvasa, R. Shrock, and H. Sugawara, Phys. Rev. Lett. 38 (1977) 937 (3 × 3 matrix) [19] T. Appelquist and R. Shrock, to appear [20] K. Dienes, E. Dudas, T. Gherghetta, Nucl. Phys. B 557 (1999) 25; N. Arkani-Hamed, S. Dimopoulos, G. Dvali, and J. March-Russell, hep-ph/9811448; T. Appelquist, B. Dobrescu, E. Ponton, and H.-U. Yee, hep-ph/0201131 hep-th/0204100 eng LBNL-50097 UCB-PTH-02-14 Gaillard, M K University of California, Berkeley Modular Invariant Anomalous U(1) Breaking Berkeley, CA Lawrence Berkeley Nat. Lab. 11 Apr 2002 19 p We describe the effective supergravity theory present below the scale of spontaneous gauge symmetry breaking due to an anomalous U(1), obtained by integrating out tree-level interactions of massive modes. A simple case is examined in some detail.
We find that the effective theory can be expressed in the linear multiplet formulation, with some interesting consequences. Among them, the modified linearity conditions lead to new interactions not present in the theory without an anomalous U(1). These additional interactions are compactly expressed through a superfield functional. LANL EDS SzGeCERN Particle Physics - Theory Giedt, J Mary K Gaillard <gaillard@thsrv.lbl.gov> http://cdsware.cern.ch/download/invenio-demo-site-files/0204100.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204100.ps.gz CER n 200231 2002 11 Gaillard, Mary K. Giedt, Joel 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] J. Giedt, Ann. Phys. (N.Y.) 297 (2002) 1 [hep-th/0108244] [2] M. Dine, N. Seiberg and E. Witten, Nucl. Phys. B 289 (1987) 585; J. J. Atick, L. Dixon and A. Sen, Nucl. Phys. B 292 (1987) 109; M. Dine, I. Ichinose and N. Seiberg, Nucl. Phys. B 293 (1987) 253 [3] M. B. Green and J. H. Schwarz, Phys. Lett. B 149 (1984) 117 [4] P. Binétruy, G. Girardi and R. Grimm, Phys. Lett. B 265 (1991) 111 [5] M. Müller, Nucl. Phys. B 264 (1986) 292; P. Binétruy, G. Girardi, R. Grimm and M. Müller, Phys. Lett. B 189 (1987) 389 [6] P. Binétruy, G. Girardi and R. Grimm, Phys. Rep. 343 (2001) 255 [7] G. Girardi and R. Grimm, Ann. Phys. (N.Y.) 272 (1999) 49 [8] P. Binétruy, M. K. Gaillard and Y.-Y. Wu, Nucl. Phys. B 481 (1996) 109 [9] P. Binétruy, M. K. Gaillard and Y.-Y. Wu, Nucl. Phys. B 493 (1997) 27; P. Binétruy, M. K. Gaillard and Y.-Y. Wu, Phys. Lett. B 412 (1997) 288 [10] M. K. Gaillard, B. Nelson and Y.-Y. Wu, Phys. Lett. B 459 (1999) 549 [11] M. K. Gaillard and B. Nelson, Nucl. Phys. B 571 (2000) 3 [12] S. Ferrara, C. Kounnas and M. Porrati, Phys. Lett. B 181 (1986) 263 [13] M. Cvetič, J. Louis and B. A. Ovrut, Phys. Lett. B 206 (1988) 227; L. E. Ibáñez and D. Lüst, Nucl. Phys. B 382 (1992) 305 [14] M. K. Gaillard, Phys. Lett. B 342 (1995) 125; Phys. Rev. D 58 (1998) 105027; Phys. Rev. D 61 (2000) 084028 [15] E. Witten, Phys. Lett. B 155 (1985) 151 [16] L. J. Dixon, V. S. Kaplunovsky and J. Louis, Nucl. Phys. B 329 (1990) 27 [17] S.J. Gates, M. Grisaru, M. Roček and W. Siegel, Superspace (Benjamin/Cummings, 1983) [18] M.K. Gaillard and T.R. Taylor, Nucl. Phys. B 381 (1992) 577 [19] J. Wess and J. Bagger, Supersymmetry and Supergravity (Princeton, 1992) [20] P. Binétruy, C. Deffayet and P. Peter, Phys. Lett. B 441 (1998) 163 [21] M. K. Gaillard and J. Giedt, in progress hep-ph/0204142 eng Chacko, Z University of California, Berkeley Fine Structure Constant Variation from a Late Phase Transition Berkeley, CA Lawrence Berkeley Nat. Lab. 12 Apr 2002 9 p Recent experimental data indicates that the fine structure constant alpha may be varying on cosmological time scales.
We consider the possibility that such a variation could be induced by a second order phase transition which occurs at late times (z ~ 1 - 3) and involves a change in the vacuum expectation value (vev) of a scalar with milli-eV mass. Such light scalars are natural in supersymmetric theories with low SUSY breaking scale. If the vev of this scalar contributes to masses of electrically charged fields, the low-energy value of alpha changes during the phase transition. The observational predictions of this scenario include isotope-dependent deviations from Newtonian gravity at sub-millimeter distances, and (if the phase transition is a sharp event on cosmological time scales) the presence of a well-defined step-like feature in the alpha(z) plot. The relation between the fractional changes in alpha and the QCD confinement scale is highly model dependent, and even in grand unified theories the change in alpha does not need to be accompanied by a large shift in nucleon masses. LANL EDS SzGeCERN Particle Physics - Phenomenology Grojean, C Perelstein, M Maxim Perelstein <meperelstein@lbl.gov> http://cdsware.cern.ch/download/invenio-demo-site-files/0204142.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204142.ps.gz CER n 200231 2002 11 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] J. K. Webb, M. T. Murphy, V. V. Flambaum, V. A. Dzuba, J. D. Barrow, C. W. Churchill, J. X. Prochaska and A. M. Wolfe, Phys. Rev. Lett. 87 (2001) 091301 [astro-ph/0012539]; see also J. K. Webb, V. V. Flambaum, C. W. Churchill, M. J. Drinkwater and J. D. Barrow, Phys. Rev. Lett. 82 (1999) 884 [astro-ph/9803165]; V. A. Dzuba, V. V. Flambaum, and J. K. Webb, Phys. Rev. Lett. 82 (1999) 888 [2] P. A. Dirac, Nature 139 (1937) 323; for a historical perspective, see F. Dyson, "The fundamental constants and their time variation", in Aspects of Quantum Theory, eds. A. Salam and E. Wigner [3] T. Damour, gr-qc/0109063 [4] J. D. Bekenstein, Phys. Rev. D 25 (1982) 1527 [5] G. R. Dvali and M. Zaldarriaga, Phys. Rev. Lett. 88 (2002) 091303 [hep-ph/0108217] [6] K. A. Olive and M. Pospelov, Phys. Rev. D 65 (2002) 085044 [hep-ph/0110377] [7] T. Banks, M. Dine and M. R. Douglas, Phys. Rev. Lett. 88 (2002) 131301 [hep-ph/0112059] [8] P. Langacker, G. Segrè and M. J. Strassler, Phys. Lett. B 528 (2002) 121 [hep-ph/0112233] [9] A. Y. Potekhin, A. V. Ivanchik, D. A. Varshalovich, K. M. Lanzetta, J. A. Baldwin, G. M. Williger and R. F. Carswell, Astrophys. J. 505 (1998) 523 [astro-ph/9804116] [10] S. Weinberg, Phys. Rev. D 9 (1974) 3357; L. Dolan and R. Jackiw, Phys. Rev. D 9 (1974) 3320 [11] N. Arkani-Hamed, L. J. Hall, C. Kolda and H. Murayama, Phys. Rev. Lett. 85 (2000) 4434 [astro-ph/0005111] [12] M. Dine, W. Fischler and M. Srednicki, Nucl. Phys. B 189 (1981) 575; S. Dimopoulos and S. Raby, Nucl. Phys. B 192 (1981) 353; L. Alvarez-Gaumé, M. Claudson and M. B. Wise, Nucl. Phys. B 207 (1982) 96; M. Dine and A. E. Nelson, Phys. Rev. D 48 (1993) 1277 [hep-ph/9303230]; M. Dine, A. E. Nelson and Y. Shirman, Phys. Rev. D 51 (1995) 1362 [hep-ph/9408384]; M. Dine, A. E. Nelson, Y. Nir and Y.
Shirman, Phys. Rev. D 53 (1996) 2658 [hep-ph/9507378] [13] N. Arkani-Hamed, S. Dimopoulos, N. Kaloper and R. Sundrum, Phys. Lett. B 480 (2000) 193 [hep-th/0001197]; S. Kachru, M. Schulz and E. Silverstein, Phys. Rev. D 62 (2000) 045021 [hep-th/0001206]; C. Csáki, J. Erlich and C. Grojean, Nucl. Phys. B 604 (2001) 312 [hep-th/0012143] [14] X. Calmet and H. Fritzsch, hep-ph/0112110; H. Fritzsch, hep-ph/0201198 [15] G. R. Dvali and S. Pokorski, Phys. Lett. B 379 (1996) 126 [hep-ph/9601358] [16] Z. Chacko and R. N. Mohapatra, Phys. Rev. Lett. 82 (1999) 2836 [hep-ph/9810315] [17] J. P. Turneaure, C. M. Will, B. F. Farrell, E. M. Mattison and R. F. C. Vessot, Phys. Rev. D 27 (1983) 1705; J. D. Prestage, R. L. Tjoelker and L. Maleki, Phys. Rev. Lett. 74 (1995) 3511 [18] A. I. Shlyakhter, Nature 264 (1976) 340; T. Damour and F. Dyson, Nucl. Phys. B 480 (1996) 37 [hep-ph/9606486]; Y. Fujii, A. Iwamoto, T. Fukahori, T. Ohnuki, M. Nakagawa, H. Hidaka, Y. Oura, P. Möller, Nucl. Phys. B 573 (2000) 377 [hep-ph/9809549] [19] E. W. Kolb, M. J. Perry and T. P. Walker, Phys. Rev. D 33 (1986) 869; B. A. Campbell and K. A. Olive, Phys. Lett. B 345 (1995) 429 [hep-ph/9411272]; L. Bergström, S. Iguri and H. Rubinstein, Phys. Rev. D 60 (1999) 045005 [astro-ph/9902157]; P. P. Avelino et al., Phys. Rev. D 64 (2001) 103505 [astro-ph/0102144] [20] S. Hannestad, Phys. Rev. D 60 (1999) 023515 [astro-ph/9810102]; M. Kaplinghat, R. J. Scherrer and M. S. Turner, Phys. Rev. D 60 (1999) 023516 [astro-ph/9810133]; P. P. Avelino, C. J. Martins, G. Rocha and P. Viana, Phys. Rev. D 62 (2000) 123508 [astro-ph/0008446] [21] C. D. Hoyle, U. Schmidt, B. R. Heckel, E. G. Adelberger, J. H. Gundlach, D. J. Kapner and H. E. Swanson, Phys. Rev. Lett. 86 (2001) 1418 [hep-ph/0011014]; E. G. Adelberger [EOT-WASH Group Collaboration], hep-ex/0202008 [22] S. Coleman, Aspects of Symmetry (Cambridge Univ. Press, 1985) hep-ph/0204143 eng Domin, P Comenius University Phenomenological Study of Solar-Neutrino Induced Double Beta Decay of Mo100 12 Apr 2002 8 p The detection of solar neutrinos of different origin via the induced beta beta process of Mo100 is investigated. The particular counting rates and energy distributions of emitted electrons are presented. A discussion with respect to a solar-neutrino detector consisting of 10 tonnes of Mo100 is included. Both the cases of the standard solar model and neutrino oscillation scenarios are analyzed. Moreover, new beta^- beta^+ and beta^-/EC channels of the double-beta process are introduced and possibilities of their experimental observation are addressed. LANL EDS SzGeCERN Particle Physics - Phenomenology Simkovic, F Semenov, S V Gaponov, Y V Pavol Domin <domin@chavena.dnp.fmph.uniba.sk> http://cdsware.cern.ch/download/invenio-demo-site-files/0204143.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204143.ps.gz CER n 200231 2002 11 Gaponov, Yu. V. 2002-04-15 00 2002-04-15 BATCH 8 PAGES LATEX 2 POSTSCRIPT FIGURES TALK PRESENTED BY P DOMIN ON THE WORKSHOP MEDEX'01 (PRAGUE JUNE 2001) TO APPEAR IN CZECH J PHYS 52 (2002) PREPRINT [1] S.
M. Bilenky, C. Giunti and W. Grimus, Prog. Part. Nucl. Phys. 45 (1999) 1 [2] J. N. Bahcall, S. Basu and M. H. Pinsonneault, Phys. Lett. B 433 (1998) 1 [3] R. Davis Jr., Prog. Part. Nucl. Phys. 32 (1994) 13 [4] Kamiokande Collaboration, Y. Fukuda et al., Phys. Rev. Lett. 77 (1996) 1683 [5] SAGE Collaboration, A. I. Abazov et al., Phys. Rev. Lett. 67 (1991) 3332; D. N. Abdurashitov et al., Phys. Rev. Lett. 77 (1996) 4708 [6] GALLEX Collaboration, P. Anselmann et al., Phys. Lett. B 285 (1992) 376; W. Hampel et al., Phys. Lett. B 388 (1996) 384 [7] Super-Kamiokande Coll., S. Fukuda et al., Phys. Rev. Lett. 86 (2001) 5651 [8] SNO Collaboration, Q.R. Ahmad et al., Phys. Rev. Lett. 87 (2001) 071301 [9] H. Ejiri et al., Phys. Rev. Lett. 85 (2000) 2917; H. Ejiri, Phys. Rep. 338 (2000) 265 [10] S. V. Semenov, Yu. V. Gaponov and R. U. Khafizov, Yad. Fiz. 61 (1998) 1379 [11] L. V. Inzhechik, Yu. V. Gaponov and S. V. Semenov, Yad. Fiz. 61 (1998) 1384 [12] http://www.sns.ias.edu/~jnb [13] B. Singh et al., Nucl. Data Sheets 84 (1998) 478 [14] H. Akimune et al., Phys. Lett. B 394 (1997) 23 [15] J. N. Bahcall, P. I. Krastev, and A. Yu. Smirnov, Phys. Rev. D 58 (1998) 096016 hep-th/0204101 eng CSULB-PA-02-2 Nishino, H California State University Axisymmetric Gravitational Solutions as Possible Classical Backgrounds around Closed String Mass Distributions 12 Apr 2002 15 p By studying singularities in stationary axisymmetric Kerr and Tomimatsu-Sato solutions with distortion parameter $\delta = 2, 3, \ldots$ in general relativity, we conclude that these singularities can be regarded as nothing other than closed string-like circular mass distributions. We use two different regularizations to identify $\delta$-function type singularities in the energy-momentum tensor for these solutions, obtaining a regulator-independent result. This result gives supporting evidence that these axisymmetric exact solutions may well be the classical solutions around closed string-like mass distributions, just like the Schwarzschild solution corresponding to a point mass distribution. In other words, these axisymmetric exact solutions may well provide the classical backgrounds around closed strings. LANL EDS SzGeCERN Particle Physics - Theory Rajpoot, S Hitoshi Nishino <hnishino@csulb.edu> http://cdsware.cern.ch/download/invenio-demo-site-files/0204101.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204101.ps.gz CER n 200231 2002 11 Nishino, Hitoshi Rajpoot, Subhash 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] K. Schwarzschild, Sitzungsberichte Preuss. Akad. Wiss., 424 (1916) [2] M. Green, J.H. Schwarz and E. Witten, `Superstring Theory', Vols. I and II, Cambridge University Press (1987) [3] J. Chazy, Bull. Soc. Math. France 52 (1924) 17; H.E.J. Curzon, Proc. London Math. Soc. 23 (1924) 477 [4] P. Hořava and E. Witten, Nucl. Phys. B 460 (1996) 506; Nucl. Phys. B 475 (1996) 94 [5] N. Arkani-Hamed, S. Dimopoulos and G. Dvali, Phys. Lett. B 429 (1998) 263; I. Antoniadis, N. Arkani-Hamed, S. Dimopoulos and G.
Dvali, Phys. Lett. B 436 (1998) 257 [6] L. Randall and R. Sundrum, Phys. Rev. Lett. 83 (1999) 3370; Phys. Rev. Lett. 83 (1999) 4690 [7] R.P. Kerr, Phys. Rev. Lett. 11 (1963) 237 [8] A. Ya. Burinskii, Phys. Lett. A 185 (1994) 441; `Complex String as Source of Kerr Geometry', hep-th/9503094; Phys. Rev. D 57 (1998) 2392; `Structure of Spinning Particle Suggested by Gravity, Supergravity & Low-Energy String Theory', hep-th/9910045; Czech. J. Phys. 50 S1 (2000) 201 [9] See, e.g., A. Sen, Mod. Phys. Lett. A 10 (1995) 2081; P.H. Frampton and T.W. Kephart, Mod. Phys. Lett. A 10 (1995) 2571; A. Strominger and C. Vafa, Phys. Lett. B 379 (1996) 99; K. Behrndt, Nucl. Phys. B 455 (1995) 188; J.C. Breckenridge, D.A. Lowe, R.C. Myers, A.W. Peet, A. Strominger and C. Vafa, Phys. Lett. B 381 (1996) 423; C. Callan and J. Maldacena, Nucl. Phys. B 472 (1996) 591; G. Horowitz and A. Strominger, Phys. Rev. Lett. 77 (1996) 2368; J.M. Maldacena, `Black Holes in String Theory', Ph.D. Thesis, hep-th/9607235; A. Dabholkar and J.A. Harvey, Phys. Rev. Lett. 63 (1989) 478; A. Dabholkar, G.W. Gibbons, J.A. Harvey and F. Ruiz Ruiz, Nucl. Phys. B 340 (1990) 33; C.G. Callan, Jr., J.M. Maldacena, A.W. Peet, Nucl. Phys. B 475 (1996) 645 [10] A. Tomimatsu and H. Sato, Prog. Theor. Phys. 50 (1973) 95 [11] M. Yamazaki and S. Hori, Prog. Theor. Phys. 57 (1977) 696; erratum, Prog. Theor. Phys. 60 (1978) 1248; S. Hori, Prog. Theor. Phys. 59 (1978) 1870; erratum, Prog. Theor. Phys. 61 (1979) 365 [12] H. Nishino, Phys. Lett. B 359 (1995) 77 [13] H. Weyl, Ann. der Phys. 54 (1917) 117 [14] J.M. Bardeen, Astrophys. J. 162 (1970) 71 [15] D. Kramer, H. Stephani, E. Herlt and M. MacCallum, `Exact Solutions of Einstein's Field Equations', Cambridge University Press (1980) [16] R. Arnowitt, S. Deser and C. Misner, in `Gravitation: An Introduction to Current Research', ed. L. Witten (New York, Wiley, 1962) hep-th/0204102 eng Bo-Yu, H Northwest University, China Soliton on Noncommutative Orbifold $T^2/Z_k$ 12 Apr 2002 13 p Following the construction of the projection operators on $T^2$ presented by Gopakumar, Headrick and Spradlin, we construct the projection operators on the integral noncommutative orbifold $T^2/G$ ($G = Z_k$, $k = 2, 3, 4, 6$). Such operators are expressed by a function on this orbifold. So it provides a complete set of projection operators upon the moduli space $T^2 \times K/Z_k$. All these operators have the same trace 1/A ($A$ is an integer). Since the projection operators correspond to solitons in noncommutative string field theory, we obtain the explicit expression of all the soliton solutions on $T^2/Z_k$.
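(A side note on the projector-soliton dictionary this abstract relies on: it is the Gopakumar-Minwalla-Strominger construction, cited as ref. [12] in the list below. A minimal sketch, assuming for illustration a generic polynomial potential $V$ with critical points $\lambda_i$; the potential and its critical points are our placeholders, not taken from the record itself:

    % At large noncommutativity the kinetic term is negligible, so the energy
    % of a static scalar configuration reduces to a trace of the potential.
    % Its extrema are built from mutually orthogonal star-product projectors,
    % so each projector of trace 1/A found in the paper yields a soliton of
    % energy proportional to V(lambda_i)/A.
    \[
      E[\phi] \;\propto\; \mathrm{Tr}\, V(\phi), \qquad
      \phi \;=\; \sum_i \lambda_i\, P_i, \qquad P_i \star P_i \;=\; P_i .
    \]

This is why constructing a complete set of projection operators on $T^2/Z_k$ amounts to writing down all the soliton solutions.)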
LANL EDS SzGeCERN Particle Physics - Theory Kangjie, S Zhan-ying, Y Zhanying Yang <yzy@phy.nwu.edu.cn> http://cdsware.cern.ch/download/invenio-demo-site-files/0204102.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204102.ps.gz CER n 200231 2002 11 Bo-yu, Hou Kangjie, Shi Zhan-ying, Yang 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] A. Connes, Non-commutative Geometry, Academic Press, 1994 [2] G. Landi, "An introduction to non-commutative spaces and their geometry", hep-th/9701078; J. Varilly, "An introduction to non-commutative geometry", physics/9709045 [3] J. Madore, "An Introduction to Non-commutative Differential Geometry and its Physical Applications", Cambridge University Press, 2nd edition, 1999 [4] A. Connes, M. Douglas, A. Schwartz, "Matrix theory compactification on tori", J. High Energy Phys. 9802 (1998) 003 [hep-th/9711162]; M. Douglas, C. Hull, J. High Energy Phys. 9802 (1998) 008 [hep-th/9711165] [5] N. Seiberg and E. Witten, "String theory and non-commutative geometry", J. High Energy Phys. 9909 (1999) 032 [hep-th/9908142]; V. Schomerus, "D-branes and Deformation Quantization", J. High Energy Phys. 9906 (1999) 030 [6] E. Witten, "Noncommutative Geometry and String Field Theory", Nucl. Phys. B 268 (1986) 253 [7] R. B. Laughlin, in "The Quantum Hall Effect", edited by R. Prange and S. Girvin, p. 233 [8] L. Susskind, hep-th/0101029; J. P. Hu and S. C. Zhang, cond-mat/0112432 [9] R. Gopakumar, M. Headrick, M. Spradlin, "On noncommutative multi-solitons", hep-th/0103256 [10] E. J. Martinec and G. Moore, "Noncommutative Solitons on Orbifolds", hep-th/0101199 [11] D. J. Gross and N. A. Nekrasov, "Solitons in noncommutative gauge theory", hep-th/0010090; M. R. Douglas and N. A. Nekrasov, "Noncommutative Field Theory", hep-th/0106048 [12] R. Gopakumar, S. Minwalla and A. Strominger, "Noncommutative Solitons", J. High Energy Phys. 0005 (2000) 048 [hep-th/0003160] [13] J. Harvey, "Komaba Lectures on Noncommutative Solitons and D-branes", hep-th/0102076; J. A. Harvey, P. Kraus and F. Larsen, J. High Energy Phys. 0012 (2000) 024 [hep-th/0010060] [14] A. Konechny and A. Schwarz, "Compactification of M(atrix) theory on noncommutative toroidal orbifolds", Nucl. Phys. B 591 (2000) 667 [hep-th/9912185]; "Moduli spaces of maximally supersymmetric solutions on noncommutative tori and noncommutative orbifolds", J. High Energy Phys. 0009 (2000) 005 [hep-th/0005167] [15] S. Walters, "Projective modules over noncommutative sphere", J. London Math. Soc. 51 (1995) 589; "Chern characters of Fourier modules", Can. J. Math. 52 (2000) 633 [16] M. Rieffel, Pacific J. Math. 93 (1981) 415 [17] F. P. Boca, Commun. Math. Phys. 202 (1999) 325 [18] H. Bacry, A. Grossmann and J. Zak, Phys. Rev. B 12 (1975) 1118 [19] J. Zak, in Solid State Physics, edited by H. Ehrenreich, F. Seitz and D. Turnbull (Academic, New York, 1972), Vol. 27 nucl-th/0204031 eng LA-UR-02-2040 Page, P R Los Alamos Sci. Lab. Hybrid Baryons Los Alamos, NM Los Alamos Sci. Lab. 11 Apr 2002 12 p We review the status of hybrid baryons. The only known way to study hybrids rigorously is via excited adiabatic potentials. Hybrids can be modelled by both the bag and flux-tube models. The low-lying hybrid baryon is N 1/2^+ with a mass of 1.5-1.8 GeV.
Hybrid baryons can be produced in the glue-rich processes of diffractive gamma N and pi N production, Psi decays and p pbar annihilation. LANL EDS SzGeCERN Nuclear Physics "Philip R. Page" <prp@t16prp.lanl.gov> http://cdsware.cern.ch/download/invenio-demo-site-files/0204031.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204031.ps.gz 11 2002 Page, Philip R. 2002-04-15 00 2002-04-15 BATCH INVITED PLENARY TALK PRESENTED AT THE ``9TH INTERNATIONAL CONFERENCE ON THE STRUCTURE OF BARYONS'' (BARYONS 2002) 3-8 MARCH NEWPORT NEWS VA USA 12 PAGES 7 ENCAPSULATED POSTSCRIPT FIGURES LATEX n 200216 PREPRINT nucl-th/0204032 eng Amos, K The University of Melbourne A simple functional form for proton-nucleus total reaction cross sections 12 Apr 2002 13 p A simple functional form has been found that gives a good representation of the total reaction cross sections for the scattering of protons from (15) nuclei spanning the mass range ${}^{9}$Be to ${}^{238}$U and for proton energies ranging from 20 to 300 MeV. LANL EDS SzGeCERN Nuclear Physics Deb, P K Ken Amos <amos@physics.unimelb.edu.au> http://cdsware.cern.ch/download/invenio-demo-site-files/0204032.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204032.ps.gz 2002 11 2002-04-15 00 2002-04-15 BATCH n 200216 PREPRINT nucl-th/0204033 eng Oyamatsu, K Aichi Shukutoku Univ Saturation of nuclear matter and radii of unstable nuclei 12 Apr 2002 26 p We examine relations among the parameters characterizing the phenomenological equation of state (EOS) of nearly symmetric, uniform nuclear matter near the saturation density by comparing macroscopic calculations of radii and masses of stable nuclei with the experimental data. The EOS parameters of interest here are the symmetry energy S_0, the symmetry energy density-derivative coefficient L and the incompressibility K_0 at the normal nuclear density. We find a constraint on the relation between K_0 and L from the empirically allowed values of the slope of the saturation line (the line joining the saturation points of nuclear matter at finite neutron excess), together with a strong correlation between S_0 and L. In the light of the uncertainties in the values of K_0 and L, we macroscopically calculate radii of unstable nuclei as expected to be produced in future facilities. We find that the matter radii depend strongly on L while being almost independent of K_0, a feature that will help to determine the L value via systematic measurements of nuclear size. LANL EDS SzGeCERN Nuclear Physics Iida, K Kei Iida <keiiida@postman.riken.go.jp> http://cdsware.cern.ch/download/invenio-demo-site-files/0204033.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204033.ps.gz CER n 200231 2002 11 Oyamatsu, Kazuhiro Iida, Kei 2002-04-15 00 2002-04-15 BATCH PREPRINT [1] J.M. Blatt and V.F. Weisskopf, Theoretical Nuclear Physics, Wiley, New York, 1952 [2] H. Heiselberg, V.R. Pandharipande, Annu. Rev. Nucl. Part. Sci. 50 (2000) 481 [3] K. Oyamatsu, I. Tanihata, Y. Sugahara, K. Sumiyoshi, H. Toki, Nucl. Phys. A 634 (1998) 3 [4] B.A. Brown, Phys. Rev. Lett. 85 (2000) 5296 [5] K.C. Chung, C.S. Wang, A.J. Santiago, nucl-th/0102017 [6] B.A. Li, Phys. Rev. Lett. 85 (2000) 4221 [7] C. Sturm et al., Phys. Rev. Lett. 86 (2001) 39 [8] C. Fuchs, A. Faessler, E. Zabrodin, Y.M. Zheng, Phys. Rev. Lett. 86 (2001) 1974 [9] P. Danielewicz, in: Proc.
Int. Symp. on Non-Equilibrium and Nonlinear Dynamics in Nuclear and Other Finite Systems, Beijing, 2001, nucl-th/0112006 [10] D.H. Youngblood, H.L. Clark, Y.-W. Lui, Phys. Rev. Lett. 82 (1999) 691 [11] J.A. Pons, F.M. Walter, J.M. Lattimer, M. Prakash, R. Neuhaeuser, P. An, Astrophys. J. 564 (2002) 981 [12] J.M. Lattimer, Annu. Rev. Nucl. Part. Sci. 31 (1981) 337 [13] K. Oyamatsu, Nucl. Phys. A 561 (1993) 431 [14] L.R.B. Elton, A. Swift, Nucl. Phys. A 94 (1967) 52 [15] M. Yamada, Prog. Theor. Phys. 32 (1964) 512 [16] H. de Vries, C.W. de Jager, C. de Vries, At. Data Nucl. Data Tables 36 (1987) 495 [17] G. Audi, A.H. Wapstra, Nucl. Phys. A 595 (1995) 409 [18] S. Goriely, F. Tondeur, J.M. Pearson, At. Data Nucl. Data Tables 77 (2001) 311 [19] M. Samyn, S. Goriely, P.-H. Heenen, J.M. Pearson, F. Tondeur, Nucl. Phys. A 700 (2002) 142 [20] E. Chabanat, P. Bonche, P. Haensel, J. Meyer, R. Schaeffer, Nucl. Phys. A 635 (1998) 231 [21] Y. Sugahara, H. Toki, Nucl. Phys. A 579 (1994) 557 [22] A. Ozawa, T. Suzuki, I. Tanihata, Nucl. Phys. A 693 (2001) 32 [23] C.J. Batty, E. Friedman, H.J. Gils, H. Rebel, Adv. Nucl. Phys. 19 (1989) 1 [24] G. Fricke, C. Bernhardt, K. Heilig, L.A. Schaller, L. Schellenberg, E.B. Shera, C.W. de Jager, At. Data Nucl. Data Tables 60 (1995) 177 [25] G. Huber et al., Phys. Rev. C 18 (1978) 2342 [26] L. Ray, G.W. Hoffmann, W.R. Coker, Phys. Rep. 212 (1992) 223 [27] S. Yoshida, H. Sagawa, N. Takigawa, Phys. Rev. C 58 (1998) 2796 [28] C.J. Pethick, D.G. Ravenhall, Nucl. Phys. A 606 (1996) 173 [29] K. Iida, K. Oyamatsu, unpublished [30] J.P. Blaizot, J.F. Berger, J. Dechargé, M. Girod, Nucl. Phys. A 591 (1995) 435 nucl-th/0204034 eng Bozek, P Institute of Nuclear Physics, Cracow, Poland Nuclear matter with off-shell propagation 12 Apr 2002 Symmetric nuclear matter is studied within the conserving, self-consistent T-matrix approximation. This approach involves off-shell propagation of nucleons in the ladder diagrams. The binding energy receives contributions from the background part of the spectral function, away from the quasiparticle peak. The Fermi energy at the saturation point fulfills the Hugenholtz-Van Hove relation. In comparison to the Brueckner-Hartree-Fock approach, the binding energy is reduced and the equation of state is harder. LANL EDS SzGeCERN Nuclear Physics Bozek <bozek@sothis.ifj.edu.pl> http://cdsware.cern.ch/download/invenio-demo-site-files/0204034.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0204034.ps.gz 2002 11 2002-04-15 00 2002-04-15 BATCH n 200216 PREPRINT SCAN-9605068 eng McGILL-96-15 Contogouris, A P University of Athens One loop corrections for certain reactions initiated by 5-parton subprocesses via helicity amplitudes Montreal McGill Univ. Phys. Dept. Apr 1996?
28 p UNC9808 SzGeCERN Particle Physics - Phenomenology Merebashvili, Z V Lebessis, F Veropoulos, G http://cdsware.cern.ch/download/invenio-demo-site-files/convert_SCAN-9605068.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/SCAN-9605068.tif 13 1996 1996-05-08 50 2001-12-14 BATCH 4234-4243 7 Phys. Rev., D 54 1996 h 199620 ARTICLE eng TRI-PP-86-73 Bryman, D A University of British Columbia Exotic muon decay mu --> e + x Burnaby, BC TRIUMF Aug 1986 8 p jv200203 SzGeCERN Particle Physics - Experimental Results Clifford, E T H 13 1986 1990-01-29 50 2002-03-26 BATCH 2787-88 22 Phys. Rev. Lett. 57 1986 SLAC 1594699 h 198648n ARTICLE hep-th/0003289 eng PUPT-1926 Costa, M S Princeton University A Test of the AdS/CFT Duality on the Coulomb Branch Princeton, NJ Princeton Univ. Joseph-Henry Lab. Phys. 31 Mar 2000 11 p We consider the N=4 SU(N) Super Yang-Mills theory on the Coulomb branch with gauge symmetry broken to S(U(N_1)*U(N_2)). By integrating out the W particles, the effective action near the IR SU(N_i) conformal fixed points is seen to be a deformation of the Super Yang-Mills theory by a non-renormalized, irrelevant, dimension 8 operator. The correction to the two-point function of the dilaton field dual operator near the IR is related to a three-point function of chiral primary operators at the conformal fixed points and agrees with the classical gravity prediction, including the numerical factor. LANL EDS LANLPUBL200104 SzGeCERN Particle Physics - Theory Miguel S Costa <miguel@feynman.princeton.edu> http://cdsware.cern.ch/download/invenio-demo-site-files/0003289.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0003289.ps.gz 2000 13 Princeton University 2000-04-03 50 2001-11-09 BATCH Costa, Miguel S. 287-292 Phys. Lett., B 482 2000 SLAC 4356110 n 200014 ARTICLE [1] J.M. Maldacena, "The Large N Limit of Superconformal Field Theories and Supergravity", Adv. Theor. Math. Phys. 2 (1998) 231 [hep-th/9711200] [2] S.S. Gubser, I.R. Klebanov and A.M. Polyakov, "Gauge Theory Correlators from Non-Critical String Theory", Phys. Lett. B 428 (1998) 105 [hep-th/9802109] [3] E. Witten, "Anti De Sitter Space And Holography", Adv. Theor. Math. Phys. 2 (1998) 253 [hep-th/9802150] [4] O. Aharony, S.S. Gubser, J. Maldacena, H. Ooguri and Y. Oz, "Large N Field Theories, String Theory and Gravity", Phys. Rep. 323 (2000) 183 [hep-th/9905111] [5] J.A. Minahan and N.P. Warner, "Quark Potentials in the Higgs Phase of Large N Supersymmetric Yang-Mills Theories", J. High Energy Phys. 06 (1998) 005 [hep-th/9805104] [6] M.R. Douglas and W. Taylor, "Branes in the bulk of Anti-de Sitter space", hep-th/9807225 [7] A.A. Tseytlin and S. Yankielowicz, "Free energy of N=4 super Yang-Mills in Higgs phase and non-extremal D3-brane interactions", Nucl. Phys. B 541 (1999) 145 [hep-th/9809032] [8] Y. Wu, "A Note on AdS/SYM Correspondence on the Coulomb Branch", hep-th/9809055 [9] P. Kraus, F. Larsen, S. Trivedi, "The Coulomb Branch of Gauge Theory from Rotating Branes", J. High Energy Phys. 03 (1999) 003 [hep-th/9811120] [10] I.R. Klebanov and E. Witten, "AdS/CFT Correspondence and Symmetry Breaking", Nucl. Phys. B 556 (1999) 89 [hep-th/9905104] [11] D.Z. Freedman, S.S. Gubser, K. Pilch and N.P. Warner, "Continuous distributions of D3-branes and gauged supergravity", hep-th/9906194 [12] A. Brandhuber and K.
Sfetsos, "Wilson loops from multicentre and rotating branes, mass gaps and phase structure in gauge theories", hep-th/9906201 [13] I. Chepelev and R. Roiban, "A note on correlation functions in AdS5/SYM4 correspondence on the Coulomb branch", Phys. Lett. B 462 (1999) 74 [hep-th/9906224] [14] S.B. Giddings and S.F. Ross, "D3-brane shells to black branes on the Coulomb branch", Phys. Rev. D 61 (2000) 024036 [hep-th/9907204] [15] M. Cvetic, S.S. Gubser, H. Lu and C.N. Pope, "Symmetric Potentials of Gauged Supergravities in Diverse Dimensions and Coulomb Branch of Gauge Theories", hep-th/9909121 [16] R.C. Rashkov and K.S. Viswanathan, "Correlation functions in the Coulomb branch of N=4 SYM from AdS/CFT correspondence", hep-th/9911160 [17] M.S. Costa, "Absorption by Double-Centered D3-Branes and the Coulomb Branch of N = 4 SYM Theory", hep-th/9912073 [18] Y.S. Myung, G. Kang and H.W. Lee, "Greybody factor for D3-branes in B field", hep-th/9911193; "S-wave absorption of scalars by noncommutative D3-branes", hep-th/9912288 [19] R. Manvelyan, H.J.W. Mueller-Kirsten, J.-Q. Liang, Y. Zhang, "Absorption Cross Section of Scalar Field in Supergravity Background", hep-th/0001179 [20] S.S. Gubser and I.R. Klebanov, "Absorption by Branes and Schwinger Terms in the World Volume Theory", Phys. Lett. B 413 (1997) 41 [hep-th/9708005] [21] K. Intriligator, "Maximally Supersymmetric RG Flows and AdS Duality", hep-th/9909082 [22] S.S. Gubser, A. Hashimoto, I.R. Klebanov and M. Krasnitz, "Scalar Absorption and the Breaking of the World Volume Conformal Invariance", Nucl. Phys. B 526 (1998) 393 [hep-th/9803023] [23] S. Lee, S. Minwalla, M. Rangamani and N. Seiberg, "Three-Point Functions of Chiral Operators in D=4, N = 4 SYM at Large N", Adv. Theor. Math. Phys. 2 (1998) 697 [hep-th/9806074] [24] E. D'Hoker, D.Z. Freedman and W. Skiba, "Field Theory Tests for Correlators in the AdS/CFT Correspondence", Phys. Rev. D 59 (1999) 045008 [hep-th/9807098] [25] F. Gonzalez-Rey, B. Kulik and I.Y. Park, "Non-renormalization of two and three Point Correlators of N=4 SYM in N=1 Superspace", Phys. Lett. B 455 (1999) 164 [hep-th/9903094] [26] K. Intriligator, "Bonus Symmetries of N=4 Super-Yang-Mills Correlation Functions via AdS Duality", Nucl. Phys. B 551 (1999) 575 [hep-th/9811047]; K. Intriligator and W. Skiba, "Bonus Symmetry and the Operator Product Expansion of N=4 Super-Yang-Mills", Nucl. Phys. B 559 (1999) 165 [hep-th/9905020] [27] B. Eden, P.S. Howe and P.C. West, "Nilpotent invariants in N=4 SYM", Phys. Lett. B 463 (1999) 19 [hep-th/9905085]; P.S. Howe, C. Schubert, E. Sokatchev and P.C. West, "Explicit construction of nilpotent covariants in N=4 SYM", hep-th/9910011 [28] A. Petkou and K. Skenderis, "A non-renormalization theorem for conformal anomalies", Nucl. Phys. B 561 (1999) 100 [hep-th/9906030] [29] M.R. Douglas, D. Kabat, P. Pouliot and S.H. Shenker, "D-branes and Short Distances in String Theory", Nucl. Phys. B 485 (1997) 85 [hep-th/9608024] [30] G. Lifschytz and S.D. Mathur, "Supersymmetry and Membrane Interactions in M(atrix) Theory", Nucl. Phys. B 507 (1997) 621 [hep-th/9612087] [31] J. Maldacena, "Probing Near Extremal Black Holes with D-branes", Phys. Rev.
D 57 (1998) 3736 [hep-th/9705053]; "Branes probing black holes", Nucl. Phys. B, Proc. Suppl. 68 (1998) 17 [hep-th/9709099] [32] I. Chepelev and A.A. Tseytlin, "Interactions of type IIB D-branes from D-instanton matrix model", Nucl. Phys. B 511 (1998) 629 [hep-th/9705120]; "Long-distance interactions of branes: correspondence between supergravity and super Yang-Mills descriptions", Nucl. Phys. B 515 (1998) 73 [hep-th/9709087]; A.A. Tseytlin, "Interactions Between Branes and Matrix Theories", Nucl. Phys. B, Proc. Suppl. 68 (1998) 99 [hep-th/9709123] [33] M. Dine and N. Seiberg, "Comments on Higher Derivative Operators in Some SUSY Field Theories", Phys. Lett. B 409 (1997) 239 [hep-th/9705057] [34] A.A. Tseytlin, "On non-abelian generalisation of Born-Infeld action in string theory", Nucl. Phys. B 501 (1997) 41 [hep-th/9701125] [35] S.S. Gubser and A. Hashimoto, "Exact absorption probabilities for the D3-brane", Commun. Math. Phys. 203 (1999) 325 [hep-th/9805140] [36] S.S. Gubser, "Non-conformal examples of AdS/CFT", Class. Quantum Gravity 17 (2000) 1081 [hep-th/9910117] eng Bollen, G Institut für Physik, Universität Mainz ISOLTRAP: a tandem Penning trap system for accurate on-line mass determination of short-lived isotopes SzGeCERN Detectors and Experimental Techniques Becker, S Kluge, H J Konig, M Moore, M Otto, T Raimbault-Hartmann, H Savard, G Schweikhard, L Stolzenberg, H ISOLDE Collaboration 1996 13 IS302 ISOLDE PPE CERN PS 1996-05-08 50 2001-12-14 BATCH 675-697 Nucl. Instrum. Methods Phys. Res., A 368 1996 n 199600 a1996 ARTICLE hep-th/0003291 eng McInnes, B National University of Singapore AdS/CFT For Non-Boundary Manifolds In its Euclidean formulation, the AdS/CFT correspondence begins as a study of Yang-Mills conformal field theories on the sphere, S^4. It has been successfully extended, however, to S^1 X S^3 and to the torus T^4. It is natural to hope that it can be made to work for any manifold on which it is possible to define a stable Yang-Mills conformal field theory. We consider a possible classification of such manifolds, and show how to deal with the most obvious objection: the existence of manifolds which cannot be represented as boundaries. We confirm Witten's suggestion that this can be done with the help of a brane in the bulk. LANL EDS SzGeCERN Particle Physics - Theory Brett McInnes <matmcinn@nus.edu.sg> http://cdsware.cern.ch/download/invenio-demo-site-files/0003291.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0003291.ps.gz 2000 13 Innes, Brett Mc 2000-04-03 50 2001-11-09 BATCH 025 J. High Energy Phys. 05 2000 SLAC 4356136 n 200014 ARTICLE SCAN-9605071 eng KEK-Preprint-95-196 TUAT-HEP-96-1 DPNU-96-04 Emi, K KEK Study of a dE/dx measurement and the gas-gain saturation by a prototype drift chamber for the BELLE-CDC Tsukuba KEK Jan 1996 20 p UNC9806 SzGeCERN Detectors and Experimental Techniques Tsukamoto, T Hirano, H Mamada, H Sakai, Y Uno, S Itami, S Kajikawa, R Nitoh, O Ohishi, N Sugiyama, A Suzuki, S Takahashi, T Tamagawa, Y Tomoto, M Yamaki, T library@kekvax.kek.jp http://cdsware.cern.ch/download/invenio-demo-site-files/convert_SCAN-9605071.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/SCAN-9605071.tif 1996 13 1996-05-08 50 2001-12-14 BATCH 225 2 Nucl. Instrum. Methods Phys.
Res., A 379 1996 SLAC 3328660 h 199620 ARTICLE hep-th/0003293 eng Smailagic, A University of Osijek Higher Dimensional Schwinger-like Anomalous Effective Action We construct the explicit form of the anomalous effective action, in arbitrary even dimension, for Abelian vector and axial gauge fields coupled to Dirac fermions. It turns out to be a surprisingly simple extension of the 2D Schwinger model effective action. LANL EDS LANLPUBL200104 SzGeCERN Particle Physics - Theory Spallucci, E spallucci@ts.infn.it http://cdsware.cern.ch/download/invenio-demo-site-files/0003293.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0003293.ps.gz CER n 200231 2000 13 2000-04-03 50 2001-11-09 BATCH 045010 Phys. Rev., D 62 2000 SLAC 4356152 ARTICLE [1] S.L. Adler, Phys. Rev. 177 (1969) 2426 [2] S.E. Treiman, R. Jackiw, D.J. Gross, "Lectures on Current Algebra and its Applications", Princeton UP, Princeton NJ, (1972) [3] T. Berger, "Fermions in two (1+1)-dimensional anomalous gauge theories: the chiral Schwinger model and the chiral quantum gravity", Hamburg U., DESY-90-084, July 1990 [4] L. Rosenberg, Phys. Rev. 129 (1963) 2786 [5] R. Jackiw, "Topological Investigations of Quantized Gauge Theories", in Relativity, Groups and Topology, eds. B. deWitt and R. Stora (Elsevier, Amsterdam 1984) [6] M.T. Grisaru, N.K. Nielsen, W. Siegel, D. Zanon, Nucl. Phys. B 247 (1984) 157 [7] A.M. Polyakov, Phys. Lett. B 103 (1981) 207; A.M. Polyakov, Mod. Phys. Lett. A 2 (1987) 893 [8] R.J. Riegert, Phys. Lett. 134 (1984) 56 [9] K. Fujikawa, Phys. Rev. Lett. 42 (1979) 1195 [10] B. deWitt, Relativity, Groups and Topology, Paris (1963); A.O. Barvinsky, G.A. Vilkovisky, Phys. Rep. 119 (1985) 1 [11] P.H. Frampton, T.W. Kephart, Phys. Rev. Lett. 50 (1983) 1343; L. Alvarez-Gaume, E. Witten, Nucl. Phys. 234 (1983) 269 [12] A. Smailagic, R.E. Gamboa-Saravi, Phys. Lett. 192 (1987) 145; A. Smailagic, E. Spallucci, Phys. Lett. 284 (1992) 17 hep-th/0003294 eng Matsubara, K Uppsala University Restrictions on Gauge Groups in Noncommutative Gauge Theory We show that the gauge groups SU(N), SO(N) and Sp(N) cannot be realized on a flat noncommutative manifold, while it is possible for U(N). LANL EDS LANLPUBL200104 SzGeCERN Particle Physics - Theory Keizo Matsubara <keizo.matsubara@teorfys.uu.se> http://cdsware.cern.ch/download/invenio-demo-site-files/0003294.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0003294.ps.gz CER n 200231 2000 13 Matsubara, Keizo 2000-04-03 50 2001-11-09 BATCH 417-419 Phys. Lett., B 482 2000 SLAC 4356160 ARTICLE [1] J. Polchinski, "TASI Lectures on D-branes", hep-th/9611050 [2] M.R. Douglas and C. Hull, "D-branes and the Noncommutative torus", J. High Energy Phys. 2 (1998) 8 [hep-th/9711165] [3] V. Schomerus, "D-branes and Deformation Quantization", hep-th/9903205 [4] N. Seiberg and E. Witten, "String Theory and Noncommutative Geometry", hep-th/9908142 hep-th/0003295 eng Wang, B Fudan University Quasinormal modes of Reissner-Nordström Anti-de Sitter Black Holes Complex frequencies associated with quasinormal modes for large Reissner-Nordström Anti-de Sitter black holes have been computed.
These frequencies have a close relation to the black hole charge and do not linearly scale with the black hole temperature as in the Schwarzschild Anti-de Sitter case. In terms of the AdS/CFT correspondence, we found that the bigger the black hole charge is, the quicker the approach to thermal equilibrium in the CFT. The properties of quasinormal modes for $l>0$ have also been studied. LANL EDS LANLPUBL200104 SzGeCERN Particle Physics - Theory Lin, C Y Abdalla, E Elcio Abdalla <eabdalla@fma.if.usp.br> http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.ps.gz http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig1.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig2.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig3.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig4.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig5.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig6a.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig6b.ps.gz Additional http://cdsware.cern.ch/download/invenio-demo-site-files/0003295.fig7.ps.gz Additional CER n 200231 2000 13 Wang, Bin Lin, Chi-Yong Abdalla, Elcio 2000-04-03 50 2001-11-09 BATCH 79-88 Phys. Lett., B 481 2000 SLAC 4356179 ARTICLE [1] K. D. Kokkotas, B. G. Schmidt, gr-qc/9909058 and references therein [2] W. Krivan, Phys. Rev. D 60 (1999) 101501 [3] S. Hod, gr-qc/9902072 [4] P. R. Brady, C. M. Chambers, W. G. Laarakkers and E. Poisson, Phys. Rev. D 60 (1999) 064003 [5] P. R. Brady, C. M. Chambers, W. Krivan and P. Laguna, Phys. Rev. D 55 (1997) 7538 [6] G. T. Horowitz and V. E. Hubeny, hep-th/9909056; G. T. Horowitz, hep-th/9910082 [7] E. S. C. Ching, P. T. Leung, W. M. Suen and K. Young, Phys. Rev. D 52 (1995) 2118 [8] J. M. Maldacena, Adv. Theor. Math. Phys. 2 (1998) 231 [9] E. Witten, Adv. Theor. Math. Phys. 2 (1998) 253 [10] S. S. Gubser, I. R. Klebanov and A. M. Polyakov, Phys. Lett. B 428 (1998) 105 [11] A. Chamblin, R. Emparan, C. V. Johnson and R. C. Myers, Phys. Rev. D 60 (1999) 064018 [12] E. W. Leaver, J. Math. Phys. 27 (1986) 1238 [13] E. W. Leaver, Phys. Rev. D 41 (1990) 2986 [14] C. O. Lousto, Phys. Rev. D 51 (1995) 1733 [15] O. Kaburaki, Phys. Lett. A 217 (1996) 316 [16] R. K. Su, R. G. Cai and P. K. N. Yu, Phys. Rev. D 50 (1994) 2932; Phys. Rev. D 48 (1993) 3473; Phys. Rev. D 52 (1995) 6186; B. Wang, J. M. Zhu, Mod. Phys. Lett. A 10 (1995) 1269 [17] A. Chamblin, R. Emparan, C. V. Johnson and R. C. Myers, Phys. Rev. D 60 (1999) 104026 rus Пушкин, А С Медный всадник На берегу пустынных волн, / Стоял он, дум великих полн, / И вдаль глядел. Пред ним широко / Река неслася; бедный чёлн / По ней стремился одиноко.
По мшистым, топким берегам / Чернели избы здесь и там, / Приют убогого чухонца; / И лес, неведомый лучам / В тумане спрятанного солнца, / Кругом шумел. 1833 1990-01-27 00 2002-04-12 BATCH POETRY gre Καβάφης, Κ Π Ιθάκη Σα βγεις στον πηγαιμό για την Ιθάκη, / να εύχεσαι νάναι μακρύς ο δρόμος, / γεμάτος περιπέτειες, γεμάτος γνώσεις. / Τους Λαιστρυγόνας και τους Κύκλωπας, / τον θυμωμένο Ποσειδώνα μη φοβάσαι, / τέτοια στον δρόμο σου ποτέ σου δεν θα βρείς, / αν μέν' η σκέψις σου υψηλή, αν εκλεκτή / συγκίνησις το πνεύμα και το σώμα σου αγγίζει. / Τους Λαιστρυγόνας και τους Κύκλωπας, / τον άγριο Ποσειδώνα δεν θα συναντήσεις, / αν δεν τους κουβανείς μες στην ψυχή σου, / αν η ψυχή σου δεν τους στήνει εμπρός σου. // Να εύχεσαι νάναι μακρύς ο δρόμος. / Πολλά τα καλοκαιρινά πρωϊά να είναι / που με τι ευχαρίστησι, με τι χαρά / θα μπαίνεις σε λιμένας πρωτοειδωμένους· / να σταματήσεις σ' εμπορεία Φοινικικά, / και τες καλές πραγμάτειες ν' αποκτήσεις, / σεντέφια και κοράλλια, κεχριμπάρια κ' έβενους, / και ηδονικά μυρωδικά κάθε λογής, / όσο μπορείς πιο άφθονα ηδονικά μυρωδικά· / σε πόλεις Αιγυπτιακές πολλές να πας, / να μάθεις και να μάθεις απ' τους σπουδασμένους. // Πάντα στον νου σου νάχεις την Ιθάκη. / Το φθάσιμον εκεί είν' ο προορισμός σου. / Αλλά μη βιάζεις το ταξίδι διόλου. / Καλλίτερα χρόνια πολλά να διαρκέσει· / και γέρος πια ν' αράξεις στο νησί, / πλούσιος με όσα κέρδισες στον δρόμο, / μη προσδοκώντας πλούτη να σε δώσει η Ιθάκη. // Η Ιθάκη σ' έδωσε το ωραίο ταξίδι. / Χωρίς αυτήν δεν θάβγαινες στον δρόμο. / Αλλο δεν έχει να σε δώσει πια. // Κι αν πτωχική την βρεις, η Ιθάκη δεν σε γέλασε. / Ετσι σοφός που έγινες, με τόση πείρα, / ήδη θα το κατάλαβες οι Ιθάκες τι σημαίνουν. 1911 2005-03-02 00 2005-03-02 BATCH POETRY SzGeCERN 2345180CERCER SLAC 5278333 hep-th/0210114 eng Klebanov, Igor R Princeton University AdS Dual of the Critical O(N) Vector Model 2002 11 Oct 2002 11 p We suggest a general relation between theories of an infinite number of higher-spin massless gauge fields in $AdS_{d+1}$ and large $N$ conformal theories in $d$ dimensions containing $N$-component vector fields. In particular, we propose that the singlet sector of the well-known critical 3-d O(N) model with the $(\phi^a \phi^a)^2$ interaction is dual, in the large $N$ limit, to the minimal bosonic theory in $AdS_4$ containing massless gauge fields of even spin. LANL EDS SIS LANLPUBL2003 SIS:2003 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Polyakov, A M 213-219 Phys. Lett. B 550 2002 http://cdsware.cern.ch/download/invenio-demo-site-files/0210114.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0210114.ps.gz klebanov@feynman.princeton.edu n 200242 13 20060826 0012 CER01 20021014 PUBLIC 002345180CER ARTICLE [1] G. 't Hooft, "A planar diagram theory for strong interactions", Nucl. Phys. B 72 (1974) 461 [2] A.M. Polyakov, "String theory and quark confinement", Nucl. Phys. B, Proc. Suppl. 68 (1998) 1 [hep-th/9711002]; "The wall of the cave", hep-th/9809057 [3] J. Maldacena, "The large N limit of superconformal field theories and supergravity", Adv. Theor. Math. Phys. 2 (1998) 231 [hep-th/9711200] [4] S. S. Gubser, I. R. Klebanov, and A. M. Polyakov, "Gauge theory correlators from non-critical string theory", Phys. Lett. B 428 (1998) 105 [hep-th/9802109] [5] E.
Witten, "Anti-de Sitter space and holography," Adv. Theor. Math. Phys. 2 (1998) 253 hep-th/9802150 [6] E. Brezin, D.J. Wallace, Phys. Rev. B 7 (1973) 1976 [7] K.G. Wilson and J. Kogut, "The Renormalization Group and the Epsilon Expansion," Phys. Rep. 12 (1974) 75 [8] C. Fronsdal, Phys. Rev. D 18 (1978) 3624 [9] E. Fradkin and M. Vasiliev, Phys. Lett. B 189 (1987) 89 [9] Nucl. Phys. B 291 (1987) 141 [10] M.A. Vasiliev, "Higher Spin Gauge Theories Star Product and AdS Space," hep-th/9910096 [11] A. M. Polyakov, "Gauge fields and space-time," hep-th/0110196 [12] P. Haggi-Mani and B. Sundborg, "Free Large N Supersymmetric Yang-Mills Theory Ann. Sci. a String Theory," hep-th/0002189 [12] B. Sundborg, "Stringy Gravity, Interacting Tensionless Strings and Massless Higher Spins," hep-th/0103247 [13] E. Witten, Talk at the John Schwarz 60-th Birthday Symposium, http://theory.caltech.edu/jhs60/witten/1.html http://theory.caltech.edu/jhs60/witten/1.html [14] E. Sezgin and P. Sundell, "Doubletons and 5D Higher Spin Gauge Theory," hep-th/0105001 [15] A. Mikhailov, "Notes On Higher Spin Symmetries," hep-th/0201019 [16] E. Sezgin and P. Sundell, "Analysis of Higher Spin Field Equations in Four Dimensions," hep-th/0205132 [16] J. Engquist, E. Sezgin, P. Sundell, "On N=1,2,4 Higher Spin Gauge Theories in Four Dimensions," hep-th/0207101 [17] M. Vasiliev, "Higher Spin Gauge Theories in Four, Three and Two Dimensions," Int. J. Mod. Phys. D 5 (1996) 763 hep-th/9611024 [18] I.R. Klebanov and E. Witten, "AdS/CFT correspondence and Symmetry Breaking," Nucl. Phys. B 556 (1999) 89 hep-th/9905104 [19] O. Aharony, M. Berkooz, E. Silverstein, "Multiple Trace Operators and Nonlocal String Theories," J. High Energy Phys. 0108 (2001) 006 hep-th-0105309 [20] E. Witten, "Multi-Trace Operators, Boundary Conditions, And AdS/CFT Correspondence," hep-th/0112258 [21] M. Berkooz, A. Sever and A. Shomer, "Double Trace Deformations, Boundary Conditions and Space-time Singularities," J. High Energy Phys. 0205 (2002) 034 hep-th-0112264 [22] S.S. Gubser and I. Mitra, "Double-Trace Operators and One-Loop Vacuum Energy in AdS/CFT," hep-th/0210093 [23] I.R. Klebanov, "Touching Random Surfaces and Liouville Gravity," Phys. Rev. D 51 (1995) 1836 hep-th/9407167 [23] I.R. Klebanov and A. Hashimoto, "Non-Perturbative Solution of Matrix Models Modified by Trace-Squared Terms," Nucl. Phys. B 434 (1995) 264 hep-th/9409064 [24] A.M. Polyakov, "Non-Hamiltonian Approach to Quantum Field Theory at Small Distances," Zh. Eksp. Teor. Fiz. 66 (1974) 23 [25] E. D’Hoker, D. Z. Freedman, S. Mathur, A. Matusis and L. Rastelli, "Graviton exchange and complete 4-point functions in the AdS/CFT correspondence, " hep-th/9903196 [26] For a review with a comprehensive set of references, see E. d‘Hoker and D.Z. Freedman, "Supersymmetric Gauge Theories and the AdS/CFT Correspondence," hep-th/0201253 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181365824-0-23-17-1-0 2292727CERCER SLAC 4828445 UNCOVER 1021768628 hep-th/0201100 eng DSF-2002-2 Mück, W INFN Universita di Napoli An improved correspondence formula for AdS/CFT with multi-trace operators 2002 Napoli Napoli Univ. 15 Jan 2002 6 p An improved correspondence formula is proposed for the calculation of correlation functions of a conformal field theory perturbed by multi-trace operators from the analysis of the dynamics of the dual field theory in Anti-de Sitter space. The formula reduces to the usual AdS/CFT correspondence formula in the case of single-trace perturbations. 
LANL EDS SIS ING2002 SIS:2002 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory 301-304 3-4 Phys. Lett. B 531 2002 http://cdsware.cern.ch/download/invenio-demo-site-files/0201100.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0201100.ps.gz wolfgang.mueck@na.infn.it n 200204 13 20060826 0008 CER01 20020128 PUBLIC 002292727CER ARTICLE [1] E. Witten, hep-th/0112258 [2] S. S. Gubser, I. R. Klebanov and A. M. Polyakov, Phys. Lett. B 428 (1998) 105 [hep-th/9802109] [3] E. Witten, Adv. Theor. Math. Phys. 2 (1998) 253 [hep-th/9802150] [4] P. Breitenlohner and D. Z. Freedman, Ann. Phys. (N.Y.) 144 (1982) 249 [5] I. R. Klebanov and E. Witten, Nucl. Phys. B 556 (1999) 89 [hep-th/9905104] [6] W. Mück and K. S. Viswanathan, Phys. Rev. D 60 (1999) 081901 [hep-th/9906155] [7] W. Mück, Nucl. Phys. B 620 (2002) 477 [hep-th/0105270] [8] M. Bianchi, D. Z. Freedman and K. Skenderis, J. High Energy Phys. 08 (2001) 041 [hep-th/0105276] CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181358050-0-7-7-0-0 SzGeCERN 2307939CERCER SLAC 4923022 hep-th/0205061 eng DSF-2002-11 QMUL-PH-2002-11 Martelli, D University of London Holographic Renormalization and Ward Identities with the Hamilton-Jacobi Method 2003 Napoli Napoli Univ. 7 May 2002 31 p A systematic procedure for performing holographic renormalization, which makes use of the Hamilton-Jacobi method, is proposed and applied to a bulk theory of gravity interacting with a scalar field and a U(1) gauge field in the Stueckelberg formalism. We describe how the power divergences are obtained as solutions of a set of "descent equations" stemming from the radial Hamiltonian constraint of the theory. In addition, we isolate the logarithmic divergences, which are closely related to anomalies. The method also allows one to determine the exact one-point functions of the dual field theory. Using the other Hamiltonian constraints of the bulk theory, we derive the Ward identities for diffeomorphisms and gauge invariance. In particular, we demonstrate the breaking of U(1)_R current conservation, recovering the holographic chiral anomaly recently discussed in hep-th/0112119 and hep-th/0202056. LANL EDS SIS LANLPUBL2004 SIS:2004 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Mück, W Martelli, Dario Mueck, Wolfgang 248-276 Nucl. Phys. B 654 2003 http://cdsware.cern.ch/download/invenio-demo-site-files/0205061.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0205061.ps.gz d.martelli@qmul.ac.uk n 200219 13 20060823 0005 CER01 20020508 PUBLIC 002307939CER ARTICLE [1] J. M. Maldacena, Adv. Theor. Math. Phys. 2 (1998) 231 [hep-th/9711200] [2] S. S. Gubser, I. R. Klebanov and A. M. Polyakov, Phys. Lett. B 428 (1998) 105 [hep-th/9802109] [3] E. Witten, Adv. Theor. Math. Phys. 2 (1998) 253 [hep-th/9802150] [4] E. D'Hoker and D. Z. Freedman, hep-th/0201253 [5] W. Mück and K. S. Viswanathan, Phys. Rev. D 58 (1998) 041901 [hep-th/9804035] [6] D. Z. Freedman, S. D. Mathur, A. Matusis and L. Rastelli, Nucl. Phys. B 546 (1998) 96 [hep-th/9812032] [7] H. Liu and A.A. Tseytlin, Nucl. Phys. B 533 (1998) 88 [hep-th/9804083] [8] M. Henningson and K. Skenderis, J. High Energy Phys. 07 (1998) 023 [hep-th/9806087] [9] J. D. Brown and J. W. York, Phys. Rev. D 47 (1993) 1407 [10] V. Balasubramanian and P. Kraus, Commun. Math. Phys. 208 (1999) 413 [hep-th/9902121] [11] R. C. Myers, Phys. Rev. D 60 (1999) 046002 [hep-th/9903203] [12] R. Emparan, C. V. Johnson and R. C. Myers, Phys. Rev.
104001 hep-th/9903238 [13] S. de Haro, K. Skenderis and S. N. Solodukhin, Commun. Math. Phys. 217 (2000) 595 hep-th/0002230 [14] M. Bianchi, D. Z. Freedman and K. Skenderis, hep-th/0112119 [15] M. Bianchi, D. Z. Freedman and K. Skenderis, J. High Energy Phys. 08 (2001) 041 hep-th/0105276 [16] J. de Boer, E. Verlinde and H. Verlinde, J. High Energy Phys. 08 (2000) 003 hep-th/9912012 [17] J. Kalkkinen, D. Martelli and W. Mück, J. High Energy Phys. 04 (2001) 036 hep-th/0103111 [18] S. Corley, Phys. Lett. B 484 (2000) 141 hep-th/0004030 [19] J. Kalkkinen and D. Martelli, Nucl. Phys. B 596 (2001) 415 hep-th/0007234 [20] M. Bianchi, O. DeWolfe, D. Z. Freedman and K. Pilch, J. High Energy Phys. 01 (2001) 021 hep-th/0009156 [21] I. R. Klebanov, P. Ouyang and E. Witten, Phys. Rev. D 65 (2002) 105007 hep-th/0202056 [22] C. Fefferman and C. R. Graham, in Élie Cartan et les Mathématiques d’aujourd’hui, Astérisque, p. 95 (1985). [23] D. Martelli and A. Miemiec, J. High Energy Phys. 04 (2002) 027 hep-th/0112150 [24] S. Ferrara and A. Zaffaroni, hep-th/9908163 [25] J. Parry, D. S. Salopek and J. M. Stewart, Phys. Rev. D 49 (1994) 2872 gr-qc/9310020 [26] B. Darian, Class. Quantum Gravity 15 (1998) 143 gr-qc/9707046 [27] V. L. Campos, G. Ferretti, H. Larsson, D. Martelli and B. E. W. Nilsson, J. High Energy Phys. 0006 (2000) 023 hep-th/0003151 [28] L. Girardello, M. Petrini, M. Porrati and A. Zaffaroni, Nucl. Phys. B 569 (2000) 451 hep-th/9909047 [29] W. Mück, hep-th/0201100 [30] W. Mück and K. S. Viswanathan, Phys. Rev. D 58 (1998) 106006 hep-th/9805145 [31] M. M. Taylor-Robinson, hep-th/0002125 [32] C. W. Misner, K. S. Thorne and J. A. Wheeler, Gravitation, Freeman, San Francisco (1973). CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181361227-0-29-24-0-2 SzGeCERN 2327507CERCER SLAC 5004500 hep-th/0207111 eng BROWN-HEP-1309 Ramgoolam, S Brown University Higher dimensional geometries related to fuzzy odd-dimensional spheres 2002 Providence, RI Brown Univ. 11 Jul 2002 32 p We study $SO(m)$ covariant Matrix realizations of $ \sum_{i=1}^{m} X_i^2 = 1 $ for even $m$ as candidate fuzzy odd spheres following hep-th/0101001. As for the fuzzy four sphere, these Matrix algebras contain more degrees of freedom than the sphere itself and the full set of variables has a geometrical description in terms of a higher dimensional coset. The fuzzy $S^{2k-1} $ is related to a higher dimensional coset $ {SO(2k) \over U(1) \times U(k-1)}$. These cosets are bundles where base and fibre are Hermitian symmetric spaces. The detailed form of the generators and relations for the Matrix algebras related to the fuzzy three-spheres suggests Matrix actions which admit the fuzzy spheres as solutions. These Matrix actions are compared with the BFSS, IKKT and BMN Matrix models as well as some others. The geometry and combinatorics of fuzzy odd spheres lead to some remarks on the transverse five-brane problem of Matrix theories and the exotic scaling of the entropy of 5-branes with the brane number. LANL EDS SIS:2003 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Ramgoolam, Sanjaye 064 J. High Energy Phys. 10 2002 http://cdsware.cern.ch/download/invenio-demo-site-files/0207111.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0207111.ps.gz ramgosk@het.brown.edu n 200228 13 20070205 2036 CER01 20020712 PUBLIC 002327507CER ARTICLE [1] D. Kabat and W. Taylor, "Spherical membranes in Matrix theory," Adv. Theor. Math. Phys. 2 (1998) 181 hep-th/9711078 [2] J. Castelino, S.
Lee and W. Taylor IV, "Longitudinal Five-Branes as Four Spheres in Matrix Theory," Nucl. Phys. B 526 (1998) 334 hep-th/9712105 [3] R. Myers, "Dielectric-Branes," hep-th/9910053 [4] N. Constable, R. Myers, O. Tafjord, "Non-abelian Brane intersections," J. High Energy Phys. 0106 (2001) 023 hep-th/0102080 [5] D. Berenstein, J. Maldacena, H. Nastase, "Strings in flat space and pp waves from N = 4 Super Yang Mills," J. High Energy Phys. 0204 (2002) 013 hep-th/0202021 [6] J. Maldacena, A. Strominger, "AdS3 Black Holes and a Stringy Exclusion Principle," hep-th/9804085, J. High Energy Phys. 9812 (1998) 005 [7] A. Jevicki, S. Ramgoolam, "Non commutative gravity from the AdS/CFT correspondence," J. High Energy Phys. 9904 (1999) 032 hep-th/9902059 [8] P.M. Ho, M. Li, "Fuzzy Spheres in AdS/CFT Correspondence and Holography from Noncommutativity," Nucl. Phys. B 596 (2001) 259 hep-th/0004072 [9] M. Berkooz, H. Verlinde, "Matrix Theory, AdS/CFT and Higgs-Coulomb Equivalence," J. High Energy Phys. 9911 (1999) 037 hep-th/9907100 [10] Z. Guralnik, S. Ramgoolam, "On the Polarization of Unstable D0-Branes into Non-Commutative Odd Spheres," J. High Energy Phys. 0102 (2001) 032 hep-th/0101001 [11] S. Ramgoolam, "On spherical harmonics for fuzzy spheres in diverse dimensions," Nucl. Phys. B 610 (2001) 461 hep-th/0105006 [12] P.M. Ho, S. Ramgoolam, "Higher dimensional geometries from matrix brane constructions," Nucl. Phys. B 627 (2002) 266 hep-th/0111278 [13] Y. Kimura, "Noncommutative Gauge Theory on Fuzzy Four-Sphere and Matrix Model," hep-th/0204256 [14] S.C. Zhang, J. Hu, "A Four Dimensional Generalization of the Quantum Hall Effect," Science 294 (2001) 823 cond-mat/0110572 [15] M. Fabinger, "Higher-Dimensional Quantum Hall Effect in String Theory," J. High Energy Phys. 0205 (2002) 037 hep-th/0201016 [16] A.P. Balachandran, "Quantum Spacetimes in the Year 1," hep-th/0203259 [17] A. Salam, J. Strathdee, "On Kaluza-Klein Theory," Ann. Phys. 141 (1982) 216 [18] N.L. Wallach, "Harmonic Analysis on homogeneous spaces," M. Dekker Inc. NY 1973 [19] Y. Kazama, H. Suzuki, "New N = 2 superconformal field theories and superstring compactification," Nucl. Phys. B 321 (1989) 232 [20] M. Kramer, "Some remarks suggesting an interesting theory of harmonic functions on SU(2n + 1)/Sp(n) and SO(2n + 1)/U(n)," Arch. Math. 33 (1979/80) 76-79 [21] P.M. Ho, "Fuzzy sphere from Matrix model," J. High Energy Phys. 0012 (2000) 015 hep-th/0010165 [22] T. Banks, W. Fischler, S. Shenker, L. Susskind, "M-Theory as a Matrix model: A conjecture," Phys. Rev. D 55 (1997) 5112 hep-th/9610043 [23] N. Ishibashi, H. Kawai, Y. Kitazawa, A. Tsuchiya, "A large-N reduced model as Superstring," Nucl. Phys. B 498 (1997) 467 [24] V. Periwal, "Matrices on a point as the theory of everything," Phys. Rev. D 55 (1997) 1711 [25] S. Chaudhuri, "Bosonic Matrix Theory and D-branes," hep-th/0205306 [26] M. Bagnoud, L. Carlevaro, A. Bilal, "Supermatrix models for M-theory based on osp(1|32,R)," hep-th/0201183 [27] L. Smolin, "M theory as a matrix extension of Chern-Simons theory," Nucl. Phys. B 591 (2000) 227 hep-th/0002009 [28] I. Bandos, J. Lukierski, "New superparticle models outside the HLS supersymmetry scheme," hep-th/9812074 [29] S. Iso, Y. Kimura, K. Tanaka, K. Wakatsuki, "Noncommutative Gauge Theory on Fuzzy Sphere from Matrix Model," hep-th/0101102 [30] W. Fulton and J. Harris, "Representation theory," Springer Verlag 1991. [31] M. Atiyah and E.
Witten, "M-Theory Dynamics On A Manifold Of G2 Holonomy," hep-th/0107177 [32] S. Ramgoolam, D. Waldram, "Zero branes on a compact orbifold," J. High Energy Phys. 9807 (1998) 009 hep-th/9805191 [33] Brian R. Greene, C.I. Lazaroiu, Piljin Yi " D Particles on T4 /Z(N) Orbifolds and their resolutions," Nucl. Phys. B 539 (1999) 135 hep-th/9807040 [34] I. Klebanov, A. Tseytlin, "Entropy of Near-Extremal Black p-branes," Nucl. Phys. B 475 (1996) 164 hep-th/9604089 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181363461-0-26-22-0-4 SzGeCERN 2341644CERCER SLAC 5208424 hep-th/0209226 eng PUTP-2002-48 SLAC-PUB-9504 SU-ITP-2002-36 Adams, A Stanford University Decapitating Tadpoles 2002 Beijing Beijing Univ. Dept. Phys. 26 Sep 2002 31 p We argue that perturbative quantum field theory and string theory can be consistently modified in the infrared to eliminate, in a radiatively stable manner, tadpole instabilities that arise after supersymmetry breaking. This is achieved by deforming the propagators of classically massless scalar fields and the graviton so as to cancel the contribution of their zero modes. In string theory, this modification of propagators is accomplished by perturbatively deforming the world-sheet action with bi-local operators similar to those that arise in double-trace deformations of AdS/CFT. This results in a perturbatively finite and unitary S-matrix (in the case of string theory, this claim depends on standard assumptions about unitarity in covariant string diagrammatics). The S-matrix is parameterized by arbitrary scalar VEVs, which exacerbates the vacuum degeneracy problem. However, for generic values of these parameters, quantum effects produce masses for the nonzero modes of the scalars, lifting the fluctuating components of the moduli. LANL EDS SzGeCERN Particle Physics - Theory PREPRINT LANL EDS High Energy Physics - Theory McGreevy, J Silverstein, E Adams, Allan Greevy, John Mc Silverstein, Eva http://cdsware.cern.ch/download/invenio-demo-site-files/0209226.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0209226.ps.gz evas@slac.stanford.edu n 200239 11 20060218 0013 CER01 20020927 PUBLIC 002341644CER PREPRINT [1] W. Fischler and L. Susskind, "Dilaton Tadpoles, String Condensates And Scale In-variance," Phys. Lett. B 171 (1986) 383 [2] W. Fischler and L. Susskind, "Dilaton Tadpoles, String Condensates And Scale In-variance. 2," Phys. Lett. B 173 (1986) 262 [3] C. G. Callan, C. Lovelace, C. R. Nappi and S. A. Yost, "Loop Corrections To Super-string Equations Of Motion," Nucl. Phys. B 308 (1988) 221 [4] H. Ooguri and N. Sakai, "String Multiloop Corrections To Equations Of Motion," Nucl. Phys. B 312 (1989) 435 [5] J. Polchinski, "Factorization Of Bosonic String Amplitudes," Nucl. Phys. B 307 (1988) 61 [6] H. La and P. Nelson, "Effective Field Equations For Fermionic Strings," Nucl. Phys. B 332 (1990) 83 [7] O. Aharony, M. Berkooz and E. Silverstein, "Multiple-trace operators and non-local string theories," J. High Energy Phys. 0108 (2001) 006 hep-th/0105309 [8] O. Aharony, M. Berkooz and E. Silverstein, "Non-local string theories on AdS3 × S3 and stable non-supersymmetric backgrounds," Phys. Rev. D 65 (2002) 106007 hep-th/0112178 [9] N. Arkani-Hamed, S. Dimopoulos, G. Dvali, G. Gabadadze, to appear. [10] E. Witten, "Strong Coupling Expansion Of Calabi-Yau Compactification," Nucl. Phys. B 471 (1996) 135 hep-th/9602070 [11] O. Aharony and T. Banks, "Note on the Quantum Mech. of M theory," J. High Energy Phys. 9903 (1999) 016 hep-th/9812237 [12] T. 
Banks, "On isolated vacua and background independence," arXiv hep-th/0011255 [13] R. Bousso and J. Polchinski, "Quantization of four-form fluxes and dynamical neutral-ization of the cosmological constant," J. High Energy Phys. 0006 (2000) 006 hep-th/0004134 [14] S. B. Giddings, S. Kachru and J. Polchinski, "Hierarchies from fluxes in string com-pactifications," arXiv hep-th/0105097 [15] A. Maloney, E. Silverstein and A. Strominger, "De Sitter space in noncritical string theory," arXiv hep-th/0205316 [16] S. Kachru and E. Silverstein, "4d conformal theories and strings on orbifolds," Phys. Rev. Lett. 80 (1998) 4855 hep-th/9802183 [17] A. E. Lawrence, N. Nekrasov and C. Vafa, "On conformal field theories in four di-mensions," Nucl. Phys. B 533 (1998) 199 hep-th/9803015 [18] M. Bershadsky, Z. Kakushadze and C. Vafa, "String expansion Ann. Sci. large N expansion of gauge theories," Nucl. Phys. B 523 (1998) 59 hep-th/9803076 [19] I. R. Klebanov and Astron. Astrophys. Tseytlin, "A non-supersymmetric large N CFT from type 0 string theory," J. High Energy Phys. 9903 (1999) 015 hep-th/9901101 [20] E. Witten, "Multi-trace operators, boundary conditions, and AdS/CFT correspon-dence," arXiv hep-th/0112258 [21] M. Berkooz, A. Sever and A. Shomer, "Double-trace deformations, boundary condi-tions and spacetime singularities," J. High Energy Phys. 0205 (2002) 034 hep-th/0112264 [22] A. Adams and E. Silverstein, "Closed string tachyons, AdS/CFT, and large N QCD," Phys. Rev. D 64 (2001) 086001 hep-th/0103220 [23] Astron. Astrophys. Tseytlin and K. Zarembo, "Effective potential in non-supersymmetric SU(N) x SU(N) gauge theory and interactions of type 0 D3-branes," Phys. Lett. B 457 (1999) 77 hep-th/9902095 [24] M. Strassler, to appear [25] V. Balasubramanian and P. Kraus, "A stress tensor for anti-de Sitter gravity," Commun. Math. Phys. 208 (1999) 413 hep-th/9902121 [26] S. Thomas, in progress. [27] O. Aharony, M. Fabinger, G. T. Horowitz and E. Silverstein, "Clean time-dependent string backgrounds from bubble baths," J. High Energy Phys. 0207 (2002) 007 hep-th/0204158 [28] G. Dvali, G. Gabadadze and M. Shifman, "Diluting cosmological constant in infinite volume extra dimensions," arXiv hep-th/0202174 [29] D. Friedan, "A tentative theory of large distance Physics," arXiv hep-th/0204131 [30] G. Dvali, G. Gabadadze and M. Shifman, "Diluting cosmological constant via large distance modification of gravity," arXiv hep-th/0208096 [31] J. W. Moffat, arXiv hep-th/0207198 [32] Astron. Astrophys. Tseytlin, "On ’Macroscopic String’ Approximation In String Theory," Phys. Lett. B 251 (1990) 530 [33] B. Zwiebach, "Closed string field theory Quantum action and the B-V master equa-tion," Nucl. Phys. B 390 (1993) 33 hep-th/9206084 [34] J. Polchinski, "String Theory. Vol. 1 An Introduction To The Bosonic String," Cam-bridge, UK Univ. Phys. Rev. (1998) 402 p. [35] S. Kachru, X. Liu, M. B. Schulz and S. P. Trivedi, "Supersymmetry changing bubbles in string theory," arXiv hep-th/0205108 [36] A. R. Frey and J. Polchinski, "N = 3 warped compactifications," Phys. Rev. D 65 (2002) 126009 hep-th/0201029 [37] A. Adams, O. Aharony, J. McGreevy, E. Silverstein,..., work in progress CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181365384-0-25-23-0-5 SzGeCERN 2342206CERCER SLAC 5224543 hep-th/0209257 eng Berkooz, M The Weizmann Inst. 
of Science Double Trace Deformations, Infinite Extra Dimensions and Supersymmetry Breaking 2002 29 Sep 2002 22 p It was recently shown how to break supersymmetry in certain $AdS_3$ spaces, without destabilizing the background, by using a ``double trace'' deformation which localizes on the boundary of space-time. By viewing spatial sections of $AdS_3$ as a compactification space, one can convert this into a SUSY breaking mechanism which exists uniformly throughout a large 3+1 dimensional space-time, without generating any dangerous tadpoles. This is a generalization of a Visser type infinite extra dimensions compactification. Although the model is not Lorentz invariant, the dispersion relation is relativistic at high enough momenta, and it can be arranged such that at the same kinematical regime the energy difference between former members of a SUSY multiplet is large. LANL EDS SzGeCERN Particle Physics - Theory PREPRINT LANL EDS High Energy Physics - Theory Berkooz, Micha http://cdsware.cern.ch/download/invenio-demo-site-files/0209257.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0209257.ps.gz berkooz@wisemail.weizmann.ac.il n 200240 11 20060603 0013 CER01 20021001 PUBLIC 002342206CER PREPRINT [1] T. Banks, "Cosmological Breaking of Supersymmetry? Or Little Lambda Goes Back to the Future 2.", hep-th/0007146 [2] J. Brown and C. Teitelboim, Phys. Lett. B 195 (1987) 177 [2] Nucl. Phys. B 297 (1988) 787 [2] R. Bousso and J. Polchinski, "Quantization of Four-form Fluxes and Dynamical Neutralization of the Cosmological Constant", J. High Energy Phys. 0006 (2000) 006 hep-th/0004134 [2] J.L. Feng, J. March-Russell, S. Sethi and F. Wilczek, "Saltatory Relaxation of the Cosmological Constant", Nucl. Phys. B 602 (2001) 307 hep-th/0005276 [3] E. Witten, "Strong Coupling and the Cosmological Constant", Mod. Phys. Lett. A 10 (1995) 2153 hep-th/9506101 [4] A. Maloney, E. Silverstein and A. Strominger, "de Sitter Space in Non-Critical String Theory", hep-th/0205316 [4] , Hawking Festschrift. [5] S. Kachru and E. Silverstein, "On Vanishing Two Loop Cosmological Constant in Nonsupersymmetric Strings", J. High Energy Phys. 9901 (1999) 004 hep-th/9810129 [6] S. Kachru, M. Schulz and E. Silverstein, "Self-tuning flat domain walls in 5d gravity and string theory", Phys. Rev. D 62 (2000) 045021 hep-th/0001206 [7] V.A. Rubakov and M.E. Shaposhnikov, "Extra Space-time Dimensions: Towards a Solution to the Cosmological Constant Problem", Phys. Lett. B 125 (1983) 139 [8] G. Dvali, G. Gabadadze and M. Shifman, "Diluting Cosmological Constant In Infinite Volume Extra Dimensions", hep-th/0202174 [9] G.W. Moore, "Atkin-Lehner Symmetry", Nucl. Phys. B 293 (1987) 139 [9] Erratum: Nucl. Phys. B 299 (1988) 847 [10] K. Akama, "Pregeometry", Lect. Notes Phys. 176 (1982) 267 hep-th/0001113 [10] Also in *Nara 1982, Proceedings, Gauge Theory and Gravitation*, 267-271. [11] M. Visser, "An Exotic Class of Kaluza-Klein Models", Phys. Lett. B 159 (1985) 22 hep-th/9910093 [12] L. Randall and R. Sundrum, "An Alternative to Compactification", Phys. Rev. Lett. 83 (1999) 4690 hep-th/9906064 [13] A. Adams and E. Silverstein, "Closed String Tachyons, AdS/CFT and Large N QCD", Phys. Rev. D 64 (2001) 086001 hep-th/0103220 [14] O. Aharony, M. Berkooz and E. Silverstein, "Multiple-Trace Operators and Non-Local String Theories", J. High Energy Phys. 0108 (2001) 006 hep-th/0105309 [15] O. Aharony, M. Berkooz and E. Silverstein, "Non-local String Theories on AdS3 × S3 and non-supersymmetric backgrounds", Phys. Rev.
D 65 (2002) 106007 hep-th/0112178 [16] M. Berkooz, A. Sever and A. Shomer, "Double-trace Deformations, Boundary Conditions and Space-time Singularities", J. High Energy Phys. 0205 (2002) 034 hep-th/0112264 [17] E. Witten, "Multi-Trace Operators, Boundary Conditions, And AdS/CFT Correspondence", hep-th/0112258 [18] A. Sever and A. Shomer, "A Note on Multi-trace Deformations and AdS/CFT", J. High Energy Phys. 0207 (2002) 027 hep-th/0203168 [19] J. Maldacena, "The large N limit of superconformal field theories and supergravity," Adv. Theor. Math. Phys. 2 (1998) 231 hep-th/9711200 [19] Int. J. Theor. Phys. 38 (1998) 1113 [20] E. Witten, "Anti-de Sitter space and holography," Adv. Theor. Math. Phys. 2 (1998) 253 hep-th/9802150 [21] S. S. Gubser, I. R. Klebanov and A. M. Polyakov, "Gauge theory correlators from non-critical string theory," hep-th/9802109, Phys. Lett. B 428 (1998) 105 [22] O. Aharony, S.S. Gubser, J. Maldacena, H. Ooguri and Y. Oz, "Large N Field Theories, String Theory and Gravity", Phys. Rep. 323 (2000) 183 hep-th/9905111 [23] A. Giveon, D. Kutasov and N. Seiberg, "Comments on string theory on AdS3," hep-th/9806194, Adv. Theor. Math. Phys. 2 (1998) 733 [24] J. Maldacena, J. Michelson and A. Strominger, "Anti-de Sitter Fragmentation", J. High Energy Phys. 9902 (1999) 011 hep-th/9812073 [25] N. Seiberg and E. Witten, "The D1/D5 System And Singular CFT", J. High Energy Phys. 9904 (1999) 017 hep-th/9903224 [26] J. Maldacena and H. Ooguri, "Strings in AdS3 and the SL(2, R) WZW Model. Part 1: The Spectrum", J. Math. Phys. 42 (2001) 2929 hep-th/0001053 [27] I. R. Klebanov and E. Witten, "AdS/CFT correspondence and Symmetry breaking," Nucl. Phys. B 556 (1999) 89 hep-th/9905104 [28] R. Kallosh, A.D. Linde, S. Prokushkin and M. Shmakova, "Gauged Supergravities, de Sitter space and Cosmology", Phys. Rev. D 65 (2002) 105016 hep-th/0110089 [29] R. Kallosh, "Supergravity, M-Theory and Cosmology", hep-th/0205315 [30] R. Kallosh, A.D. Linde, S. Prokushkin and M. Shmakova, "Supergravity, Dark Energy and the Fate of the Universe", hep-th/0208156 [31] C.M. Hull and N.P. Warner, "Non-compact Gauging from Higher Dimensions", Class. Quantum Gravity 5 (1988) 1517 [32] P. Kraus and E.T. Tomboulis, "Photons and Gravitons as Goldstone Bosons, and the Cosmological Constant", Phys. Rev. D 66 (2002) 045015 hep-th/0203221 [33] M. Berkooz and S.-J. Rey, "Non-Supersymmetric Stable Vacua of M-Theory", J. High Energy Phys. 9901 (1999) 014 hep-th/9807200 [34] A. Adams, J. McGreevy and E. Silverstein, "Decapitating Tadpoles", hep-th/0209226 [35] N. Arkani-Hamed, S. Dimopoulos, G. Dvali and G. Gabadadze, "Non-Local Modification of Gravity and the Cosmological Constant Problem", hep-th/0209227 [36] V. Balasubramanian, P. Kraus and A. Lawrence, "Bulk vs. Boundary Dynamics in Anti-de Sitter Space-time", Phys. Rev. D 59 (1999) 046003 hep-th/9805171 [37] H. Verlinde, "Holography and Compactification", Nucl. Phys. B 580 (2000) 264 hep-th/9906182 [38] S.B. Giddings, S. Kachru and J. Polchinski, "Hierarchies from Fluxes in String Compactifications", hep-th/0105097 [39] G. Dvali, G. Gabadadze and M. Shifman, "Diluting Cosmological Constant via Large Distance Modification of Gravity", hep-th/0208096 [40] D. Gepner, "Lectures on N=2 String Theory", In Superstrings 89, The Trieste Spring School, 1989. [41] R.G. Leigh, "Dirac-Born-Infeld Action from Dirichlet Sigma Model", Mod. Phys. Lett. A 4 (1989) 2767 [42] J. Bagger and A.
Galperin "Linear and Non-linear Supersymmetries", hep-th/9810109 [42] , *Dubna 1997, Supersymmetries and quantum symmetries* 3-20. [43] A, Giveon and M. Rocek, "Supersymmetric String Vacua on AdS3 × N ", hep-th/9904024 [44] E.J. Martinec and W. McElgin, "String Theory on AdS Orbifolds" J. High Energy Phys. 0204 (2002) 029 hep-th/0106171 [45] E.J. Martinec and W. McElgin, "Exciting AdS Orbifolds", hep-th/0206175 [46] V. Balasubramanian, J. de Boer, E. Keski-Vakkuri and S.F. Ross, "Supersymmetric Conical Defects", Phys. Rev. D 64 (2001) 064011 hep-th/0011217 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181365450-0-40-37-0-4 SzGeCERN CERCER 2344398 SLAC 5256739 hep-th/0210075 eng SISSA-2002-64-EP Borunda, M INFN On the quantum stability of IIB orbifolds and orientifolds with Scherk-Schwarz SUSY breaking 2003 Trieste Scuola Int. Sup. Studi Avan. 8 Oct 2002 26 p We study the quantum stability of Type IIB orbifold and orientifold string models in various dimensions, including Melvin backgrounds, where supersymmetry (SUSY) is broken {\it \`a la} Scherk-Schwarz (SS) by twisting periodicity conditions along a circle of radius R. In particular, we compute the R-dependence of the one-loop induced vacuum energy density $\rho(R)$, or cosmological constant. For SS twists different from Z2 we always find, for both orbifolds and orientifolds, a monotonic $\rho(R)<0$, eventually driving the system to a tachyonic instability. For Z2 twists, orientifold models can have a different behavior, leading either to a runaway decompactification limit or to a negative minimum at a finite value R_0. The last possibility is obtained for a 4D chiral orientifold model where a more accurate but yet preliminary analysis seems to indicate that $R_0\to \infty$ or towards the tachyonic instability, as the dependence on the other geometric moduli is included. LANL EDS SIS LANLPUBL2003 SIS:2003 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Serone, M Trapletti, M 85-108 Nucl. Phys. B 653 2003 http://cdsware.cern.ch/download/invenio-demo-site-files/0210075.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0210075.ps.gz serone@he.sissa.it n 200241 13 20060823 0006 CER01 20021009 PUBLIC 002344398CER ARTICLE [1] J. Scherk and J. H. Schwarz, Phys. Lett. B 82 (1979) 60 [1] Nucl. Phys. B 153 (1979) 61 [2] R. Rohm, Nucl. Phys. B 237 (1984) 553 [3] H. Itoyama and T.R. Taylor, Phys. Lett. B 186 (1987) 129 [4] C. Kounnas and M. Porrati, Nucl. Phys. B 310 (1988) 355 [4] S. Ferrara, C. Kounnas, M. Porrati and F. Zwirner, Nucl. Phys. B 318 (1989) 75 [4] C. Kounnas and B. Rostand, Nucl. Phys. B 341 (1990) 641 [4] I. Antoniadis and C. Kounnas, Phys. Lett. B 261 (1991) 369 [4] E. Kiritsis and C. Kounnas, Nucl. Phys. B 503 (1997) 117 hep-th/9703059 [5] I. Antoniadis, Phys. Lett. B 246 (1990) 377 [6] C. A. Scrucca and M. Serone, J. High Energy Phys. 0110 (2001) 017 hep-th/0107159 [7] I. Antoniadis, E. Dudas and A. Sagnotti, Nucl. Phys. B 544 (1999) 469 hep-th/9807011 [8] I. Antoniadis, G. D’Appollonio, E. Dudas and A. Sagnotti, Nucl. Phys. B 553 (1999) 133 hep-th/9812118 [8] Nucl. Phys. B 565 (2000) 123 hep-th/9907184 [8] I. Antoniadis, K. Benakli and A. Laugier, hep-th/0111209 [9] C. A. Scrucca, M. Serone and M. Trapletti, Nucl. Phys. B 635 (2002) 33 hep-th/0203190 [10] J. D. Blum and K. R. Dienes, Nucl. Phys. B 516 (1998) 83 hep-th/9707160 [11] M. Fabinger and P. Horava, Nucl. Phys. B 580 (2000) 243 hep-th/0002073 [12] P. Ginsparg and C. Vafa, Nucl. Phys. 
B 289 (1987) 414 [13] M. A. Melvin, Phys. Lett. 8 (1964) 65 [13] G. W. Gibbons and K. i. Maeda, Nucl. Phys. B 298 (1988) 741 [13] F. Dowker, J. P. Gauntlett, D. A. Kastor and J. Traschen, Phys. Rev. D 49 (1994) 2909 hep-th/9309075 [14] A. Adams, J. Polchinski and E. Silverstein, J. High Energy Phys. 0110 (2001) 029 hep-th/0108075 [15] J. R. David, M. Gutperle, M. Headrick and S. Minwalla, J. High Energy Phys. 0202 (2002) 041 hep-th/0111212 [16] T. Suyama, J. High Energy Phys. 0207 (2002) 015 hep-th/0110077 [17] C. Vafa, hep-th/0111051 [18] G. Aldazabal, A. Font, L. E. Ibanez and G. Violero, Nucl. Phys. B 536 (1998) 29 hep-th/9804026 [19] K. H. O’Brien and C. I. Tan, Phys. Rev. D 36 (1987) 1184 [20] J. Polchinski, Commun. Math. Phys. 104 (1986) 37 [21] D. M. Ghilencea, H. P. Nilles and S. Stieberger, hep-th/0108183 [22] P. Mayr and S. Stieberger, Nucl. Phys. B 407 (1993) 725 hep-th/9303017 [23] E. Alvarez, Nucl. Phys. B 269 (1986) 596 [24] J. G. Russo and A. A. Tseytlin, J. High Energy Phys. 0111 (2001) 065 hep-th/0110107 [24] Nucl. Phys. B 611 (2001) 93 hep-th/0104238 [24] A. Dabholkar, Nucl. Phys. B 639 (2002) 331 hep-th/0109019 [24] M. Gutperle and A. Strominger, J. High Energy Phys. 0106 (2001) 035 hep-th/0104136 [24] M. S. Costa and M. Gutperle, J. High Energy Phys. 0103 (2001) 027 hep-th/0012072 [25] E. Dudas and J. Mourad, Nucl. Phys. B 622 (2002) 46 hep-th/0110186 [25] T. Takayanagi and T. Uesugi, J. High Energy Phys. 0111 (2001) 036 hep-th/0110200 [25] Phys. Lett. B 528 (2002) 156 hep-th/0112199 [25] C. Angelantonj, E. Dudas and J. Mourad, Nucl. Phys. B 637 (2002) 59 hep-th/0205096 [26] M. Trapletti, in preparation. [27] A. Adams, J. McGreevy and E. Silverstein, hep-th/0209226 [28] E. Witten, Nucl. Phys. B 195 (1982) 481 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181365734-0-27-39-0-1 SzGeCERN 2355566CERCER SLAC 5419166 hep-th/0212138 eng PUPT-2069 Gubser, S S Princeton University A universal result on central charges in the presence of double-trace deformations 2003 Princeton, NJ Princeton Univ. Joseph-Henry Lab. Phys. 12 Dec 2002 15 p We study large N conformal field theories perturbed by relevant double-trace deformations. Using the auxiliary field trick, or Hubbard-Stratonovich transformation, we show that in the infrared the theory flows to another CFT. The generating functionals of planar correlators in the ultraviolet and infrared CFTs are shown to be related by a Legendre transform. Our main result is a universal expression for the difference of the scale anomalies between the ultraviolet and infrared fixed points, which is of order 1 in the large N expansion. Our computations are entirely field theoretic, and the results are shown to agree with predictions from AdS/CFT. We also remark that a certain two-point function can be computed for all energy scales on both sides of the duality, with full agreement between the two and no scheme dependence. LANL EDS SIS LANLPUBL2004 SIS:2004 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Klebanov, Igor R Gubser, Steven S. Klebanov, Igor R. 23-36 Nucl. Phys.
B 656 2003 http://cdsware.cern.ch/download/invenio-demo-site-files/0212138.ps.gz http://cdsware.cern.ch/download/invenio-demo-site-files/0212138.pdf ssgubser@Princeton.EDU n 200250 13 20060823 0007 CER01 20021213 PUBLIC 002355566CER ARTICLE CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181367639-4-0-0-0-0 SzGeCERN 2356302CERCER SLAC 5423422 hep-th/0212181 eng Girardello, L INFN Universita di Milano-Bicocca 3-D Interacting CFTs and Generalized Higgs Phenomenon in Higher Spin Theories on AdS 2003 16 Dec 2002 8 p We study a duality, recently conjectured by Klebanov and Polyakov, between higher-spin theories on AdS_4 and O(N) vector models in 3-d. These theories are free in the UV and interacting in the IR. At the UV fixed point, the O(N) model has an infinite number of higher-spin conserved currents. In the IR, these currents are no longer conserved for spin s>2. In this paper, we show that the dual interpretation of this fact is that all fields of spin s>2 in AdS_4 become massive by a Higgs mechanism that leaves the spin-2 field massless. We identify the Higgs field and show how it relates to the RG flow connecting the two CFTs, which is induced by a double trace deformation. LANL EDS SIS LANLPUBL2004 SIS:2004 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Porrati, Massimo Zaffaroni, A 289-293 Phys. Lett. B 561 2003 http://cdsware.cern.ch/download/invenio-demo-site-files/0212181.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0212181.ps.gz alberto.zaffaroni@mib.infn.it n 200251 13 20060823 0007 CER01 20021217 PUBLIC 002356302CER ARTICLE [1] D. Francia and A. Sagnotti, Phys. Lett. B 543 (2002) 303 hep-th/0207002 [1] P. Haggi-Mani and B. Sundborg, J. High Energy Phys. 0004 (2000) 031 hep-th/0002189 [1] B. Sundborg, Nucl. Phys. B, Proc. Suppl. 102 (2001) 113 hep-th/0103247 [1] E. Sezgin and P. Sundell, J. High Energy Phys. 0109 (2001) 036 hep-th/0105001 [1] A. Mikhailov, hep-th/0201019 [1] E. Sezgin and P. Sundell, Nucl. Phys. B 644 (2002) 303 hep-th/0205131 [1] E. Sezgin and P. Sundell, J. High Energy Phys. 0207 (2002) 055 hep-th/0205132 [1] J. Engquist, E. Sezgin and P. Sundell, Class. Quantum Gravity 19 (2002) 6175 hep-th/0207101 [1] M. A. Vasiliev, Int. J. Mod. Phys. D 5 (1996) 763 hep-th/9611024 [1] D. Anselmi, Nucl. Phys. B 541 (1999) 323 hep-th/9808004 [1] D. Anselmi, Class. Quantum Gravity 17 (2000) 1383 hep-th/9906167 [2] E. S. Fradkin and M. A. Vasiliev, Nucl. Phys. B 291 (1987) 141 [2] E. S. Fradkin and M. A. Vasiliev, Phys. Lett. B 189 (1987) 89 [3] I. R. Klebanov and A. M. Polyakov, Phys. Lett. B 550 (2002) 213 hep-th/0210114 [4] M. A. Vasiliev, hep-th/9910096 [5] T. Leonhardt, A. Meziane and W. Ruhl, hep-th/0211092 [6] O. Aharony, M. Berkooz and E. Silverstein, J. High Energy Phys. 0108 (2001) 006 hep-th/0105309 [7] E. Witten, hep-th/0112258 [8] M. Berkooz, A. Sever and A. Shomer, J. High Energy Phys. 0205 (2002) 034 hep-th/0112264 [9] S. S. Gubser and I. Mitra, hep-th/0210093 [10] S. S. Gubser and I. R. Klebanov, hep-th/0212138 [11] M. Porrati, J. High Energy Phys. 0204 (2002) 058 hep-th/0112166 [12] K. G. Wilson and J. B. Kogut, Phys. Rep. 12 (1974) 75 [13] I. R. Klebanov and E. Witten, Nucl. Phys. B 556 (1999) 89 hep-th/9905104 [14] W. Heidenreich, J. Math. Phys. 22 (1981) 1566 [15] D.
Anselmi, hep-th/0210123 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181367768-0-22-19-0-0 SzGeCERN 20041129103619.0 2357700CERCER SLAC 5435544 hep-th/0212314 eng KUNS-1817 YITP-2002-73 TAUP-2719 Fukuma, M Kyoto University Holographic Renormalization Group 2003 Kyoto Kyoto Univ. 26 Dec 2002 90 p The holographic renormalization group (RG) is reviewed in a self-contained manner. The holographic RG is based on the idea that the radial coordinate of a space-time with asymptotically AdS geometry can be identified with the RG flow parameter of the boundary field theory. After briefly discussing basic aspects of the AdS/CFT correspondence, we explain how the notion of the holographic RG comes out in the AdS/CFT correspondence. We formulate the holographic RG based on the Hamilton-Jacobi equations for bulk systems of gravity and scalar fields, as was introduced by de Boer, Verlinde and Verlinde. We then show that the equations can be solved with a derivative expansion by carefully extracting local counterterms from the generating functional of the boundary field theory. The calculational methods to obtain the Weyl anomaly and scaling dimensions are presented and applied to the RG flow from the N=4 SYM to an N=1 superconformal fixed point discovered by Leigh and Strassler. We further discuss a relation between the holographic RG and the noncritical string theory, and show that the structure of the holographic RG should persist beyond the supergravity approximation as a consequence of the renormalizability of the nonlinear sigma model action of noncritical strings. As a check, we investigate the holographic RG structure of higher-derivative gravity systems, and show that such systems can also be analyzed based on the Hamilton-Jacobi equations, and that the behaviour of bulk fields is determined solely by their boundary values. We also point out that higher-derivative gravity systems give rise to new multicritical points in the parameter space of the boundary field theories. LANL EDS SIS INIS2004 SIS LANLPUBL2004 SIS:2004 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Matsuura, S Sakai, T Fukuma, Masafumi Matsuura, So Sakai, Tadakatsu 489-562 Prog. Theor. Phys. 109 2003 http://cdsware.cern.ch/download/invenio-demo-site-files/0212314.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0212314.ps.gz matsu@yukawa.kyoto-u.ac.jp n 200201 13 20051024 1938 CER01 20021230 PUBLIC 002357700CER ARTICLE [1] Y. Nambu, in Symmetries and quark models, ed. R. Chand (Gordon and Breach 1970), p. 269; H. Nielsen, in the 15th International Conference on High Energy Physics (Kiev 1970); L. Susskind, Nuovo Cimento A 69 (1970) 457 [2] G. ’t Hooft, "A Planar Diagram Theory For Strong Interactions," Nucl. Phys. B 72 (1974) 461 [3] K. G. Wilson, "Confinement of Quarks," Phys. Rev. D 10 (1974) 2445 [4] R. Gopakumar and C. Vafa, "On the gauge theory/geometry correspondence," Adv. Theor. Math. Phys. 3 (1999) 1415 hep-th/9811131 [5] J. Maldacena, "The large N limit of superconformal field theories and supergravity," Adv. Theor. Math. Phys. 2 (1998) 231 hep-th/9711200 [6] S. S. Gubser, I. R. Klebanov and A. M. Polyakov, "Gauge Theory Correlators from Non-Critical String Theory," Phys. Lett. B 428 (1998) 105 hep-th/9802109 [7] E. Witten, "Anti De Sitter Space And Holography," Adv. Theor. Math. Phys. 2 (1998) 253 hep-th/9802150 [8] O. Aharony, S. S. Gubser, J. Maldacena, H. Ooguri and Y.
Oz, "Large N Field Theories, String Theory and Gravity," hep-th/9905111 [8] , and references therein. [9] G. T. Horowitz and A. Strominger, "Black Strings And P-Branes," Nucl. Phys. B 360 (1991) 197 [10] L. Susskind and E. Witten, "The holographic bound in anti-de Sitter space," hep-th/9805114 [11] E. T. Akhmedov, "A remark on the AdS/CFT correspondence and the renormaliza-tion group flow," Phys. Lett. B 442 (1998) 152 hep-th/9806217 [12] E. Alvarez and C. Gomez, "Geometric Holography, the Renormalization Group and the c-Theorem," Nucl. Phys. B 541 (1999) 441 hep-th/9807226 [13] L. Girardello, M. Petrini, M. Porrati and A. Zaffaroni, "Novel Local CFT and Exact Results on Perturbations of N=4 Super Yang Mills from AdS Dynamics," J. High Energy Phys. 12 (1998) 022 hep-th/9810126 [14] M. Porrati and A. Starinets, "RG Fixed Points in Supergravity Duals of 4-d Field Theory and Asymptotically AdS Spaces," Phys. Lett. B 454 (1999) 77 hep-th/9903085 [15] V. Balasubramanian and P. Kraus, "Spacetime and the Holographic Renormalization Group," Phys. Rev. Lett. 83 (1999) 3605 hep-th/9903190 [16] D. Z. Freedman, S. S. Gubser, K. Pilch and N. P. Warner, "Renormalization group flows from holography supersymmetry and a c-theorem," Adv. Theor. Math. Phys. 3 (1999) 363 hep-th/9904017 [17] L. Girardello, M. Petrini, M. Porrati and A. Zaffaroni "The Supergravity Dual of N=1 Super Yang-Mills Theory," Nucl. Phys. B 569 (2000) 451 hep-th/9909047 [18] K. Skenderis and P. K. Townsend, "Gravitational Stability and Renormalization-Group Flow," Phys. Lett. B 468 (1999) 46 hep-th/9909070 [19] O. DeWolfe, D. Z. Freedman, S. S. Gubser and A. Karch, "Modeling the fifth dimen-sion with scalars and gravity," Phys. Rev. D 62 (2000) 046008 hep-th/9909134 [20] V. Sahakian, "Holography, a covariant c-function and the geometry of the renormal-ization group," Phys. Rev. D 62 (2000) 126011 hep-th/9910099 [21] E. Alvarez and C. Gomez, "A comment on the holographic renormalization group and the soft dilaton theorem," Phys. Lett. B 476 (2000) 411 hep-th/0001016 [22] S. Nojiri, S. D. Odintsov and S. Zerbini, "Quantum (in)stability of dilatonic AdS backgrounds and holographic renormalization group with gravity," Phys. Rev. D 62 (2000) 064006 hep-th/0001192 [23] M. Li, "A note on relation between holographic RG equation and Polchinski’s RG equation," Nucl. Phys. B 579 (2000) 525 hep-th/0001193 [24] V. Sahakian, "Comments on D branes and the renormalization group," J. High Energy Phys. 0005 (2000) 011 hep-th/0002126 [25] O. DeWolfe and D. Z. Freedman, "Notes on fluctuations and correlation functions in holographic renormalization group flows," hep-th/0002226 [26] V. Balasubramanian, E. G. Gimon and D. Minic, "Consistency conditions for holo-graphic duality," J. High Energy Phys. 0005 (2000) 014 hep-th/0003147 [27] C. V. Johnson, K. J. Lovis and D. C. Page, "Probing some N = 1 AdS/CFT RG flows," J. High Energy Phys. 0105 (2001) 036 hep-th/0011166 [28] J. Erdmenger, "A field-theoretical interpretation of the holographic renormalization group," Phys. Rev. D 64 (2001) 085012 hep-th/0103219 [29] S. Yamaguchi, "Holographic RG flow on the defect and g-theorem," J. High Energy Phys. 0210 (2002) 002 hep-th/0207171 [30] J. de Boer, E. Verlinde and H. Verlinde, "On the Holographic Renormalization Group," hep-th/9912012 [31] M. Henningson and K. Skenderis, "The Holographic Weyl anomaly," J. High Energy Phys. 07 (1998) 023 hep-th/9806087 [32] V. Balasubramanian and P. Kraus, "A stress tensor for anti-de Sitter gravity," Commun. Math. Phys. 
208 (1999) 413 hep-th/9902121 [33] S. de Haro, K. Skenderis and S. Solodukhin, "Holographic Reconstruction of Space-time and Renormalization in the AdS/CFT Correspondence," hep-th/0002230 [34] M. J. Duff, "Twenty Years of the Weyl Anomaly," Class. Quantum Gravity 11 (1994) 1387 hep-th/9308075 [35] M. Fukuma, S. Matsuura and T. Sakai, "A note on the Weyl anomaly in the holographic renormalization group," Prog. Theor. Phys. 104 (2000) 1089 hep-th/0007062 [36] M. Fukuma and T. Sakai, "Comment on ambiguities in the holographic Weyl anomaly," Mod. Phys. Lett. A 15 (2000) 1703 hep-th/0007200 [37] M. Fukuma, S. Matsuura and T. Sakai, "Higher-Derivative Gravity and the AdS/CFT Correspondence," Prog. Theor. Phys. 105 (2001) 1017 hep-th/0103187 [38] M. Fukuma and S. Matsuura, "Holographic renormalization group structure in higher-derivative gravity," Prog. Theor. Phys. 107 (2002) 1085 hep-th/0112037 [39] A. Fayyazuddin and M. Spalinski, "Large N Superconformal Gauge Theories and Supergravity Orientifolds," Nucl. Phys. B 535 (1998) 219 hep-th/9805096 [39] O. Aharony, A. Fayyazuddin and J. Maldacena, "The Large N Limit of N = 1, 2 Field Theories from Three Branes in F-theory," J. High Energy Phys. 9807 (1998) 013 hep-th/9806159 [40] M. Blau, K. S. Narain and E. Gava, "On Subleading Contributions to the AdS/CFT Trace Anomaly," J. High Energy Phys. 9909 (1999) 018 hep-th/9904179 [41] O. Aharony, J. Pawelczyk, S. Theisen and S. Yankielowicz, "A Note on Anomalies in the AdS/CFT correspondence," Phys. Rev. D 60 (1999) 066001 hep-th/9901134 [42] S. Corley, "A Note on Holographic Ward Identities," Phys. Lett. B 484 (2000) 141 hep-th/0004030 [43] J. Kalkkinen and D. Martelli, "Holographic renormalization group with fermions and form fields," Nucl. Phys. B 596 (2001) 415 hep-th/0007234 [44] S. Nojiri, S. D. Odintsov and S. Ogushi, "Scheme-dependence of holographic conformal anomaly in d5 gauged supergravity with non-trivial bulk potential," Phys. Lett. B 494 (2000) 318 hep-th/0009015 [45] N. Hambli, "On the holographic RG-flow and the low energy, strong coupling, large N limit," Phys. Rev. D 64 (2001) 024001 hep-th/0010054 [46] S. Nojiri, S. D. Odintsov and S. Ogushi, "Holographic renormalization group and conformal anomaly for AdS(9)/CFT(8) correspondence," Phys. Lett. B 500 (2001) 199 hep-th/0011182 [47] J. de Boer, "The holographic renormalization group," Fortschr. Phys. 49 (2001) 339 hep-th/0101026 [48] J. Kalkkinen, D. Martelli and W. Mück, "Holographic renormalisation and anomalies," J. High Energy Phys. 0104 (2001) 036 hep-th/0103111 [49] S. Nojiri and S. D. Odintsov, "Conformal anomaly from dS/CFT correspondence," Phys. Lett. B 519 (2001) 145 hep-th/0106191 [50] S. Nojiri and S. D. Odintsov, "Asymptotically de Sitter dilatonic space-time, holographic RG flow and conformal anomaly from (dilatonic) dS/CFT correspondence," Phys. Lett. B 531 (2002) 143 hep-th/0201210 [51] R. G. Leigh and M. J. Strassler, "Exactly marginal operators and duality in four-dimensional N=1 supersymmetric gauge theory," Nucl. Phys. B 447 (1995) 95 hep-th/9503121 [52] S. Ferrara, C. Fronsdal and A. Zaffaroni, "On N = 8 supergravity on AdS(5) and N = 4 superconformal Yang-Mills theory," Nucl. Phys. B 532 (1998) 153 hep-th/9802203 [53] L. Andrianopoli and S. Ferrara, "K-K excitations on AdS(5) x S(5) as N = 4 *primary* superfields," Phys. Lett. B 430 (1998) 248 hep-th/9803171 [54] S. Ferrara, M. A. Lledo and A.
Zaffaroni, "Born-Infeld corrections to D3 brane action in AdS(5) x S(5) and N = 4, d = 4 primary superfields," Phys. Rev. D 58 (1998) 105029 hep-th/9805082 [55] M. F. Sohnius, "Introducing Supersymmetry," Phys. Rep. 128 (1985) 39 [56] O. Aharony, M. Berkooz and E. Silverstein, "Multiple-trace operators and non-local string theories," J. High Energy Phys. 0108 (2001) 006 hep-th/0105309 [57] E. Witten, "Multi-trace operators, boundary conditions, and AdS/CFT correspon-dence," hep-th/0112258 [58] M. Berkooz, A. Sever and A. Shomer, "Double-trace deformations, boundary condi-tions and spacetime singularities," J. High Energy Phys. 0205 (2002) 034 hep-th/0112264 [59] S. Minwalla, "Restrictions imposed by superconformal invariance on quantum field theories," Adv. Theor. Math. Phys. 2 (1998) 781 hep-th/9712074 [60] M. Gunaydin, D. Minic, and M. Zagermann, "Novel supermultiplets of SU(2, 2|4) and the AdS5 / CFT4 duality," hep-th/9810226 [61] L. Andrianopoli and S. Ferrara, "K-K Excitations on AdS5 ×S5 Ann. Sci. N = 4 ‘Primary’ Superfields," Phys. Lett. B 430 (1998) 248 hep-th/9803171 [62] L. Andrianopoli and S. Ferrara, "Nonchiral’ Primary Superfields in the AdSd+1 / CFTd Correspondence," Lett. Math. Phys. 46 (1998) 265 hep-th/9807150 [63] S. Ferrara and A. Zaffaroni, "Bulk gauge fields in AdS supergravity and supersingle-tons," hep-th/9807090 [64] M. Gunaydin, D. Minic, and M. Zagermann, "4-D doubleton conformal theories, CPT and II B string on AdS5 × S5," Nucl. Phys. B 534 (1998) 96 hep-th/9806042 [65] L. Andrianopoli and S. Ferrara, "On Short and Long SU(2, 2/4) Multiplets in the AdS/CFT Correspondence," hep-th/9812067 [66] P. S. Howe, K. S. Stelle and P. K. Townsend, "Supercurrents," Nucl. Phys. B 192 (1981) 332 [67] P. S. Howe and P. C. West, "Operator product expansions in four-dimensional super-conformal field theories," Phys. Lett. B 389 (1996) 273 hep-th/9607060 [67] "Is N = 4 Yang-Mills theory soluble?," hep-th/9611074 [67] "Superconformal invariants and extended supersymmetry," Phys. Lett. B 400 (1997) 307 hep-th/9611075 [68] H. J. Kim, L. J. Romans and P. van Nieuwenhuizen, "The Mass Spectrum Of Chiral N=2 D = 10 Supergravity On S**5," Phys. Rev. D 32 (1985) 389 [69] M. Günaydin and N. Marcus, "The Spectrum Of The S**5 Compactification Of The Chiral N=2, D=10 Supergravity And The Unitary Supermultiplets Of U(2, 2/4)," Class. Quantum Gravity 2 (1985) L11 [70] V. A. Novikov, M. A. Shifman, A. I. Vainshtein and V. I. Zakharov, "Exact Gell-Mann-Low Function Of Supersymmetric Yang-Mills Theories From Instanton Cal-culus," Nucl. Phys. B 229 (1983) 381 [71] D. Anselmi, D. Z. Freedman, M. T. Grisaru and Astron. Astrophys. Johansen, "Nonperturbative formulas for central functions of supersymmetric gauge theories," Nucl. Phys. B 526 (1998) 543 hep-th/9708042 [72] A. Khavaev, K. Pilch and N. P. Warner, "New vacua of gauged N = 8 supergravity in five dimensions," Phys. Lett. B 487 (2000) 14 hep-th/9812035 [73] K. Pilch and N. P. Warner, "N = 1 supersymmetric renormalization group flows from ICFA Instrum. Bull. supergravity," Adv. Theor. Math. Phys. 4 (2002) 627 hep-th/0006066 [74] D. Berenstein, J. M. Maldacena and H. Nastase, "Strings in flat space and pp waves from N = 4 super Yang Mills," J. High Energy Phys. 0204 (2002) 013 hep-th/0202021 [75] M. Blau, J. Figueroa-O’Farrill, C. Hull and G. Papadopoulos, "A new maximally supersymmetric background of ICFA Instrum. Bull. superstring theory," J. High Energy Phys. 0201 (2002) 047 hep-th/0110242 [75] M. Blau, J. Figueroa-O’Farrill, C. Hull and G. 
Papadopoulos, "Pen-rose limits and maximal supersymmetry," Class. Quantum Gravity 19 (2002) L87 hep-th/0201081 [75] M. Blau, J. Figueroa-O’Farrill and G. Papadopoulos, "Penrose lim-its, supergravity and brane dynamics," Class. Quantum Gravity 19 (2002) 4753 hep-th/0202111 [76] R. R. Metsaev, "Type ICFA Instrum. Bull. Green-Schwarz superstring in plane wave Ramond-Ramond background," Nucl. Phys. B 625 (2002) 70 hep-th/0112044 [77] R. Corrado, N. Halmagyi, K. D. Kennaway and N. P. Warner, "Penrose limits of RG fixed points and pp-waves with background fluxes," hep-th/0205314 [77] E. G. Gi-mon, L. A. Pando Zayas and J. Sonnenschein, "Penrose limits and RG flows," hep-th/0206033 [77] D. Brecher, C. V. Johnson, K. J. Lovis and R. C. Myers, "Penrose limits, deformed pp-waves and the string duals of N = 1 large N gauge theory," J. High Energy Phys. 0210 (2002) 008 hep-th/0206045 [78] Y. Oz and T. Sakai, "Penrose limit and six-dimensional gauge theories," Phys. Lett. B 544 (2002) 321 hep-th/0207223 [78] ; etc. [79] A. B. Zamolodchikov, "Irreversibility’ Of The Flux Of The Renormalization Group In A 2-D Field Theory," JETP Lett. 43 (1986) 730 [79] [ Pis'ma Zh. Eksp. Teor. Fiz. 43 (1986) 565 [79] ]. [80] D. Anselmi, "Anomalies, unitarity, and quantum irreversibility," Ann. Phys. 276 (1999) 361 hep-th/9903059 [81] G. W. Gibbons and S. W. Hawking, "Action Integrals and Partition Functions in Quantum Gravity," Phys. Rev. D 15 (1977) 2752 [82] C. R. Graham and J. M. Lee, "Einstein Metrics with Prescribed Conformal Infinity on the Ball," Adv. Math. 87 (1991) 186 [83] M. Green, J. Schwarz and E. Witten, "Superstring Theory," Cambridge University Press, New York, 1987. [84] S. Nojiri and S. Odintsov, "Conformal Anomaly for Dilaton Coupled Theories from AdS/CFT Correspondence," Phys. Lett. B 444 (1998) 92 hep-th/9810008 [84] S. Nojiri, S. Odintsov and S. Ogushi, "Conformal Anomaly from d5 Gauged Super-gravity and c-function Away from Conformity," hep-th/9912191 [84] "Finite Action in d5 Gauged Supergravity and Dilatonic Conformal Anomaly for Dual Quantum Field Theory," hep-th/0001122 [85] A. Polyakov, Phys. Lett. B 103 (1981) 207 [85] 211; V. Knizhnik, A. Polyakov and A. Zamolodchikov, Mod. Phys. Lett. A 3 (1988) 819 [86] F. David, Mod. Phys. Lett. A 3 (1988) 1651 [86] J. Distler and H. Kawai, Nucl. Phys. B 321 (1989) 509 [87] N. Seiberg, "Notes on quantum Liouville theory and quantum gravity," Prog. Theor. Phys. Suppl. 102 (1990) 319 [88] R. Myer, Phys. Lett. B 199 (1987) 371 [89] A. Dhar and S. Wadia, "Noncritical strings, RG flows and holography," Nucl. Phys. B 590 (2000) 261 hep-th/0006043 [90] S. Nojiri and S. D. Odintsov, "Brane World Inflation Induced by Quantum Effects," Phys. Lett. B 484 (2000) 119 hep-th/0004097 [91] R. C. Myers, "Higher-derivative gravity, surface terms, and string theory," Phys. Rev. D 36 (1987) 392 [92] S. Nojiri and S. D. Odintsov, "Brane-World Cosmology in Higher Derivative Gravity or Warped Compactification in the Next-to-leading Order of AdS/CFT Correspon-dence," J. High Energy Phys. 0007 (2000) 049 hep-th/0006232 [92] S. Nojiri, S. D. Odintsov and S. Ogushi, "Dynamical Branes from Gravitational Dual of N = 2 Sp(N) Superconformal Field Theory," hep-th/0010004 [92] "Holographic Europhys. Newstropy and brane FRW-dynamics from AdS black hole in d5 higher derivative gravity," hep-th/0105117 [93] S. Nojiri and S. D. Odintsov, "On the conformal anomaly from higher derivative grav-ity in AdS/CFT correspondence," Int. J. Mod. Phys. A 15 (2000) 413 hep-th/9903033 [93] S. 
Nojiri and S. D. Odintsov, "Finite gravitational action for higher derivative and stringy gravity," Phys. Rev. D 62 (2000) 064018 hep-th/9911152 [94] J. Polchinski, "String Theory," Vol. II, Cambridge University Press, 1998. [95] S. Kamefuchi, L. O’Raifeartaigh and A. Salam, "Change Of Variables And Equivalence Theorems In Quantum Field Theories," Nucl. Phys. 28 (1961) 529 [96] D. J. Gross and E. Witten, "Superstring Modifications Of Einstein’s Equations," Nucl. Phys. B 277 (1986) 1 [97] J. I. Latorre and T. R. Morris, "Exact scheme independence," J. High Energy Phys. 0011 (2000) 004 hep-th/0008123 [98] M. Fukuma and S. Matsuura, "Comment on field redefinitions in the AdS/CFT correspondence," Prog. Theor. Phys. 108 (2002) 375 hep-th/0204257 [99] I. R. Klebanov and A. M. Polyakov, "AdS dual of the critical O(N) vector model," Phys. Lett. B 550 (2002) 213 hep-th/0210114 [100] A. M. Polyakov, "Gauge Fields as Rings Of Glue," Nucl. Phys. B 164 (1980) 171 [101] Y. Makeenko and A. A. Migdal, "Quantum Chromodynamics as Dynamics Of Loops," Nucl. Phys. B 188 (1981) 269 [102] A. M. Polyakov, "Confining strings," Nucl. Phys. B 486 (1997) 23 hep-th/9607049 [103] A. M. Polyakov, "String theory and quark confinement," Nucl. Phys. B, Proc. Suppl. 68 (1998) 1 hep-th/9711002 [104] A. M. Polyakov, "The wall of the cave," Int. J. Mod. Phys. A 14 (1999) 645 hep-th/9809057 [105] A. M. Polyakov and V. S. Rychkov, "Gauge fields - strings duality and the loop equation," Nucl. Phys. B 581 (2000) 116 hep-th/0002106 [106] A. M. Polyakov and V. S. Rychkov, "Loop dynamics and AdS/CFT correspondence," Nucl. Phys. B 594 (2001) 272 hep-th/0005173 [107] A. M. Polyakov, "String theory as a universal language," Phys. At. Nucl. 64 (2001) 540 hep-th/0006132 [108] A. M. Polyakov, "Gauge fields and space-time," Int. J. Mod. Phys. A 17S1 (2002) 119, hep-th/0110196 [109] J. B. Kogut and L. Susskind, "Hamiltonian Formulation Of Wilson’s Lattice Gauge Theories," Phys. Rev. D 11 (1975) 395 [110] A. Santambrogio and D. Zanon, "Exact anomalous dimensions of N = 4 Yang-Mills operators with large R charge," Phys. Lett. B 545 (2002) 425 hep-th/0206079 [111] Y. Oz and T. Sakai, "Exact anomalous dimensions for N = 2 ADE SCFTs," hep-th/0208078 [112] S. R. Das, C. Gomez and S. J. Rey, "Penrose limit, spontaneous symmetry breaking and holography in pp-wave background," Phys. Rev. D 66 (2002) 046002 hep-th/0203164 [113] R. G. Leigh, K. Okuyama and M. Rozali, "PP-waves and holography," Phys. Rev. D 66 (2002) 046004 hep-th/0204026 [114] D. Berenstein and H. Nastase, "On lightcone string field theory from super Yang-Mills and holography," hep-th/0205048 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181368247-0-102-108-0-5 SzGeCERN 2373792CERCER hep-th/0304229 eng Barvinsky, A O Lebedev Physics Institute Nonlocal action for long-distance modifications of gravity theory 2003 28 Apr 2003 9 p We construct the covariant nonlocal action for recently suggested long-distance modifications of gravity theory motivated by the cosmological constant and cosmological acceleration problems. This construction is based on the special nonlocal form of the Einstein-Hilbert action explicitly revealing the fact that this action within the covariant curvature expansion begins with curvature-squared terms. LANL EDS SIS LANLPUBL2004 SIS:2004 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory 109-116 Phys. Lett.
B 572 2003 http://cdsware.cern.ch/download/invenio-demo-site-files/0304229.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0304229.ps.gz barvin@lpi.ru n 200318 13 20060826 0015 CER01 20030429 PUBLIC 002373792CER ARTICLE [1] N.Arkani-Hamed, S.Dimopoulos, G.Dvali and G.Gabadadze, Nonlocal modification of gravity and the cosmological constant problem, hep-th/0209227 [2] S.Weinberg, Rev. Mod. Phys. 61 (1989) 1 [3] M.K.Parikh and S.N.Solodukhin, Phys. Lett. B 503 (2001) 384 hep-th/0012231 [4] A.O.Barvinsky and G.A.Vilkovisky, Nucl. Phys. B 282 (1987) 163 [5] A.O.Barvinsky and G.A.Vilkovisky, Nucl. Phys. B 333 (1990) 471 [6] A.O.Barvinsky, Yu.V.Gusev, G.A.Vilkovisky and V.V.Zhytnikov, J. Math. Phys. 35 (1994) 3525 [6] J. Math. Phys. 35 (1994) 3543 [7] A.Adams, J.McGreevy and E.Silverstein, hep-th/0209226 [8] R.Gregory, V.A.Rubakov and S.M.Sibiryakov, Phys. Rev. Lett. 84 (2000) 5928 hep-th/0002072 [9] G.Dvali, G.Gabadadze and M.Porrati, Phys. Lett. B 485 (2000) 208 hep-th/0005016 [10] S.L.Dubovsky and V.A.Rubakov, Phys. Rev. D 67 (2003) 104014 hep-th/0212222 [11] A.O.Barvinsky, Phys. Rev. D 65 (2002) 062003 hep-th/0107244 [12] A.O.Barvinsky, A.Yu.Kamenshchik, A.Rathke and C.Kiefer, Phys. Rev. D 67 (2003) 023513 hep-th/0206188 [13] E.S. Fradkin and A.A. Tseytlin, Phys. Lett. B 104 (1981) 377 [13] A.O.Barvinsky and I.G.Avramidi, Phys. Lett. B 159 (1985) 269 [14] A.O.Barvinsky, A.Yu.Kamenshchik and I.P.Karmazin, Phys. Rev. D 48 (1993) 3677 gr-qc/9302007 [15] E.V.Gorbar and I.L.Shapiro, J. High Energy Phys. 0302 (2003) 021 [16] M.Porrati, Phys. Lett. B 534 (2002) 209 hep-th/0203014 [17] H. van Dam and M.J.Veltman, Nucl. Phys. B 22 (1970) 397; V.I.Zakharov, JETP Lett. 12 (1970) 312 [17] M.Porrati, Phys. Lett. B 498 (2001) 92 hep-th/0011152 [18] A.O.Barvinsky, Yu.V.Gusev, V.F.Mukhanov and D.V.Nesterov, Nonperturbative late time asymptotics for heat kernel in gravity theory, hep-th/0306052 [19] A.Strominger, J. High Energy Phys. 0110 (2001) 034 hep-th/0106113 [19] J. High Energy Phys. 0111 (2001) 049 hep-th/0110087 [20] J.Schwinger, J. Math. Phys. 2 (1961) 407 [20] I.L.Buchbinder, E.S.Fradkin and D.M.Gitman, Fortschr. Phys. 29 (1981) 187 [20] R.D.Jordan, Phys. Rev. D 33 (1986) 444 [21] C.Deffayet, G.Dvali and G.Gabadadze, Phys. Rev. D 65 (2002) 044023 astro-ph/0105068 [22] G.Dvali, A.Gruzinov and M.Zaldarriaga, The accelerated Universe and the Moon, hep-ph/0212069 [23] M.E.Soussa and R.P.Woodard, A nonlocal metric formulation of MOND, astro-ph/0302030 [24] M.Milgrom, Astrophys. J. 270 (1983) 365 [24] Astrophys. J. 270 (1983) 371 [24] J.Bekenstein and M.Milgrom, Astrophys. J. 286 (1984) 7 [25] L.R.Abramo and R.P.Woodard, Phys. Rev. D 65 (2002) 063516 [25] V.K.Onemli and R.P.Woodard, Class. Quantum Gravity 19 (2002) 4607 gr-qc/0204065 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181372109-0-18-28-0-0 SzGeCERN hep-th/0307041 eng Witten, Edward Princeton University SL(2,Z) Action On Three-Dimensional Conformal Field Theories With Abelian Symmetry 2003 3 Jul 2003 24 p On the space of three-dimensional conformal field theories with U(1) symmetry and a chosen coupling to a background gauge field, there is a natural action of the group SL(2,Z). The generator S of SL(2,Z) acts by letting the background gauge field become dynamical, an operation considered recently by Kapustin and Strassler. The other generator T acts by shifting the Chern-Simons coupling of the background field.
This SL(2,Z) action in three dimensions is related by the AdS/CFT correspondence to SL(2,Z) duality of low energy U(1) gauge fields in four dimensions. LANL EDS SzGeCERN Particle Physics - Theory PREPRINT LANL EDS High Energy Physics - Theory Witten, Edward http://cdsware.cern.ch/download/invenio-demo-site-files/0307041.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0307041.ps.gz witten@ias.edu n 200327 11 20061123 0917 CER01 20030704 PUBLIC 002385282CER PREPRINT [1] C. Burgess and B. P. Dolan, "Particle Vortex Duality And The Modular Group Applications To The Quantum Hall Effect And Other 2-D Systems," hep-th/0010246 [2] A. Shapere and F. Wilczek, "Self-Dual Models With Theta Terms," Nucl. Phys. B 320 (1989) 669 [3] S. J. Rey and A. Zee, "Self-Duality Of Three-Dimensional Chern-Simons Theory," Nucl. Phys. B 352 (1991) 897 [4] C. A. Lutken and G. G. Ross, "Duality In The Quantum Hall System," Phys. Rev. B 45 (1992) 11837 [4] Phys. Rev. B 48 (1993) 2500 [5] D.-H. Lee, S. Kivelson, and S.-C. Zhang, Phys. Rev. Lett. 68 (1992) 2386 [5] Phys. Rev. B 46 (1992) 2223 [6] C. A. Lutken, "Geometry Of Renormalization Group Flows Constrained By Discrete Global Symmetries," Nucl. Phys. B 396 (1993) 670 [7] B. P. Dolan, "Duality And The Modular Group In The Quantum Hall Effect," J. Phys. A 32 (1999) L243 cond-mat/9805171 [8] C. P. Burgess, R. Dib, and B. P. Dolan, Phys. Rev. B 62 (2000) 15359 cond-mat/9911476 [9] A. Zee, "Quantum Hall Fluids," cond-mat/9501022 [10] A. Kapustin and M. Strassler, "On Mirror Symmetry In Three Dimensional Abelian Gauge Theories," hep-th/9902033 [11] K. Intriligator and N. Seiberg, "Mirror Symmetry In Three-Dimensional Gauge Theories," Phys. Lett. B 387 (1996) 512 hep-th/9607207 [12] J. Cardy and E. Rabinovici, "Phase Structure Of Z(p) Models In The Presence Of A Theta Parameter," Nucl. Phys. B 205 (1982) 1 [12] J. Cardy, "Duality And The Theta Parameter In Abelian Lattice Models," Nucl. Phys. B 205 (1982) 17 [13] C. Vafa and E. Witten, "A Strong Coupling Test Of S-Duality," Nucl. Phys. B 431 (1994) 3 hep-th/9408074 [14] E. Witten, "On S Duality In Abelian Gauge Theory," Selecta Math. 1 (1995) 383, hep-th/9505186 [15] S. Deser, R. Jackiw, and S. Templeton, "Topologically Massive Gauge Theories," Ann. Phys. 140 (1982) 372 [16] E. Guadagnini, M. Martinelli, and M. Mintchev, "Scale-Invariant Sigma Models On Homogeneous Spaces," Phys. Lett. B 194 (1987) 69 [17] K. Bardakci, E. Rabinovici, and B. Saering, Nucl. Phys. B 299 (1988) 157 [18] D. Karabali, Q.-H. Park, H. J. Schnitzer, and Z. Yang, Phys. Lett. B 216 (1989) 307 [18] H. J. Schnitzer, Nucl. Phys. B 324 (1989) 412 [18] D. Karabali and H. J. Schnitzer, Nucl. Phys. B 329 (1990) 649 [19] T. Appelquist and R. D. Pisarski, "Hot Yang-Mills Theories And Three-Dimensional QCD," Phys. Rev. D 23 (1981) 2305 [20] R. Jackiw and S. Templeton, "How Superrenormalizable Interactions Cure Their Infrared Divergences," Phys. Rev. D 23 (1981) 2291 [21] S. Templeton, "Summation Of Dominant Coupling Constant Logarithms In QED In Three Dimensions," Phys. Lett. B 103 (1981) 134 [21] "Summation Of Coupling Constant Logarithms In QED In Three Dimensions," Phys. Rev. D 24 (1981) 3134 [22] T. Appelquist and U. W. Heinz, "Three-Dimensional O(N) Theories At Large Distances," Phys. Rev. D 24 (1981) 2169 [23] D. Anselmi, "Large N Expansion, Conformal Field Theory, And Renormalization Group Flows In Three Dimensions," J. High Energy Phys. 0006 (2000) 042 hep-th/0005261 [24] V. Borokhov, A.
Kapustin, and X. Wu, "Topological Disorder Operators In Three-Dimensional Conformal Field Theory," hep-th/0206054 [25] V. Borokhov, A. Kapustin, and X. Wu, "Monopole Operators And Mirror Symmetry In Three Dimensions," J. High Energy Phys. 0212 (2002) 044 hep-th/0207074 [26] P. Breitenlohner and D. Z. Freedman, "Stability In Gauged Extended Supergravity," Ann. Phys. 144 (1982) 249 [27] I. R. Klebanov and E. Witten, "AdS/CFT Correspondence And Symmetry Breaking," Nucl. Phys. B 536 (1998) 199 hep-th/9905104 [28] R. Jackiw, "Topological Investigations Of Quantized Gauge Theories," in Current Algebra And Anomalies, ed. S. B. Treiman et al. (World-Scientific, 1985). [29] A. Schwarz, "The Partition Function Of A Degenerate Functional," Commun. Math. Phys. 67 (1979) 1 [30] M. Rocek and E. Verlinde, "Duality, Quotients, and Currents," Nucl. Phys. B 373 (1992) 630 hep-th/9110053 [31] S. Elitzur, G. Moore, A. Schwimmer, and N. Seiberg, "Remarks On The Canonical Quantization Of The Chern-Simons-Witten Theory," Nucl. Phys. B 326 (1989) 108 [32] E. Witten, "Quantum Field Theory And The Jones Polynomial," Commun. Math. Phys. 121 (1989) 351 [33] N. Redlich, "Parity Violation And Gauge Non-Invariance Of The Effective Gauge Field Action In Three Dimensions," Phys. Rev. D 29 (1984) 2366 [34] E. Witten, "Multi-Trace Operators, Boundary Conditions, and AdS/CFT Correspondence," hep-th/0112258 [35] M. Berkooz, A. Sever, and A. Shomer, "Double-trace Deformations, Boundary Conditions, and Spacetime Singularities," J. High Energy Phys. 05 (2002) 034 hep-th/0112264 [36] P. Minces, "Multi-trace Operators And The Generalized AdS/CFT Prescription," hep-th/0201172 [37] O. Aharony, M. Berkooz, and E. Silverstein, "Multiple Trace Operators And Non-Local String Theories," J. High Energy Phys. 08 (2001) 006 [38] V. K. Dobrev, "Intertwining Operator Realization Of The AdS/CFT Correspondence," Nucl. Phys. B 553 (1999) 559 hep-th/9812194 [39] I. R. Klebanov, "Touching Random Surfaces And Liouville Theory," Phys. Rev. D 51 (1995) 1836 hep-th/9407167 [39] I. R. Klebanov and A. Hashimoto, "Non-perturbative Solution Of Matrix Models Modified By Trace Squared Terms," Nucl. Phys. B 434 (1995) 264 hep-th/9409064 [40] S. Gubser and I. Mitra, "Double-trace Operators And One-Loop Vacuum Energy In AdS/CFT," hep-th/0210093 Phys.Rev. D67 (2003) 064018 [41] S. Gubser and I. R. Klebanov, "A Universal Result On Central Charges In The Presence Of Double-Trace Deformations," Nucl.Phys. B656 (2003) 23 hep-th/0212138 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181375069-0-21-38-0-1 SzGeCERN 20070403111954.0 hep-th/0402130 eng NYU-TH-2004-02-17 Dvali, G New York University Filtering Gravity: Modification at Large Distances? Infrared Modification of Gravity Preprint title 2005 New York, NY New York Univ. Dept. Phys. 17 Feb 2004 18 p In this lecture I address the issue of possible large distance modification of gravity and its observational consequences. Although, for the illustrative purposes we focus on a particular simple generally-covariant example, our conclusions are rather general and apply to large class of theories in which, already at the Newtonian level, gravity changes the regime at a certain very large crossover distance $r_c$. In such theories the cosmological evolution gets dramatically modified at the crossover scale, usually exhibiting a "self-accelerated" expansion, which can be differentiated from more conventional "dark energy" scenarios by precision cosmology.
However, unlike the latter scenarios, theories of modified-gravity are extremely constrained (and potentially testable) by the precision gravitational measurements at much shorter scales. Despite the presence of extra polarizations of graviton, the theory is compatible with observations, since the naive perturbative expansion in Newton's constant breaks down at a certain intermediate scale. This happens because the extra polarizations have couplings singular in $1/r_c$. However, the correctly resummed non-linear solutions are regular and exhibit continuous Einsteinian limit. Contrary to the naive expectation, explicit examples indicate that the resummed solutions remain valid after the ultraviolet completion of the theory, with the loop corrections taken into account. LANL EDS SIS:200704 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Dvali, Gia 92-98 Phys. Scr. Top. Issues T117 2005 http://cdsware.cern.ch/download/invenio-demo-site-files/0402130.pdf n 200408 13 20070425 1019 CER01 20040218 002414101 92-98 sigtuna20030814 PUBLIC 002426503CER ARTICLE [1] G. Dvali, G. Gabadadze and M. Porrati, Phys. Lett. B 485 (2000) 208 hep-th/0005016 [1] G. R. Dvali and G. Gabadadze, Phys. Rev. D 63 (2001) 065007 hep-th/0008054 [2] G. Dvali, G. Gabadadze, M. Kolanovic and F. Nitti, Phys. Rev. D 65 (2002) 024031 hep-ph/0106058 [3] G. Dvali, G. Gabadadze, M. Kolanovic and F. Nitti, Phys. Rev. D 64 (2001) 084004 hep-ph/0102216 [4] C. Deffayet, G. Dvali and G. Gabadadze, Phys. Rev. D 65 (2002) 044023 astro-ph/0105068 [5] C. Deffayet, Phys. Lett. B 502 (2001) 199 hep-th/0010186 [6] A. G. Riess et al. [Supernova Search Team Collaboration], Astron. J. 116 (1998) 1009 astro-ph/9805201 [6] S. Perlmutter et al. [Supernova Cosmology Project Collaboration], Astrophys. J. 517 (1999) 565 astro-ph/9812133 [7] G. Dvali and M. Turner, astro-ph/0301510 [8] H. van Dam and M. Veltman, Nucl. Phys. B 22 (1970) 397 [9] V. I. Zakharov, JETP Lett. 12 (1970) 312 [10] A. I. Vainshtein, Phys. Lett. B 39 (1972) 393 [11] C. Deffayet, G. Dvali, G. Gabadadze and A. I. Vainshtein, Phys. Rev. D 65 (2002) 044026 hep-th/0106001 [12] N. Arkani-Hamed, H. Georgi and M.D. Schwartz, Ann. Phys. 305 (2003) 96 hep-th/0210184 [13] D.G. Boulware and S. Deser, Phys. Rev. D 6 (1972) 3368 [14] G. Gabadadze and A. Gruzinov, Phys.Rev. D72 (2005) 124007 hep-th/0312074 [15] M. A. Luty, M. Porrati and R. Rattazzi, J. High Energy Phys. 0309 (2003) 029 hep-th/0303116 [16] A. Lue, Phys. Rev. D 66 (2002) 043509 hep-th/0111168 [17] A. Gruzinov, astro-ph/0112246 New Astron. 10 (2005) 311 [18] S. Corley, D.A.Lowe and S. Ramgoolam, J. High Energy Phys. 0107 (2001) 030 hep-th/0106067 [19] I. Antoniadis, R. Minasian and P. Vanhove, Nucl. Phys. B 648 (2003) 69 hep-th/0209030 [20] R. L. Davis, Phys. Rev. D 35 (1987) 3705 [21] G. Dvali, A. Gruzinov and M. Zaldarriaga, Phys. Rev. D 68 (2003) 024012 hep-ph/0212069 [22] A. Lue and G. Starkman, Phys. Rev. D 67 (2003) 064002 astro-ph/0212083 [23] E. Adelberger (2002). Private communication. [24] T. Damour, I. I. Kogan, A. Papazoglou, Phys. Rev. D 66 (2002) 104025 hep-th/0206044 [25] G. Dvali, G. Gabadadze and M. Shifman, Phys. Rev. D 67 (2003) 044020 hep-th/0202174 [26] A. Adams, J. McGreevy and E. Silverstein, hep-th/0209226 [27] N. Arkani-Hamed, S. Dimopoulos, G. Dvali and G. Gabadadze, hep-th/0209227 [28] S.M. Carroll, V. Duvvuri, M. Trodden and M.S. Turner, astro-ph/0306438 Phys.Rev. D70 (2004) 043528 [29] G.Gabadadze and M. Shifman, hep-th/0312289 Phys.Rev.
D69 (2004) 124032 [30] M.Porrati and G. W. Rombouts, hep-th/0401211 Phys.Rev. D69 (2004) 122003 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181382391-0-26-23-0-2 SzGeCERN 20060124104603.0 hep-th/0501145 eng Durin, B LPTHE Closed strings in Misner space: a toy model for a Big Bounce? 2005 19 Jan 2005 Misner space, also known as the Lorentzian orbifold $R^{1,1}/boost$, is one of the simplest examples of a cosmological singularity in string theory. In this lecture, we review the semi-classical propagation of closed strings in this background, with a particular emphasis on the twisted sectors of the orbifold. Tree-level scattering amplitudes and the one-loop vacuum amplitude are also discussed. LANL EDS SIS LANLPUBL2006 SIS:2006 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Pioline, B Durin, Bruno Pioline, Boris http://cdsware.cern.ch/download/invenio-demo-site-files/0501145.pdf LPTHE LPTHE, Lptens n 200503 13 20061202 0008 CER01 20050120 002424942 177 cargese20040607 PUBLIC 002503681CER ARTICLE [1] S. Lem, "The Seventh Voyage", in The Star Diaries, Warsaw 1971, English translation New York, 1976. [2] A. Borde and A. Vilenkin, "Eternal inflation and the initial singularity," Phys. Rev. Lett. 72 (1994) 3305 gr-qc/9312022 [3] C. W. Misner, in Relativity Theory and Astrophysics I: Relativity and Cosmology, edited by J. Ehlers, Lectures in Applied Mathematics, Vol. 8 (American Mathematical Society, Providence, 1967), p. 160. [4] M. Berkooz and B. Pioline, "Strings in an electric field, and the Milne universe," J. Cosmol. Astropart. Phys. 0311 (2003) 007 hep-th/0307280 [5] M. Berkooz, B. Pioline and M. Rozali, "Closed strings in Misner space: Cosmological production of winding strings," J. Cosmol. Astropart. Phys. 07 (2004) 003 hep-th/0405126 [6] JCAP 0410 (2004) 002 M. Berkooz, B. Durin, B. Pioline and D. Reichmann, "Closed strings in Misner space: Stringy fuzziness with a twist," arXiv hep-th/0407216 [7] G. T. Horowitz and A. R. Steif, "Singular String Solutions With Nonsingular Initial Data," Phys. Lett. B 258 (1991) 91 [8] J. Khoury, B. A. Ovrut, N. Seiberg, P. J. Steinhardt and N. Turok, "From big crunch to big bang," Phys. Rev. D 65 (2002) 086007 hep-th/0108187 [9] Surveys High Energ.Phys. 17 (2002) 115 N. A. Nekrasov, "Milne universe, tachyons, and quantum group," arXiv hep-th/0203112 [10] V. Balasubramanian, S. F. Hassan, E. Keski-Vakkuri and A. Naqvi, "A space-time orbifold: A toy model for a cosmological singularity," Phys. Rev. D 67 (2003) 026003 hep-th/0202187 [10] R. Biswas, E. Keski-Vakkuri, R. G. Leigh, S. Nowling and E. Sharpe, "The taming of closed time-like curves," J. High Energy Phys. 0401 (2004) 064 hep-th/0304241 [11] I. Antoniadis, C. Bachas, J. R. Ellis and D. V. Nanopoulos, "Cosmological String Theories And Discrete Inflation," Phys. Lett. B 211 (1988) 393 [11] I. Antoniadis, C. Bachas, J. R. Ellis and D. V. Nanopoulos, "An Expanding Universe In String Theory," Nucl. Phys. B 328 (1989) 117 [11] I. Antoniadis, C. Bachas, J. R. Ellis and D. V. Nanopoulos, "Comments On Cosmological String Solutions," Phys. Lett. B 257 (1991) 278 [12] C. R. Nappi and E. Witten, "A Closed, expanding universe in string theory," Phys. Lett. B 293 (1992) 309 hep-th/9206078 [13] C. Kounnas and D. Lust, "Cosmological string backgrounds from gauged WZW models," Phys. Lett. B 289 (1992) 56 hep-th/9205046 [14] E. Kiritsis and C. Kounnas, "Dynamical Topology change in string theory," Phys. Lett.
B 331 (1994) 51 hep-th/9404092 [15] S. Elitzur, A. Giveon, D. Kutasov and E. Rabinovici, "From big bang to big crunch and beyond," J. High Energy Phys. 0206 (2002) 017 hep-th/0204189 [15] S. Elitzur, A. Giveon and E. Rabinovici, "Removing singularities," J. High Energy Phys. 0301 (2003) 017 hep-th/0212242 [16] L. Cornalba and M. S. Costa, "A New Cosmological Scenario in String Theory," Phys. Rev. D 66 (2002) 066001 hep-th/0203031 [16] L. Cornalba, M. S. Costa and C. Kounnas, "A resolution of the cosmological singularity with orientifolds," Nucl. Phys. B 637 (2002) 378 hep-th/0204261 [16] L. Cornalba and M. S. Costa, "On the classical stability of orientifold cosmologies," Class. Quantum Gravity 20 (2003) 3969 hep-th/0302137 [17] B. Craps, D. Kutasov and G. Rajesh, "String propagation in the presence of cosmological singularities," J. High Energy Phys. 0206 (2002) 053 hep-th/0205101 [17] B. Craps and B. A. Ovrut, "Global fluctuation spectra in big crunch / big bang string vacua," Phys. Rev. D 69 (2004) 066001 hep-th/0308057 [18] E. Dudas, J. Mourad and C. Timirgaziu, "Time and space dependent backgrounds from nonsupersymmetric strings," Nucl. Phys. B 660 (2003) 3 hep-th/0209176 [19] L. Cornalba and M. S. Costa, "Time-dependent orbifolds and string cosmology," Fortschr. Phys. 52 (2004) 145 hep-th/0310099 [20] Phys.Rev. D70 (2004) 126011 C. V. Johnson and H. G. Svendsen, "An exact string theory model of closed time-like curves and cosmological singularities," arXiv hep-th/0405141 [21] N. Toumbas and J. Troost, "A time-dependent brane in a cosmological background," J. High Energy Phys. 0411 (2004) 032 hep-th/0410007 [22] W. A. Hiscock and D. A. Konkowski, "Quantum Vacuum Energy In Taub - Nut (Newman-Unti-Tamburino) Type Cosmologies," Phys. Rev. D 26 (1982) 1225 [23] A. H. Taub, "Empty Space-Times Admitting A Three Parameter Group Of Motions," Ann. Math. 53 (1951) 472 [23] E. Newman, L. Tamburino and T. Unti, "Empty Space Generalization Of The Schwarzschild Metric," J. Math. Phys. 4 (1963) 915 [24] J. G. Russo, "Cosmological string models from Milne spaces and SL(2,Z) orbifold," arXiv hep-th/0305032 [25] Mod.Phys.Lett. A19 (2004) 421 J. R. I. Gott, "Closed Timelike Curves Produced By Pairs Of Moving Cosmic Strings: Exact Solutions," Phys. Rev. Lett. 66 (1991) 1126 [25] J. D. Grant, "Cosmic strings and chronology protection," Phys. Rev. D 47 (1993) 2388 hep-th/9209102 [26] S. W. Hawking, "The Chronology protection conjecture," Phys. Rev. D 46 (1992) 603 [27] Commun.Math.Phys. 256 (2005) 491 D. Kutasov, J. Marklof and G. W. Moore, "Melvin Models and Diophantine Approximation," arXiv hep-th/0407150 [28] C. Gabriel and P. Spindel, "Quantum charged fields in Rindler space," Ann. Phys. 284 (2000) 263 gr-qc/9912016 [29] N. Turok, M. Perry and P. J. Steinhardt, "M theory model of a big crunch / big bang transition," Phys. Rev. D 70 (2004) 106004 hep-th/0408083 [30] C. Bachas and M. Porrati, "Pair Creation Of Open Strings In An Electric Field," Phys. Lett. B 296 (1992) 77 hep-th/9209032 [31] J. M. Maldacena, H. Ooguri and J. Son, "Strings in AdS(3) and the SL(2,R) WZW model. II: Euclidean black hole," J. Math. Phys. 42 (2001) 2961 hep-th/0005183 [32] M. Berkooz, B. Craps, D. Kutasov and G. Rajesh, "Comments on cosmological singularities in string theory," arXiv hep-th/0212215 [33] D. J. Gross and P. F. Mende, "The High-Energy Behavior Of String Scattering Amplitudes," Phys. Lett. B 197 (1987) 129 [34] H. Liu, G. Moore and N. Seiberg, "Strings in a time-dependent orbifold," J. High Energy Phys.
0206 (2002) 045 hep-th/0204168 [34] H. Liu, G. Moore and N. Seiberg, "Strings in time-dependent orbifolds," J. High Energy Phys. 0210 (2002) 031 hep-th/0206182 [35] D. Amati, M. Ciafaloni and G. Veneziano, "Classical And Quantum Gravity Effects From Planckian Energy Superstring Collisions," Int. J. Mod. Phys. A 3 (1988) 1615 [36] G. T. Horowitz and J. Polchinski, "Instability of spacelike and null orbifold singularities," Phys. Rev. D 66 (2002) 103512 hep-th/0206228 [37] C. R. Nappi and E. Witten, "A WZW model based on a non-semisimple group," Phys. Rev. Lett. 71 (1993) 3751 hep-th/9310112 [38] D. I. Olive, E. Rabinovici and A. Schwimmer, "A Class of string backgrounds as a semiclassical limit of WZW models," Phys. Lett. B 321 (1994) 361 hep-th/9311081 [39] E. Kiritsis and C. Kounnas, "String Propagation In Gravitational Wave Backgrounds," Phys. Lett. B 320 (1994) 264 [39] [Addendum- Phys. Lett. B 325 (1994) 536 hep-th/9310202 [39] E. Kiritsis, C. Kounnas and D. Lust, "Superstring gravitational wave backgrounds with space-time supersymmetry," Phys. Lett. B 331 (1994) 321 hep-th/9404114 [40] E. Kiritsis and B. Pioline, "Strings in homogeneous gravitational waves and null holography," J. High Energy Phys. 0208 (2002) 048 hep-th/0204004 [41] Nucl.Phys. B674 (2003) 80 G. D’Appollonio and E. Kiritsis, "String interactions in gravitational wave backgrounds," arXiv hep-th/0305081 [42] Y. K. Cheung, L. Freidel and K. Savvidy, "Strings in gravimagnetic fields," J. High Energy Phys. 0402 (2004) 054 hep-th/0309005 [43] O. Aharony, M. Berkooz and E. Silverstein, "Multiple-trace operators and non-local string theories," J. High Energy Phys. 0108 (2001) 006 hep-th/0105309 [43] M. Berkooz, A. Sever and A. Shomer, "Double-trace deformations, boundary conditions and spacetime singularities," J. High Energy Phys. 0205 (2002) 034 hep-th/0112264 [43] E. Witten, "Multi-trace operators, boundary conditions, and AdS/CFT correspondence," arXiv hep-th/0112258 [44] T. Damour, M. Henneaux and H. Nicolai, "Cosmological billiards," Class. Quantum Gravity 20 (2003) R145 hep-th/0212256 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181394655-0-44-48-0-2 SzGeCERN 20060713170102.0 hep-th/0606038 eng DESY-06-083 DESY-2006-083 Papadimitriou, I DESY Non-Supersymmetric Membrane Flows from Fake Supergravity and Multi-Trace Deformations 2007 Hamburg DESY 5 Jun 2006 45 p We use fake supergravity as a solution generating technique to obtain a continuum of non-supersymmetric asymptotically $AdS_4\times S^7$ domain wall solutions of eleven-dimensional supergravity with non-trivial scalars in the $SL(8,\mathbb{R})/SO(8)$ coset. These solutions are continuously connected to the supersymmetric domain walls describing a uniform sector of the Coulomb branch of the $M2$-brane theory. We also provide a general argument that identifies the fake superpotential with the exact large-N quantum effective potential of the dual theory, thus arriving at a very general description of multi-trace deformations in the AdS/CFT correspondence, which strongly motivates further study of fake supergravity as a solution generating method. This identification allows us to interpret our non-supersymmetric solutions as a family of marginal triple-trace deformations of the Coulomb branch that completely break supersymmetry and to calculate the exact large-N anomalous dimensions of the operators involved. The holographic one- and two-point functions for these solutions are also computed.
LANL EDS SIS JHEP2007 SIS:200703 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Papadimitriou, Ioannis 008 J. High Energy Phys. 02 2007 http://cdsware.cern.ch/download/invenio-demo-site-files/0606038.pdf n 200623 13 20070307 2032 CER01 20060607 PUBLIC 002623855CER ARTICLE [1] M. Cvetic and H. H. Soleng, "Supergravity domain walls," Phys. Rep. 282 (1997) 159 hep-th/9604090 [2] D. Z. Freedman, C. Nunez, M. Schnabl and K. Skenderis, "Fake supergravity and domain wall stability," Phys. Rev. D 69 (2004) 104027 hep-th/0312055 [3] A. Celi, A. Ceresole, G. Dall’Agata, A. Van Proeyen and M. Zagermann, "On the fakeness of fake supergravity," Phys. Rev. D 71 (2005) 045009 hep-th/0410126 [4] K. Skenderis and P. K. Townsend, "Gravitational stability and renormalization-group flow," Phys. Lett. B 468 (1999) 46 hep-th/9909070 [5] I. Bakas, A. Brandhuber and K. Sfetsos, "Domain walls of gauged supergravity, M-branes, and algebraic curves," Adv. Theor. Math. Phys. 3 (1999) 1657 hep-th/9912132 [6] M. Zagermann, "N = 4 fake supergravity," Phys. Rev. D 71 (2005) 125007 hep-th/0412081 [7] K. Skenderis and P. K. Townsend, "Hidden supersymmetry of domain walls and cosmologies," arXiv hep-th/0602260 [8] K. Skenderis, private communication. [9] P. K. Townsend, "Positive Energy And The Scalar Potential In Higher Dimensional (Super)Gravity Theories," Phys. Lett. B 148 (1984) 55 [10] O. DeWolfe, D. Z. Freedman, S. S. Gubser and A. Karch, "Modeling the fifth dimension with scalars and gravity," Phys. Rev. D 62 (2000) 046008 hep-th/9909134 [11] S. S. Gubser, "Curvature singularities: The good, the bad, and the naked," Adv. Theor. Math. Phys. 4 (2002) 679 hep-th/0002160 [12] I. Papadimitriou and K. Skenderis, "AdS/CFT correspondence and geometry," arXiv hep-th/0404176 [13] V. L. Campos, G. Ferretti, H. Larsson, D. Martelli and B. E. W. Nilsson, "A study of holographic renormalization group flows in d = 6 and d = 3," J. High Energy Phys. 0006 (2000) 023 hep-th/0003151 [14] M. Cvetic, S. S. Gubser, H. Lu and C. N. Pope, "Symmetric potentials of gauged supergravities in diverse dimensions and Coulomb branch of gauge theories," Phys. Rev. D 62 (2000) 086003 hep-th/9909121 [15] M. Cvetic, H. Lu, C. N. Pope and A. Sadrzadeh, "Consistency of Kaluza-Klein sphere reductions of symmetric potentials," Phys. Rev. D 62 (2000) 046005 hep-th/0002056 [16] P. Kraus, F. Larsen and S. P. Trivedi, "The Coulomb branch of gauge theory from rotating branes," J. High Energy Phys. 9903 (1999) 003 hep-th/9811120 [17] D. Z. Freedman, S. S. Gubser, K. Pilch and N. P. Warner, "Continuous distributions of D3-branes and gauged supergravity," J. High Energy Phys. 0007 (2000) 038 hep-th/9906194 [18] I. Bakas and K. Sfetsos, "States and curves of five-dimensional gauged supergravity," Nucl. Phys. B 573 (2000) 768 hep-th/9909041 [19] C. Martinez, R. Troncoso and J. Zanelli, "Exact black hole solution with a minimally coupled scalar field," Phys. Rev. D 70 (2004) 084035 hep-th/0406111 [20] B. de Wit and H. Nicolai, "The Consistency Of The S7 Truncation In D = 11 Supergravity," Nucl. Phys. B 281 (1987) 211 [21] H. Nastase, D. Vaman and P. van Nieuwenhuizen, "Consistent nonlinear KK reduction of 11d supergravity on AdS7 × S4 and self-duality in odd dimensions," Phys. Lett. B 469 (1999) 96 hep-th/9905075 [22] H. Nastase, D. Vaman and P. van Nieuwenhuizen, "Consistency of the AdS7 × S4 reduction and the origin of self-duality in odd dimensions," Nucl. Phys. B 581 (2000) 179 hep-th/9911238 [23] P.
Breitenlohner and D. Z. Freedman, "Stability In Gauged Extended Supergravity," Ann. Phys. 144 (1982) 249 [24] I. R. Klebanov and E. Witten, "AdS/CFT correspondence and Symmetry breaking," Nucl. Phys. B 556 (1999) 89 hep-th/9905104 [25] Dr. E. Kamke, Differentialgleichungen Lösungsmethoden und Lösungen, Chelsea Publishing Company, 1971. [26] E. S. Cheb-Terrab and A. D. Roche, "Abel ODEs Equivalence and Integrable Classes," Comput. Phys. Commun. 130, Issues 1- : 2 (2000) 204 [arXiv math-ph/0001037 [26] E. S. Cheb-Terrab and A. D. Roche, "An Abel ordinary differential equation class generalizing known integrable classes," European J. Appl. Math. 14 (2003) 217 math.GM/0002059 [26] V. M. Boyko, "Symmetry, Equivalence and Integrable Classes of Abel’s Equations," Proceedings of the Institute of Mathematics of the NAS of Ukraine 50, Part : 1 (2004) 47 [arXiv nlin.SI/0404020 [27] M. J. Duff and J. T. Liu, "Anti-de Sitter black holes in gauged N = 8 supergravity," Nucl. Phys. B 554 (1999) 237 hep-th/9901149 [28] M. Cvetic et al., "Embedding AdS black holes in ten and eleven dimensions," Nucl. Phys. B 558 (1999) 96 hep-th/9903214 [29] J. de Boer, E. P. Verlinde and H. L. Verlinde, "On the holographic renormalization group," J. High Energy Phys. 0008 (2000) 003 hep-th/9912012 [30] M. Bianchi, D. Z. Freedman and K. Skenderis, "How to go with an RG flow," J. High Energy Phys. 0108 (2001) 041 hep-th/0105276 [31] I. Papadimitriou and K. Skenderis, "Correlation functions in holographic RG flows," J. High Energy Phys. 0410 (2004) 075 hep-th/0407071 [32] M. Henningson and K. Skenderis, "The holographic Weyl anomaly," J. High Energy Phys. 9807 (1998) 023 hep-th/9806087 [33] V. Balasubramanian and P. Kraus, "A stress tensor for anti-de Sitter gravity," Commun. Math. Phys. 208 (1999) 413 hep-th/9902121 [34] P. Kraus, F. Larsen and R. Siebelink, "The gravitational action in asymptotically AdS and flat spacetimes," Nucl. Phys. B 563 (1999) 259 hep-th/9906127 [35] S. de Haro, S. N. Solodukhin and K. Skenderis, "Holographic reconstruction of spacetime and renormalization in the AdS/CFT correspondence," Commun. Math. Phys. 217 (2001) 595 hep-th/0002230 [36] M. Bianchi, D. Z. Freedman and K. Skenderis, "Holographic renormalization," Nucl. Phys. B 631 (2002) 159 hep-th/0112119 [37] D. Martelli and W. Muck, "Holographic renormalization and Ward identities with the Hamilton-Jacobi method," Nucl. Phys. B 654 (2003) 248 hep-th/0205061 [38] K. Skenderis, "Lecture notes on holographic renormalization," Class. Quantum Gravity 19 (2002) 5849 hep-th/0209067 [39] D. Z. Freedman, S. D. Mathur, A. Matusis and L. Rastelli, "Correlation functions in the CFT(d)/AdS(d + 1) correspondence," Nucl. Phys. B 546 (1999) 96 hep-th/9804058 [40] O. DeWolfe and D. Z. Freedman, "Notes on fluctuations and correlation functions in holographic renormalization group flows," arXiv hep-th/0002226 [41] W. Muck, "Correlation functions in holographic renormalization group flows," Nucl. Phys. B 620 (2002) 477 hep-th/0105270 [42] M. Bianchi, M. Prisco and W. Muck, "New results on holographic three-point functions," J. High Energy Phys. 0311 (2003) 052 hep-th/0310129 [43] E. Witten, "Multi-trace operators, boundary conditions, and AdS/CFT correspondence," arXiv hep-th/0112258 [44] M. Berkooz, A. Sever and A. Shomer, "Double-trace deformations, boundary conditions and spacetime singularities," J. High Energy Phys. 0205 (2002) 034 hep-th/0112264 [45] W. Muck, "An improved correspondence formula for AdS/CFT with multi-trace operators," Phys. Lett. 
B 531 (2002) 301 hep-th/0201100 [46] P. Minces, "Multi-trace operators and the generalized AdS/CFT prescription," Phys. Rev. D 68 (2003) 024027 hep-th/0201172 [47] A. Sever and A. Shomer, "A note on multi-trace deformations and AdS/CFT," J. High Energy Phys. 0207 (2002) 027 hep-th/0203168 [48] S. S. Gubser and I. R. Klebanov, "A universal result on central charges in the presence of double-trace deformations," Nucl. Phys. B 656 (2003) 23 hep-th/0212138 [49] O. Aharony, M. Berkooz and B. Katz, "Non-local effects of multi-trace deformations in the AdS/CFT correspondence," J. High Energy Phys. 0510 (2005) 097 hep-th/0504177 [50] S. Elitzur, A. Giveon, M. Porrati and E. Rabinovici, "Multitrace deformations of vector and adjoint theories and their holographic duals," J. High Energy Phys. 0602 (2006) 006 hep-th/0511061 [51] R. Corrado, K. Pilch and N. P. Warner, "An N = 2 supersymmetric membrane flow," Nucl. Phys. B 629 (2002) 74 hep-th/0107220 [52] T. Hertog and K. Maeda, "Black holes with scalar hair and asymptotics in N = 8 supergravity," J. High Energy Phys. 0407 (2004) 051 hep-th/0404261 [53] T. Hertog and G. T. Horowitz, "Towards a big crunch dual," J. High Energy Phys. 0407 (2004) 073 hep-th/0406134 [54] T. Hertog and G. T. Horowitz, "Designer gravity and field theory effective potentials," Phys. Rev. Lett. 94 (2005) 221301 hep-th/0412169 [55] T. Hertog and G. T. Horowitz, "Holographic description of AdS cosmologies," J. High Energy Phys. 0504 (2005) 005 hep-th/0503071 [56] S. de Haro, I. Papadimitriou and A. C. Petkou, "Conformally coupled scalars, instantons and Vacuum instability in AdS(4)," [arXiv hep-th/0611315 CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181414470-0-53-49-0-2 SzGeCERN 20060616163757.0 hep-th/0606096 eng UTHET-2006-05-01 Koutsoumbas, G National Technical University of Athens Quasi-normal Modes of Electromagnetic Perturbations of Four-Dimensional Topological Black Holes with Scalar Hair 2006 10 Jun 2006 17 p We study the perturbative behaviour of topological black holes with scalar hair. We calculate both analytically and numerically the quasi-normal modes of the electromagnetic perturbations. In the case of small black holes we find clear evidence of a second-order phase transition of a topological black hole to a hairy configuration. We also find evidence of a second-order phase transition of the AdS vacuum solution to a topological black hole. LANL EDS SIS JHEP2007 SIS:200702 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE LANL EDS High Energy Physics - Theory Musiri, S Papantonopoulos, E Siopsis, G Koutsoumbas, George Musiri, Suphot Papantonopoulos, Eleftherios Siopsis, George 006 J. High Energy Phys. 10 2006 http://cdsware.cern.ch/download/invenio-demo-site-files/0606096.pdf n 200624 13 20070425 1021 CER01 20060613 PUBLIC 002628325CER ARTICLE [1] K. D. Kokkotas and B. G. Schmidt, Living Rev. Relativ. 2 (1999) 2 gr-qc/9909058 [2] H.-P. Nollert, Class. Quantum Gravity 16 (1999) R159 [3] J. S. F. Chan and R. B. Mann, Phys. Rev. D 55 (1997) 7546 gr-qc/9612026 [3] Phys. Rev. D 59 (1999) 064025 [4] G. T. Horowitz and V. E. Hubeny, Phys. Rev. D 62 (2000) 024027 hep-th/9909056 [5] V. Cardoso and J. P. S. Lemos, Phys. Rev. D 64 (2001) 084017 gr-qc/0105103 [6] B. Wang, C. Y. Lin and E. Abdalla, Phys. Lett. B 481 (2000) 79 hep-th/0003295 [7] E. Berti and K. D. Kokkotas, Phys. Rev. D 67 (2003) 064020 gr-qc/0301052 [8] F. Mellor and I. Moss, Phys. Rev. D 41 (1990) 403 [9] C. Martinez and J. Zanelli, Phys. Rev. D 54 (1996) 3830 gr-qc/9604021 [10] M. 
Henneaux, C. Martinez, R. Troncoso and J. Zanelli, Phys. Rev. D 65 (2002) 104007 hep-th/0201170 [11] C. Martinez, R. Troncoso and J. Zanelli, Phys. Rev. D 67 (2003) 024008 hep-th/0205319 [12] N. Bocharova, K. Bronnikov and V. Melnikov, Vestn. Mosk. Univ. Fizika Astronomy 6 (1970) 706 [12] J. D. Bekenstein, Ann. Phys. 82 (1974) 535 [12] Ann. Phys. 91 (1975) 75 [13] T. Torii, K. Maeda and M. Narita, Phys. Rev. D 64 (2001) 044007 [14] E. Winstanley, Found. Phys. 33 (2003) 111 gr-qc/0205092 [15] T. Hertog and K. Maeda, J. High Energy Phys. 0407 (2004) 051 hep-th/0404261 [16] J. P. S. Lemos, Phys. Lett. B 353 (1995) 46 gr-qc/9404041 [17] R. B. Mann, Class. Quantum Gravity 14 (1997) L109 gr-qc/9607071 [17] R. B. Mann, Nucl. Phys. B 516 (1998) 357 hep-th/9705223 [18] L. Vanzo, Phys. Rev. D 56 (1997) 6475 gr-qc/9705004 [19] D. R. Brill, J. Louko and P. Peldan, Phys. Rev. D 56 (1997) 3600 gr-qc/9705012 [20] D. Birmingham, Class. Quantum Gravity 16 (1999) 1197 hep-th/9808032 [21] R. G. Cai and K. S. Soh, Phys. Rev. D 59 (1999) 044013 gr-qc/9808067 [22] Phys.Rev. D65 (2002) 084006 B. Wang, E. Abdalla and R. B. Mann, [arXiv hep-th/0107243 [23] Phys.Rev. D65 (2002) 084006 R. B. Mann, [arXiv gr-qc/9709039 [24] J. Crisostomo, R. Troncoso and J. Zanelli, Phys. Rev. D 62 (2000) 084013 hep-th/0003271 [25] R. Aros, R. Troncoso and J. Zanelli, Phys. Rev. D 63 (2001) 084015 hep-th/0011097 [26] R. G. Cai, Y. S. Myung and Y. Z. Zhang, Phys. Rev. D 65 (2002) 084019 hep-th/0110234 [27] M. H. Dehghani, Phys. Rev. D 70 (2004) 064019 hep-th/0405206 [28] C. Martinez, R. Troncoso and J. Zanelli, Phys. Rev. D 70 (2004) 084035 hep-th/0406111 [29] Phys.Rev. D74 (2006) 044028 C. Martinez, J. P. Staforelli and R. Troncoso, [arXiv hep-th/0512022 [29] C. Martinez and R. Troncoso, [arXiv Phys.Rev. D74 (2006) 064007 hep-th/0606130 [30] E. Winstanley, Class. Quantum Gravity 22 (2005) 2233 gr-qc/0501096 [30] E. Radu and E. Winstanley, Phys. Rev. D 72 (2005) 024017 gr-qc/0503095 [30] A. M. Barlow, D. Doherty and E. Winstanley, Phys. Rev. D 72 (2005) 024008 gr-qc/0504087 [31] I. Papadimitriou, [arXiv JHEP 0702 (2007) 008 hep-th/0606038 [32] P. Breitenlohner and D. Z. Freedman, Phys. Lett. B 115 (1982) 197 [32] Ann. Phys. 144 (1982) 249 [33] L. Mezincescu and P. K. Townsend, Ann. Phys. 160 (1985) 406 [34] V. Cardoso, J. Natario and R. Schiappa, J. Math. Phys. 45 (2004) 4698 hep-th/0403132 [35] J. Natario and R. Schiappa, Adv. Theor. Math. Phys. 8 (2004) 1001 hep-th/0411267 [36] S. Musiri, S. Ness and G. Siopsis, Phys. Rev. D 73 (2006) 064001 hep-th/0511113 [37] L. Motl and A. Neitzke, Adv. Theor. Math. Phys. 7 (2003) 307 hep-th/0301173 [38] A. J. M. Medved, D. Martin and M. Visser, Class. Quantum Gravity 21 (2004) 2393 gr-qc/0310097 [39] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery in Numerical Recipes (Cambridge University Press, Cambridge, England, 1992). [40] G. Koutsoumbas, S. Musiri, E. Papantonopoulos and G. Siopsis, in preparation. CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181414732-0-36-41-0-2 SzGeCERN hep-th/0703265 eng IGPG-07-3-4 Alexander, S The Pennsylvania State University A new PPN parameter to test Chern-Simons gravity 2007 28 Mar 2007 4 p We study Chern-Simons (CS) gravity in the parameterized post-Newtonian (PPN) framework through weak-field solutions of the modified field equations for a perfect fluid source.
We discover that CS gravity possesses the same PPN parameters as general relativity, except for the inclusion of a new term, proportional both to the CS coupling parameter and the curl of the PPN vector potentials. This new term encodes the key physical effect of CS gravity in the weak-field limit, leading to a modification of frame dragging and, thus, the Lense-Thirring contribution to gyroscopic precession. We provide a physical interpretation for the new term, as well as an estimate of the size of this effect relative to the general relativistic Lense-Thirring prediction. This correction to frame dragging might be used in experiments, such as Gravity Probe B and lunar ranging, to place bounds on the CS coupling parameter, as well as other intrinsic parameters of string theory. LANL EDS SzGeCERN Particle Physics - Theory PREPRINT LANL EDS High Energy Physics - Theory Yunes, N Alexander, Stephon Yunes, Nicolas Phys. Rev. Lett. http://cdsware.cern.ch/download/invenio-demo-site-files/0703265.pdf yunes@gravity.psu.edu n 200713 11 20070417 2012 CER01 20070330 PUBLIC 002685163CER PREPRINT [1] J. Polchinski, String theory. Vol. 2: Superstring theory and beyond (Cambridge University Press, Cambridge, UK, 1998). [2] S. H. S. Alexander, M. E. Peskin, and M. M. Sheikh-Jabbari, Phys. Rev. Lett. 96 (2006) 081301 [3] A. Lue, L.-M. Wang, and M. Kamionkowski, Phys. Rev. Lett. 83 (1999) 1506 astro-ph/9812088 [4] C. M. Will, Theory and Experiment in Gravitational Physics (Cambridge Univ. Press, Cambridge, UK, 1993). [5] C. M. Will, Phys. Rev. D 57 (1998) 2061 gr-qc/9709011 [6] C. M. Will and N. Yunes, Class. Quantum Gravity 21 (2004) 4367 [7] E. Berti, A. Buonanno, and C. M. Will, Phys. Rev. D 71 (2005) 084025 [8] A discussion of the history, technology and physics of Gravity Probe B can be found at http://einstein.stanford.edu [9] T. W. Murphy, Jr., K. Nordtvedt, and S. G. Turyshev, Phys. Rev. Lett. 98 (2007) 071102 gr-qc/0702028 [10] R. Jackiw and S. Y. Pi, Phys. Rev. D 68 (2003) 104012 [11] D. Guarrera and A. J. Hariton (2007), Phys. Rev. D 76 (2007) 044011 gr-qc/0702029 [12] S. Alexander and J. Martin, Phys. Rev. D 71 (2005) 063526 hep-th/0410230 [13] R. J. Gleiser and C. N. Kozameh, Phys. Rev. D 64 (2001) 083007 gr-qc/0102093 [14] R. H. Brandenberger and C. Vafa, Nucl. Phys. B 316 (1989) 391 [15] L. Randall and R. Sundrum, Phys. Rev. Lett. 83 (1999) 4690 hep-th/9906064 [16] S. Alexander and N. Yunes (2007), in progress. [17] L. Blanchet, Living Rev. Relativ. 9 (2006) 4 [17] and references therein, gr-qc/0202016 [18] S. Alexander, L. S. Finn, and N. Yunes, in progress (2007). CDS Invenio/0.92.0.20070116 refextract/0.92.0.20070116-1181427282-0-8-12-1-4 SzGeCERN 0237765CERCER SLAC 3455840 hep-th/9611103 eng PUPT-1665 Periwal, V Princeton University Matrices on a point as the theory of everything 1997 Princeton, NJ Princeton Univ. Joseph-Henry Lab. Phys. 14 Nov 1996 5 p It is shown that the world-line can be eliminated in the matrix quantum mechanics conjectured by Banks, Fischler, Shenker and Susskind to describe the light-cone physics of M theory. The resulting matrix model has a form that suggests origins in the reduction to a point of a Yang-Mills theory. The reduction of the Nishino-Sezgin $10+2$ dimensional supersymmetric Yang-Mills theory to a point gives a matrix model with the appropriate features: Lorentz invariance in $9+1$ dimensions, supersymmetry, and the correct number of physical degrees of freedom.
SIS UNC98 LANL EDS SzGeCERN Particle Physics - Theory ARTICLE Periwal, Vipul 1711 4 Phys. Rev. D 55 1997 http://cdsware.cern.ch/download/invenio-demo-site-files/9611103.pdf vipul@viper.princeton.edu n 199648 13 20070310 0012 CER01 19961115 PUBLIC 000237765CER ARTICLE 0289446CERCER SLAC 3838510 hep-th/9809057 eng Polyakov, A M Princeton University The wall of the cave 1999 In this article old and new relations between gauge fields and strings are discussed. We add new arguments that the Yang Mills theories must be described by the non-critical strings in the five dimensional curved space. The physical meaning of the fifth dimension is that of the renormalization scale represented by the Liouville field. We analyze the meaning of the zigzag symmetry and show that it is likely to be present if there is a minimal supersymmetry on the world sheet. We also present the new string backgrounds which may be relevant for the description of the ordinary bosonic Yang-Mills theories. The article is written on the occasion of the 40th anniversary of the IHES. SIS LANLPUBL2001 LANL EDS SIS:2001 PR/LKR added SzGeCERN Particle Physics - Theory ARTICLE 645-658 Int. J. Mod. Phys. A 14 1999 polyakov@puhep1.princeton.edu n 199837 13 20060916 0007 CER01 19980910 PUBLIC 000289446CER ARTICLE http://cdsware.cern.ch/download/invenio-demo-site-files/9809057.pdf SzGeCERN 2174811CERCER SLAC 4308492 hep-ph/0002060 eng ACT-2000-1 CTP-TAMU-2000-2 OUTP-2000-03-P TPI-MINN-2000-6 Cleaver, G B Non-Abelian Flat Directions in a Minimal Superstring Standard Model 2000 Houston, TX Houston Univ. Adv. Res. Cent. The Woodlands 4 Feb 2000 14 p Recently, by studying exact flat directions of non-Abelian singlet fields, we demonstrated the existence of free fermionic heterotic-string models in which the SU(3)_C x SU(2)_L x U(1)_Y-charged matter spectrum, just below the string scale, consists solely of the MSSM spectrum. In this paper we generalize the analysis to include VEVs of non-Abelian fields. We find several, MSSM-producing, exact non-Abelian flat directions, which are the first such examples in the literature. We examine the possibility that hidden sector condensates lift the flat directions. LANL EDS SIS LANLPUBL2001 SIS:2001 PR/LKR added SzGeCERN Particle Physics - Phenomenology ARTICLE Faraggi, A E Nanopoulos, Dimitri V Walker, J W Walker, Joel W. 1191-1202 Mod. Phys. Lett. A 15 2000 http://documents.cern.ch/cgi-bin/setlink?base=preprint&categ=hep-ph&id=0002060 Access to fulltext document gcleaver@rainbow.physics.tamu.edu n 200006 13 20070425 1017 CER01 20000207 PUBLIC 002174811CER ARTICLE [1] A.E. Faraggi and D.V. Nanopoulos and L. Yuan Nucl. Phys. B 335 (1990) 347 [2] I. Antoniadis and J. Ellis and J. Hagelin and D.V. Nanopoulos Phys. Lett. B 213 (1989) 65 [3] I. Antoniadis and C. Bachas and C. Kounnas Nucl. Phys. B 289 (1987) 87 [4] A.E. Faraggi and D.V. Nanopoulos Phys. Rev. D 48 (1993) 3288 [5] G.B. Cleaver and A.E. Faraggi and D.V. Nanopoulos and L. Yuan Phys. Lett. B 455 (1999) 135 [6] hep-ph/9904301 [7] hep-ph/9910230 [8] Phys. Lett. B 256 (1991) 150 [10] hep-ph/9511426 [12] J. Ellis, K. Enqvist, D.V. Nanopoulos Phys. Lett. B 151 (1985) 357 [13] P. Horava Phys. Rev.
D 54 (1996) 7561 http://cdsware.cern.ch/download/invenio-demo-site-files/0002060.pdf http://cdsware.cern.ch/download/invenio-demo-site-files/0002060.ps.gz SzGeCERN 20060914104330.0 INIS 34038281 UNCOVER 251,129,189,013 eng SCAN-0005061 TESLA-FEL-99-07 Treusch, R Development of photon beam diagnostics for VUV radiation from a SASE FEL 2000 Hamburg DESY Dec 1999 For the proof-of-principle experiment of self-amplified spontaneous emission (SASE) at short wavelengths on the VUV FEL at DESY a multi-facetted photon beam diagnostics experiment has been developed employing new detection concepts to measure all SASE specific properties on a single pulse basis. The present setup includes instrumentation for the measurement of the energy and the angular and spectral distribution of individual photon pulses. Different types of photon detectors such as PtSi-photodiodes and fast thermoelectric detectors based on YBaCuO-films are used to cover some five orders of magnitude of intensity from the level of spontaneous emission to FEL radiation at saturation. A 1 m normal incidence monochromator in combination with a fast intensified CCD camera allows to select single photon pulses and to record the full spectrum at high resolution to resolve the fine structure due to the start-up from noise. SIS INIS2004 SIS UNC2002 Development of photon beam diagnostics for VUV radiation from a SASE FEL SzGeCERN Accelerators and Storage Rings ARTICLE INIS Particle accelerators INIS ceramics- INIS desy- INIS far-ultraviolet-radiation INIS free-electron-lasers INIS photodiodes- INIS photon-beams INIS superradiance- INIS thin-films INIS x-ray-detection INIS x-ray-sources INIS accelerators- INIS beams- INIS cyclic-accelerators INIS detection- INIS electromagnetic-radiation INIS emission- INIS energy-level-transitions INIS films- INIS lasers- INIS photon-emission INIS radiation-detection INIS radiation-sources INIS radiations- INIS semiconductor-devices INIS semiconductor-diodes INIS stimulated-emission INIS synchrotrons- INIS ultraviolet-radiation Lokajczyk, T Xu, W Jastrow, U Hahn, U Bittner, L Feldhaus, J 456-462 1-3 Nucl. Instrum. Methods Phys. Res., A 445 2000 n 200430 13 20061230 0016 CER01 20040727 000289917 456-462 hamburg990823 PUBLIC 002471378CER ARTICLE http://cdsware.cern.ch/download/invenio-demo-site-files/convert_SCAN-0005061.pdf SzGeCERN 20070110102840.0 oai:cds.cern.ch:SCAN-9709037 cerncds:SCAN cerncds:FULLTEXT 0008580CERCER eng SCAN-9709037 UCRL-8417 Orear, J Notes on statistics for physicists Statistics for physicists 1958 Berkeley, CA Lawrence Berkeley Nat. Lab. 13 Aug 1958 34 p SzGeCERN Mathematical Physics and Mathematics PREPRINT h 199700 11 20070110 1028 CER01 19900127 PUBLIC PREPRINT 0001 000008580CER http://cdsware.cern.ch/download/invenio-demo-site-files/9709037.pdf BUL-NEWS-2009-001 eng Charles Darwin A naturalist's voyage around the world <!--HTML--><p class="articleHeader">After having been twice driven back by heavy south-western gales, Her Majesty's ship Beagle, a ten-gun brig, under the command of Captain Fitz Roy, R.N., sailed from Devonport on the 27th of December, 1831. The object of the expedition was to complete the survey of Patagonia and Tierra del Fuego, commenced under Captain King in 1826 to 1830--to survey the shores of Chile, Peru, and of some islands in the Pacific--and to carry a chain of chronometrical measurements round the World.</p> <div class="phwithcaption"> <div class="imageScale"><img alt="" src="http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_hms_beagle_image.gif" /></div> <p>H.M.S. Beagle</p> </div> <p>On the 6th of January we reached Teneriffe, but were prevented landing, by fears of our bringing the cholera: the next morning we saw the sun rise behind the rugged outline of the Grand Canary Island, and suddenly illumine the Peak of Teneriffe, whilst the lower parts were veiled in fleecy clouds. This was the first of many delightful days never to be forgotten. On the 16th of January 1832 we anchored at Porto Praya, in St. Jago, the chief island of the Cape de Verd archipelago.</p> <p>The neighbourhood of Porto Praya, viewed from the sea, wears a desolate aspect. The volcanic fires of a past age, and the scorching heat of a tropical sun, have in most places rendered the soil unfit for vegetation. The country rises in successive steps of table-land, interspersed with some truncate conical hills, and the horizon is bounded by an irregular chain of more lofty mountains. The scene, as beheld through the hazy atmosphere of this climate, is one of great interest; if, indeed, a person, fresh from sea, and who has just walked, for the first time, in a grove of cocoa-nut trees, can be a judge of anything but his own happiness. The island would generally be considered as very uninteresting, but to any one accustomed only to an English landscape, the novel aspect of an utterly sterile land possesses a grandeur which more vegetation might spoil. A single green leaf can scarcely be discovered over wide tracts of the lava plains; yet flocks of goats, together with a few cows, contrive to exist. It rains very seldom, but during a short portion of the year heavy torrents fall, and immediately afterwards a light vegetation springs out of every crevice. This soon withers; and upon such naturally formed hay the animals live. It had not now rained for an entire year. When the island was discovered, the immediate neighbourhood of Porto Praya was clothed with trees, the reckless destruction of which has caused here, as at St. Helena, and at some of the Canary islands, almost entire sterility. The broad, flat-bottomed valleys, many of which serve during a few days only in the season as watercourses, are clothed with thickets of leafless bushes. Few living creatures inhabit these valleys. The commonest bird is a kingfisher (Dacelo Iagoensis), which tamely sits on the branches of the castor-oil plant, and thence darts on grasshoppers and lizards.
It is brightly coloured, but not so beautiful as the European species: in its flight, manners, and place of habitation, which is generally in the driest valley, there is also a wide difference. One day, two of the officers and myself rode to Ribeira Grande, a village a few miles eastward of Porto Praya. Until we reached the valley of St. Martin, the country presented its usual dull brown appearance; but here, a very small rill of water produces a most refreshing margin of luxuriant vegetation. In the course of an hour we arrived at Ribeira Grande, and were surprised at the sight of a large ruined fort and cathedral. This little town, before its harbour was filled up, was the principal place in the island: it now presents a melancholy, but very picturesque appearance. Having procured a black Padre for a guide, and a Spaniard who had served in the Peninsular war as an interpreter, we visited a collection of buildings, of which an ancient church formed the principal part. It is here the governors and captain-generals of the islands have been buried. Some of the tombstones recorded dates of the sixteenth century. The heraldic ornaments were the only things in this retired place that reminded us of Europe. The church or chapel formed one side of a quadrangle, in the middle of which a large clump of bananas were growing. On another side was a hospital, containing about a dozen miserable-looking inmates.</p> <p>We returned to the Vênda to eat our dinners. A considerable number of men, women, and children, all as black as jet, collected to watch us. Our companions were extremely merry; and everything we said or did was followed by their hearty laughter. Before leaving the town we visited the cathedral. It does not appear so rich as the smaller church, but boasts of a little organ, which sent forth singularly inharmonious cries. We presented the black priest with a few shillings, and the Spaniard, patting him on the head, said, with much candour, he thought his colour made no great difference. We then returned, as fast as the ponies would go, to Porto Praya.</p> (Excerpt from A NATURALIST'S VOYAGE ROUND THE WORLD Chapter 1, By Charles Darwin) <!--HTML--><br /> 3 02/2009 Atlantis Times 3 03/2009 Atlantis Times ATLANTISTIMESNEWS http://cdsware.cern.ch/download/invenio-demo-site-files/journal_hms_beagle_image.gif http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_hms_beagle_image.gif BUL-NEWS-2009-002 eng Plato Atlantis (Critias) <!--HTML--><p class="articleHeader">I have before remarked in speaking of the allotments of the gods, that they distributed the whole earth into portions differing in extent, and made for themselves temples and instituted sacrifices. And Poseidon, receiving for his lot the island of Atlantis, begat children by a mortal woman, and settled them in a part of the island, which I will describe.</p> <p>Looking towards the sea, but in the centre of the whole island, there was a plain which is said to have been the fairest of all plains and very fertile. Near the plain again, and also in the centre of the island at a distance of about fifty stadia, there was a mountain not very high on any side. In this mountain there dwelt one of the earth-born primeval men of that country, whose name was Evenor, and he had a wife named Leucippe, and they had an only daughter who was called Cleito.
The maiden had already reached womanhood, when her father and mother died; Poseidon fell in love with her and had intercourse with her, and breaking the ground, inclosed the hill in which she dwelt all round, making alternate zones of sea and land larger and smaller, encircling one another; there were two of land and three of water, which he turned as with a lathe, each having its circumference equidistant every way from the centre, so that no man could get to the island, for ships and voyages were not as yet. He himself, being a god, found no difficulty in making special arrangements for the centre island, bringing up two springs of water from beneath the earth, one of warm water and the other of cold, and making every variety of food to spring up abundantly from the soil. He also begat and brought up five pairs of twin male children; and dividing the island of Atlantis into ten portions, he gave to the first-born of the eldest pair his mother's dwelling and the surrounding allotment, which was the largest and best, and made him king over the rest; the others he made princes, and gave them rule over many men, and a large territory. And he named them all; the eldest, who was the first king, he named Atlas, and after him the whole island and the ocean were called Atlantic. To his twin brother, who was born after him, and obtained as his lot the extremity of the island towards the pillars of Heracles, facing the country which is now called the region of Gades in that part of the world, he gave the name which in the Hellenic language is Eumelus, in the language of the country which is named after him, Gadeirus. Of the second pair of twins he called one Ampheres, and the other Evaemon. To the elder of the third pair of twins he gave the name Mneseus, and Autochthon to the one who followed him. Of the fourth pair of twins he called the elder Elasippus, and the younger Mestor. And of the fifth pair he gave to the elder the name of Azaes, and to the younger that of Diaprepes. All these and their descendants for many generations were the inhabitants and rulers of divers islands in the open sea; and also, as has been already said, they held sway in our direction over the country within the pillars as far as Egypt and Tyrrhenia. Now Atlas had a numerous and honourable family, and they retained the kingdom, the eldest son handing it on to his eldest for many generations; and they had such an amount of wealth as was never before possessed by kings and potentates, and is not likely ever to be again, and they were furnished with everything which they needed, both in the city and country. For because of the greatness of their empire many things were brought to them from foreign countries, and the island itself provided most of what was required by them for the uses of life. In the first place, they dug out of the earth whatever was to be found there, solid as well as fusile, and that which is now only a name and was then something more than a name, orichalcum, was dug out of the earth in many parts of the island, being more precious in those days than anything except gold. There was an abundance of wood for carpenter's work, and sufficient maintenance for tame and wild animals. Moreover, there were a great number of elephants in the island; for as there was provision for all other sorts of animals, both for those which live in lakes and marshes and rivers, and also for those which live in mountains and on plains, so there was for the animal which is the largest and most voracious of all. 
Also whatever fragrant things there now are in the earth, whether roots, or herbage, or woods, or essences which distil from fruit and flower, grew and thrived in that land; also the fruit which admits of cultivation, both the dry sort, which is given us for nourishment and any other which we use for food&mdash;we call them all by the common name of pulse, and the fruits having a hard rind, affording drinks and meats and ointments, and good store of chestnuts and the like, which furnish pleasure and amusement, and are fruits which spoil with keeping, and the pleasant kinds of dessert, with which we console ourselves after dinner, when we are tired of eating&mdash;all these that sacred island which then beheld the light of the sun, brought forth fair and wondrous and in infinite abundance. With such blessings the earth freely furnished them; meanwhile they went on constructing their temples and palaces and harbours and docks.</p> (Excerpt from CRITIAS, By Plato, translated By Jowett, Benjamin) <!--HTML--><br /> 2 02/2009 Atlantis Times 2 03/2009 Atlantis Times ATLANTISTIMESNEWS BUL-NEWS-2009-003 eng Plato Atlantis (Timaeus) <!--HTML--><p class="articleHeader">This great island lay over against the Pillars of Heracles, in extent greater than Libya and Asia put together, and was the passage to other islands and to a great ocean of which the Mediterranean sea was only the harbour; and within the Pillars the empire of Atlantis reached in Europe to Tyrrhenia and in Libya to Egypt.</p> <p>This mighty power was arrayed against Egypt and Hellas and all the countries</p> <div class="phrwithcaption"> <div class="imageScale"><img src="http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_Athanasius_Kircher_Atlantis_image.gif" alt="" /></div> <p>Representation of Atlantis by Athanasius Kircher (1669)</p> </div> bordering on the Mediterranean. Then your city did bravely, and won renown over the whole earth. For at the peril of her own existence, and when the other Hellenes had deserted her, she repelled the invader, and of her own accord gave liberty to all the nations within the Pillars. A little while afterwards there were great earthquakes and floods, and your warrior race all sank into the earth; and the great island of Atlantis also disappeared in the sea. This is the explanation of the shallows which are found in that part of the Atlantic ocean. <p> </p> (Excerpt from TIMAEUS, By Plato, translated By Jowett, Benjamin)<br /> <!--HTML--><br /> 1 02/2009 Atlantis Times 1 03/2009 Atlantis Times 1 04/2009 Atlantis Times ATLANTISTIMESNEWS http://cdsware.cern.ch/download/invenio-demo-site-files/journal_Athanasius_Kircher_Atlantis_image.gif http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_Athanasius_Kircher_Atlantis_image.gif BUL-SCIENCE-2009-001 eng Charles Darwin The order Rodentia in South America <!--HTML--><p>The order Rodentia is here very numerous in species: of mice alone I obtained no less than eight kinds. <sup><a name="note1" href="#footnote1">1</a></sup>The largest gnawing animal in the world, the Hydrochærus capybara (the water-hog), is here also common. One which I shot at Monte Video weighed ninety-eight pounds: its length, from the end of the snout to the stump-like tail, was three feet two inches; and its girth three feet eight. These great Rodents occasionally frequent the islands in the mouth of the Plata, where the water is quite salt, but are far more abundant on the borders of fresh-water lakes and rivers. 
Near Maldonado three or four generally live together. In the daytime they either lie among the aquatic plants, or openly feed on the turf plain.<sup><a name="note2" href="#footnote2">2</a></sup></p> <p> <div class="phlwithcaption"> <div class="imageScale"><img src="http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_water_dog_image.gif" alt="" /></div> <p>Hydrochærus capybara or Water-hog</p> </div> When viewed at a distance, from their manner of walking and colour they resemble pigs: but when seated on their haunches, and attentively watching any object with one eye, they reassume the appearance of their congeners, cavies and rabbits. Both the front and side view of their head has quite a ludicrous aspect, from the great depth of their jaw. These animals, at Maldonado, were very tame; by cautiously walking, I approached within three yards of four old ones. This tameness may probably be accounted for, by the Jaguar having been banished for some years, and by the Gaucho not thinking it worth his while to hunt them. As I approached nearer and nearer they frequently made their peculiar noise, which is a low abrupt grunt, not having much actual sound, but rather arising from the sudden expulsion of air: the only noise I know at all like it, is the first hoarse bark of a large dog. Having watched the four from almost within arm's length (and they me) for several minutes, they rushed into the water at full gallop with the greatest impetuosity, and emitted at the same time their bark. After diving a short distance they came again to the surface, but only just showed the upper part of their heads. When the female is swimming in the water, and has young ones, they are said to sit on her back. These animals are easily killed in numbers; but their skins are of trifling value, and the meat is very indifferent. On the islands in the Rio Parana they are exceedingly abundant, and afford the ordinary prey to the Jaguar.</p> <p><small><sup><a name="footnote1" href="#note1">1</a></sup>. In South America I collected altogether twenty-seven species of mice, and thirteen more are known from the works of Azara and other authors. Those collected by myself have been named and described by Mr. Waterhouse at the meetings of the Zoological Society. I must be allowed to take this opportunity of returning my cordial thanks to Mr. Waterhouse, and to the other gentlemen attached to that Society, for their kind and most liberal assistance on all occasions.</small></p> <p><small><sup><a name="footnote2" href="#note2">2</a></sup>. In the stomach and duodenum of a capybara which I opened, I found a very large quantity of a thin yellowish fluid, in which scarcely a fibre could be distinguished. Mr. Owen informs me that a part of the oesophagus is so constructed that nothing much larger than a crowquill can be passed down.
Certainly the broad teeth and strong jaws of this animal are well fitted to grind into pulp the aquatic plants on which it feeds.</small></p> (Excerpt from A NATURALIST'S VOYAGE ROUND THE WORLD Chapter 3, By Charles Darwin) <!--HTML--><br />test fr 1 02/2009 Atlantis Times 1 03/2009 Atlantis Times ATLANTISTIMESSCIENCE http://cdsware.cern.ch/download/invenio-demo-site-files/journal_water_dog_image.gif http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_water_dog_image.gif BUL-NEWS-2009-004 eng Charles Darwin Rio Macâe <!--HTML--><p class="articleHeader">April 14th, 1832.—Leaving Socêgo, we rode to another estate on the Rio Macâe, which was the last patch of cultivated ground in that direction. The estate was two and a half miles long, and the owner had forgotten how many broad.</p> <p> <div class="phlwithcaption"> <div class="imageScale"><img src="http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_virgin_forest_image.gif" alt="" /></div> <p>Virgin Forest</p> </div> Only a very small piece had been cleared, yet almost every acre was capable of yielding all the various rich productions of a tropical land. Considering the enormous area of Brazil, the proportion of cultivated ground can scarcely be considered as anything compared to that which is left in the state of nature: at some future age, how vast a population it will support! During the second day's journey we found the road so shut up that it was necessary that a man should go ahead with a sword to cut away the creepers. The forest abounded with beautiful objects; among which the tree ferns, though not large, were, from their bright green foliage, and the elegant curvature of their fronds, most worthy of admiration. In the evening it rained very heavily, and although the thermometer stood at 65°, I felt very cold. As soon as the rain ceased, it was curious to observe the extraordinary evaporation which commenced over the whole extent of the forest. At the height of a hundred feet the hills were buried in a dense white vapour, which rose like columns of smoke from the most thickly-wooded parts, and especially from the valleys. I observed this phenomenon on several occasions: I suppose it is owing to the large surface of foliage, previously heated by the sun's rays.</p> <p>While staying at this estate, I was very nearly being an eye-witness to one of those atrocious acts which can only take place in a slave country. Owing to a quarrel and a lawsuit, the owner was on the point of taking all the women and children from the male slaves, and selling them separately at the public auction at Rio. Interest, and not any feeling of compassion, prevented this act. Indeed, I do not believe the inhumanity of separating thirty families, who had lived together for many years, even occurred to the owner. Yet I will pledge myself, that in humanity and good feeling he was superior to the common run of men. It may be said there exists no limit to the blindness of interest and selfish habit. I may mention one very trifling anecdote, which at the time struck me more forcibly than any story of cruelty. I was crossing a ferry with a negro who was uncommonly stupid. In endeavouring to make him understand, I talked loud, and made signs, in doing which I passed my hand near his face. He, I suppose, thought I was in a passion, and was going to strike him; for instantly, with a frightened look and half-shut eyes, he dropped his hands. 
I shall never forget my feelings of surprise, disgust, and shame, at seeing a great powerful man afraid even to ward off a blow, directed, as he thought, at his face. This man had been trained to a degradation lower than the slavery of the most helpless animal.</p> (Excerpt from A NATURALIST'S VOYAGE ROUND THE WORLD Chapter 2, By Charles Darwin) 1 03/2009 Atlantis Times ATLANTISTIMESNEWS Atlantis Times http://cdsware.cern.ch/download/invenio-demo-site-files/journal_virgin_forest_image.gif http://cdsware.cern.ch/download/invenio-demo-site-files/icon-journal_virgin_forest_image.gif zho 李白 Li Bai Alone Looking at the Mountain eng 敬亭獨坐 <!--HTML-->眾鳥高飛盡<br /> 孤雲去獨閒<br /> 相看兩不厭<br /> 唯有敬亭山 <!--HTML-->All the birds have flown up and gone;<br /> A lonely cloud floats leisurely by.<br /> We never tire of looking at each other -<br /> Only the mountain and I. 701-762 2009-09-16 00 2009-09-16 BATCH POETRY diff --git a/modules/miscutil/lib/Makefile.am b/modules/miscutil/lib/Makefile.am index 192ab714e..0ddfde7a9 100644 --- a/modules/miscutil/lib/Makefile.am +++ b/modules/miscutil/lib/Makefile.am @@ -1,76 +1,77 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. pylibdir = $(libdir)/python/invenio pylib_DATA = __init__.py \ errorlib.py \ errorlib_tests.py \ errorlib_webinterface.py \ errorlib_regression_tests.py \ data_cacher.py \ dbdump.py \ dbquery.py \ dbquery_tests.py \ mailutils.py \ miscutil_config.py \ messages.py \ messages_tests.py \ textutils.py \ textutils_tests.py \ dateutils.py \ dateutils_tests.py \ htmlutils.py \ htmlutils_tests.py \ testutils.py \ testutils_regression_tests.py \ urlutils.py \ urlutils_tests.py \ w3c_validator.py \ intbitset_tests.py \ inveniocfg.py \ shellutils.py \ shellutils_tests.py \ pluginutils.py \ - pluginutils_tests.py + pluginutils_tests.py \ + asyncproc.py noinst_DATA = testimport.py \ kwalitee.py \ pep8.py EXTRA_DIST = $(pylib_DATA) \ testimport.py \ kwalitee.py \ pep8.py \ intbitset.pyx \ intbitset.c \ intbitset.h \ intbitset_impl.c \ intbitset_setup.py \ intbitset.pyx all: $(PYTHON) $(srcdir)/intbitset_setup.py build_ext install-data-hook: $(PYTHON) $(srcdir)/testimport.py ${prefix} @find ${srcdir} -name intbitset.so -exec cp {} ${pylibdir} \; CLEANFILES = *~ *.tmp *.pyc clean-local: rm -rf build diff --git a/modules/miscutil/lib/asyncproc.py b/modules/miscutil/lib/asyncproc.py new file mode 100644 index 000000000..785fb45a6 --- /dev/null +++ b/modules/miscutil/lib/asyncproc.py @@ -0,0 +1,419 @@ +#! /usr/bin/env python + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# The text of the license conditions can be read at +# +# or at . + + +__rcsId__ = """$Id: asyncproc.py,v 1.9 2007/08/06 18:29:24 bellman Exp $""" +__author__ = "Thomas Bellman " +__url__ = "http://www.lysator.liu.se/~bellman/download/" +__licence__ = "GNU General Public License version 3 or later" + + +import os +import time +import errno +import signal +import threading +import subprocess + + +__all__ = [ 'Process', 'with_timeout', 'Timeout' ] + + +class Timeout(Exception): + """Exception raised by with_timeout() when the operation takes too long. + """ + pass + + +def with_timeout(timeout, func, *args, **kwargs): + """Call a function, allowing it only to take a certain amount of time. + Parameters: + - timeout The time, in seconds, the function is allowed to spend. + This must be an integer, due to limitations in the + SIGALRM handling. + - func The function to call. + - *args Non-keyword arguments to pass to func. + - **kwargs Keyword arguments to pass to func. + + Upon successful completion, with_timeout() returns the return value + from func. If a timeout occurs, the Timeout exception will be raised. + + If an alarm is pending when with_timeout() is called, with_timeout() + tries to restore that alarm as well as possible, and call the SIGALRM + signal handler if it would have expired during the execution of func. + This may cause that signal handler to be executed later than it + otherwise would. In particular, calling with_timeout() from within a + with_timeout() call with a shorter timeout, won't interrupt the inner + call. I.e., + with_timeout(5, with_timeout, 60, time.sleep, 120) + won't interrupt the time.sleep() call until after 60 seconds. + """ + + class SigAlarm(Exception): + """Internal exception used only within with_timeout(). + """ + pass + + def alarm_handler(signum, frame): + raise SigAlarm() + + oldalarm = signal.alarm(0) + oldhandler = signal.signal(signal.SIGALRM, alarm_handler) + try: + try: + t0 = time.time() + signal.alarm(timeout) + retval = func(*args, **kwargs) + except SigAlarm: + raise Timeout("Function call took too long", func, timeout) + finally: + signal.alarm(0) + signal.signal(signal.SIGALRM, oldhandler) + if oldalarm != 0: + t1 = time.time() + remaining = oldalarm - int(t1 - t0 + 0.5) + if remaining <= 0: + # The old alarm has expired. + os.kill(os.getpid(), signal.SIGALRM) + else: + signal.alarm(remaining) + + return retval + + + +class Process(object): + """Manager for an asynchronous process. + The process will be run in the background, and its standard output + and standard error will be collected asynchronously. + + Since the collection of output happens asynchronously (handled by + threads), the process won't block even if it outputs large amounts + of data and you do not call Process.read*(). + + Similarly, it is possible to send data to the standard input of the + process using the write() method, and the caller of write() won't + block even if the process does not drain its input. + + On the other hand, this can consume large amounts of memory, + potentially even exhausting all memory available. + + Parameters are identical to subprocess.Popen(), except that stdin, + stdout and stderr default to subprocess.PIPE instead of to None.
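+
+    An illustrative session (editor's sketch, not part of the original
+    module documentation):
+
+        proc = Process(["cat"])   # stdin/stdout/stderr piped by default
+        proc.write("hello\n")     # queued asynchronously, never blocks
+        proc.closeinput()         # send EOF so 'cat' can terminate
+        proc.wait()               # reap the exit status
+        data = proc.read()        # "hello\n", gathered by the reader thread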
+ Note that if you set stdout or stderr to anything but PIPE, the + Process object won't collect that output, and the read*() methods + will always return empty strings. Also, setting stdin to something + other than PIPE will make the write() method raise an exception. + """ + + def __init__(self, *params, **kwparams): + if len(params) <= 3: + kwparams.setdefault('stdin', subprocess.PIPE) + if len(params) <= 4: + kwparams.setdefault('stdout', subprocess.PIPE) + if len(params) <= 5: + kwparams.setdefault('stderr', subprocess.PIPE) + self.__pending_input = [] + self.__collected_outdata = [] + self.__collected_errdata = [] + self.__exitstatus = None + self.__lock = threading.Lock() + self.__inputsem = threading.Semaphore(0) + # Flag telling feeder threads to quit + self.__quit = False + + self.__process = subprocess.Popen(*params, **kwparams) + + if self.__process.stdin: + self.__stdin_thread = threading.Thread( + name="stdin-thread", + target=self.__feeder, args=(self.__pending_input, + self.__process.stdin)) + self.__stdin_thread.setDaemon(True) + self.__stdin_thread.start() + if self.__process.stdout: + self.__stdout_thread = threading.Thread( + name="stdout-thread", + target=self.__reader, args=(self.__collected_outdata, + self.__process.stdout)) + self.__stdout_thread.setDaemon(True) + self.__stdout_thread.start() + if self.__process.stderr: + self.__stderr_thread = threading.Thread( + name="stderr-thread", + target=self.__reader, args=(self.__collected_errdata, + self.__process.stderr)) + self.__stderr_thread.setDaemon(True) + self.__stderr_thread.start() + + def __del__(self, __killer=os.kill, __sigkill=signal.SIGKILL): + if self.__exitstatus is None: + __killer(self.pid(), __sigkill) + + def pid(self): + """Return the process id of the process. + Note that if the process has died (and successfully been waited + for), that process id may have been re-used by the operating + system. + """ + return self.__process.pid + + def kill(self, signal): + """Send a signal to the process. + Raises OSError, with errno set to ECHILD, if the process is no + longer running. + """ + if self.__exitstatus is not None: + # Throwing ECHILD is perhaps not the most kosher thing to do... + # ESRCH might be considered more proper. + raise OSError(errno.ECHILD, os.strerror(errno.ECHILD)) + os.kill(self.pid(), signal) + + def wait(self, flags=0): + """Return the process' termination status. + + If bitmask parameter 'flags' contains os.WNOHANG, wait() will + return None if the process hasn't terminated. Otherwise it + will wait until the process dies. + + It is permitted to call wait() several times, even after it + has succeeded; the Process instance will remember the exit + status from the first successful call, and return that on + subsequent calls. + """ + if self.__exitstatus is not None: + return self.__exitstatus + pid,exitstatus = os.waitpid(self.pid(), flags) + if pid == 0: + return None + if os.WIFEXITED(exitstatus) or os.WIFSIGNALED(exitstatus): + self.__exitstatus = exitstatus + # If the process has stopped, we have to make sure to stop + # our threads. The reader threads will stop automatically + # (assuming the process hasn't forked), but the feeder thread + # must be signalled to stop. + if self.__process.stdin: + self.closeinput() + # We must wait for the reader threads to finish, so that we + # can guarantee that all the output from the subprocess is + # available to the .read*() methods. 
+ # And by the way, it is the responsibility of the reader threads + # to close the pipes from the subprocess, not ours. + if self.__process.stdout: + self.__stdout_thread.join() + if self.__process.stderr: + self.__stderr_thread.join() + return exitstatus + + def terminate(self, graceperiod=1): + """Terminate the process, with escalating force as needed. + First try gently, but increase the force if it doesn't respond + to persuasion. The levels tried are, in order: + - close the standard input of the process, so it gets an EOF. + - send SIGTERM to the process. + - send SIGKILL to the process. + terminate() waits up to GRACEPERIOD seconds (default 1) before + escalating the level of force. As there are three levels, a total + of (3-1)*GRACEPERIOD is allowed before the process is killed with + SIGKILL. + GRACEPERIOD must be an integer, and must be at least 1. + If the process was started with stdin not set to PIPE, the + first level (closing stdin) is skipped. + """ + if self.__process.stdin: + # This is rather meaningless when stdin != PIPE. + self.closeinput() + try: + return with_timeout(graceperiod, self.wait) + except Timeout: + pass + + self.kill(signal.SIGTERM) + try: + return with_timeout(graceperiod, self.wait) + except Timeout: + pass + + self.kill(signal.SIGKILL) + return self.wait() + + def __reader(self, collector, source): + """Read data from source until EOF, adding it to collector. + """ + while True: + data = os.read(source.fileno(), 65536) + self.__lock.acquire() + collector.append(data) + self.__lock.release() + if data == "": + source.close() + break + return + + def __feeder(self, pending, drain): + """Feed data from the list pending to the file drain. + """ + while True: + self.__inputsem.acquire() + self.__lock.acquire() + if not pending and self.__quit: + drain.close() + self.__lock.release() + break + data = pending.pop(0) + self.__lock.release() + drain.write(data) + + def read(self): + """Read data written by the process to its standard output. + """ + self.__lock.acquire() + outdata = "".join(self.__collected_outdata) + del self.__collected_outdata[:] + self.__lock.release() + return outdata + + def readerr(self): + """Read data written by the process to its standard error. + """ + self.__lock.acquire() + errdata = "".join(self.__collected_errdata) + del self.__collected_errdata[:] + self.__lock.release() + return errdata + + def readboth(self): + """Read data written by the process to its standard output and error. + Return value is a two-tuple ( stdout-data, stderr-data ). + + WARNING! The name of this method is ugly, and may change in + future versions! + """ + self.__lock.acquire() + outdata = "".join(self.__collected_outdata) + del self.__collected_outdata[:] + errdata = "".join(self.__collected_errdata) + del self.__collected_errdata[:] + self.__lock.release() + return outdata,errdata + + def _peek(self): + self.__lock.acquire() + output = "".join(self.__collected_outdata) + error = "".join(self.__collected_errdata) + self.__lock.release() + return output,error + + def write(self, data): + """Send data to a process's standard input. + """ + if self.__process.stdin is None: + raise ValueError("Writing to process with stdin not a pipe") + self.__lock.acquire() + self.__pending_input.append(data) + self.__inputsem.release() + self.__lock.release() + + def closeinput(self): + """Close the standard input of a process, so it receives EOF.
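+
+        Editor's note (illustrative): the usual shutdown sequence is to
+        write() any remaining data, then closeinput(), then wait(); the
+        feeder thread drains the pending input, closes the pipe and exits.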
+ """ + self.__lock.acquire() + self.__quit = True + self.__inputsem.release() + self.__lock.release() + + +class ProcessManager(object): + """Manager for asynchronous processes. + This class is intended for use in a server that wants to expose the + asyncproc.Process API to clients. Within a single process, it is + usually better to just keep track of the Process objects directly + instead of hiding them behind this. It probably shouldn't have been + made part of the asyncproc module in the first place. + """ + + def __init__(self): + self.__last_id = 0 + self.__procs = {} + + def start(self, args, executable=None, shell=False, cwd=None, env=None): + """Start a program in the background, collecting its output. + Returns an integer identifying the process. (Note that this + integer is *not* the OS process id of the actuall running + process.) + """ + proc = Process(args=args, executable=executable, shell=shell, + cwd=cwd, env=env) + self.__last_id += 1 + self.__procs[self.__last_id] = proc + return self.__last_id + + def kill(self, procid, signal): + return self.__procs[procid].kill(signal) + + def terminate(self, procid, graceperiod=1): + return self.__procs[procid].terminate(graceperiod) + + def write(self, procid, data): + return self.__procs[procid].write(data) + + def closeinput(self, procid): + return self.__procs[procid].closeinput() + + def read(self, procid): + return self.__procs[procid].read() + + def readerr(self, procid): + return self.__procs[procid].readerr() + + def readboth(self, procid): + return self.__procs[procid].readboth() + + def wait(self, procid, flags=0): + """ + Unlike the os.wait() function, the process will be available + even after ProcessManager.wait() has returned successfully, + in order for the process' output to be retrieved. Use the + reap() method for removing dead processes. + """ + return self.__procs[procid].wait(flags) + + def reap(self, procid): + """Remove a process. + If the process is still running, it is killed with no pardon. + The process will become unaccessible, and its identifier may + be reused immediately. + """ + if self.wait(procid, os.WNOHANG) is None: + self.kill(procid, signal.SIGKILL) + self.wait(procid) + del self.__procs[procid] + + def reapall(self): + """Remove all processes. + Running processes are killed without pardon. + """ + # Since reap() modifies __procs, we have to iterate over a copy + # of the keys in it. Thus, do not remove the .keys() call. + for procid in self.__procs.keys(): + self.reap(procid) + + +def _P1(): + return Process(["tcpconnect", "-irv", "localhost", "6923"]) + +def _P2(): + return Process(["tcplisten", "-irv", "6923"]) + diff --git a/modules/miscutil/lib/inveniocfg.py b/modules/miscutil/lib/inveniocfg.py index ba5d8bae9..b355a63d8 100644 --- a/modules/miscutil/lib/inveniocfg.py +++ b/modules/miscutil/lib/inveniocfg.py @@ -1,1166 +1,1229 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Invenio configuration and administration CLI tool. Usage: inveniocfg [options] General options: -h, --help print this help -V, --version print version number Options to finish your installation: --create-apache-conf create Apache configuration files --create-tables create DB tables for Invenio --load-webstat-conf load the WebStat configuration --drop-tables drop DB tables of Invenio + --check-openoffice-dir check that the OpenOffice.org temporary directory is correctly set up Options to set up and test a demo site: --create-demo-site create demo site --load-demo-records load demo records --remove-demo-records remove demo records, keeping demo site --drop-demo-site drop demo site configurations too - --run-unit-tests run unit test suite (needs deme site) + --run-unit-tests run unit test suite (needs demo site) --run-regression-tests run regression test suite (needs demo site) --run-web-tests run web tests in a browser (needs demo site, Firefox, Selenium IDE) Options to update config files in situ: --update-all perform all the update options --update-config-py update config.py file from invenio.conf file --update-dbquery-py update dbquery.py with DB credentials from invenio.conf --update-dbexec update dbexec with DB credentials from invenio.conf --update-bibconvert-tpl update bibconvert templates with CFG_SITE_URL from invenio.conf --update-web-tests update web test cases with CFG_SITE_URL from invenio.conf Options to update DB tables: --reset-all perform all the reset options --reset-sitename reset tables to take account of new CFG_SITE_NAME* --reset-siteadminemail reset tables to take account of new CFG_SITE_ADMIN_EMAIL --reset-fieldnames reset tables to take account of new I18N names from PO files --reset-recstruct-cache reset record structure cache according to CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE Options to help the work: --list print names and values of all options from conf files --get get value of a given option from conf files --conf-dir path to directory where invenio*.conf files are [optional] --detect-system-details print system details such as Apache/Python/MySQL versions """ __revision__ = "$Id$" from ConfigParser import ConfigParser import os import re import shutil import socket import sys import zlib import marshal def print_usage(): """Print help.""" print __doc__ def print_version(): """Print version information.""" print __revision__ def convert_conf_option(option_name, option_value): """ Convert conf option into Python config.py line, converting values to ints or strings as appropriate.
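    For example (an illustrative call; CFG_FOO_TIMEOUT is a made-up option
    name): convert_conf_option('cfg_site_url', 'http://localhost') returns
    'CFG_SITE_URL = "http://localhost"', while
    convert_conf_option('cfg_foo_timeout', '30') returns
    'CFG_FOO_TIMEOUT = 30', since the value parses as an integer.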
""" ## 1) convert option name to uppercase: option_name = option_name.upper() ## 2) convert option value to int or string: if option_name in ['CFG_BIBUPLOAD_REFERENCE_TAG', 'CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG', 'CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG', 'CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG', 'CFG_BIBUPLOAD_STRONG_TAGS', 'CFG_BIBFORMAT_HIDDEN_TAGS', 'CFG_SITE_EMERGENCY_PHONE_NUMBERS']: # some options are supposed be string even when they look like # numeric option_value = '"' + option_value + '"' else: try: option_value = int(option_value) except ValueError: option_value = '"' + option_value + '"' - ## 3a) special cases: regexps + ## 3a) special cases: chars regexps if option_name in ['CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS', 'CFG_BIBINDEX_CHARS_PUNCTUATION']: option_value = 'r"[' + option_value[1:-1] + ']"' + ## 3abis) special cases: real regexps + if option_name in ['CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES', + 'CFG_BIBINDEX_SPLASH_PAGES']: + option_value = 'r"' + option_value[1:-1] + '"' + ## 3b) special cases: True, False, None if option_value in ['"True"', '"False"', '"None"']: option_value = option_value[1:-1] ## 3c) special cases: dicts - if option_name in ['CFG_WEBSEARCH_FIELDS_CONVERT', ]: + if option_name in ['CFG_WEBSEARCH_FIELDS_CONVERT']: option_value = option_value[1:-1] ## 3d) special cases: comma-separated lists if option_name in ['CFG_SITE_LANGS', 'CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS', 'CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS', 'CFG_BIBUPLOAD_STRONG_TAGS', 'CFG_BIBFORMAT_HIDDEN_TAGS', 'CFG_BIBSCHED_GC_TASKS_TO_REMOVE', 'CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE', 'CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS', 'CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS', 'CFG_WEBSEARCH_ENABLED_SEARCH_INTERFACES', 'CFG_SITE_EMERGENCY_PHONE_NUMBERS', 'CFG_WEBSTYLE_HTTP_STATUS_ALERT_LIST', 'CFG_WEBSEARCH_RSS_I18N_COLLECTIONS']: out = "[" for elem in option_value[1:-1].split(","): if elem: if option_name in ['CFG_WEBSEARCH_ENABLED_SEARCH_INTERFACES']: # 3d1) integer values out += "%i, " % int(elem) else: # 3d2) string values out += "'%s', " % elem out += "]" option_value = out ## 3e) special cases: multiline if option_name == 'CFG_OAI_IDENTIFY_DESCRIPTION': # make triple quotes option_value = '""' + option_value + '""' ## 3f) ignore some options: if option_name.startswith('CFG_SITE_NAME_INTL'): # treated elsewhere return ## 3g) special cases: float if option_name == 'CFG_BIBDOCFILE_MD5_CHECK_PROBABILITY': option_value = float(option_value[1:-1]) ## 4) finally, return output line: return '%s = %s' % (option_name, option_value) def cli_cmd_update_config_py(conf): """ Update new config.py from conf options, keeping previous config.py in a backup copy. """ print ">>> Going to update config.py..." ## location where config.py is: configpyfile = conf.get("Invenio", "CFG_PYLIBDIR") + \ os.sep + 'invenio' + os.sep + 'config.py' ## backup current config.py file: if os.path.exists(configpyfile): shutil.copy(configpyfile, configpyfile + '.OLD') ## here we go: fdesc = open(configpyfile, 'w') ## generate preamble: fdesc.write("# -*- coding: utf-8 -*-\n") fdesc.write("# DO NOT EDIT THIS FILE! 
IT WAS AUTOMATICALLY GENERATED\n") fdesc.write("# FROM INVENIO.CONF BY EXECUTING:\n") fdesc.write("# " + " ".join(sys.argv) + "\n") ## special treatment for CFG_SITE_NAME_INTL options: fdesc.write("CFG_SITE_NAME_INTL = {}\n") for lang in conf.get("Invenio", "CFG_SITE_LANGS").split(","): fdesc.write("CFG_SITE_NAME_INTL['%s'] = \"%s\"\n" % (lang, conf.get("Invenio", "CFG_SITE_NAME_INTL_" + lang))) ## special treatment for CFG_SITE_SECURE_URL that may be empty, in ## which case it should be put equal to CFG_SITE_URL: if not conf.get("Invenio", "CFG_SITE_SECURE_URL"): conf.set("Invenio", "CFG_SITE_SECURE_URL", conf.get("Invenio", "CFG_SITE_URL")) ## process all the options normally: sections = conf.sections() sections.sort() for section in sections: options = conf.options(section) options.sort() for option in options: if not option.startswith('CFG_DATABASE_'): # put all options except for db credentials into config.py line_out = convert_conf_option(option, conf.get(section, option)) if line_out: fdesc.write(line_out + "\n") ## FIXME: special treatment for experimental variables ## CFG_WEBSEARCH_ENABLED_SEARCH_INTERFACES and CFG_WEBSEARCH_DEFAULT_SEARCH_INTERFACE ## (not offering them in invenio.conf since they will be refactored) fdesc.write("CFG_WEBSEARCH_DEFAULT_SEARCH_INTERFACE = 0\n") fdesc.write("CFG_WEBSEARCH_ENABLED_SEARCH_INTERFACES = [0, 1,]\n") ## generate postamble: fdesc.write("") fdesc.write("# END OF GENERATED FILE") ## we are done: fdesc.close() print "You may want to restart Apache now." print ">>> config.py updated successfully." def cli_cmd_update_dbquery_py(conf): """ Update lib/dbquery.py file with DB parameters read from conf file. Note: this edits dbquery.py in situ, taking a backup first. Use only when you know what you are doing. """ print ">>> Going to update dbquery.py..." ## location where dbquery.py is: dbquerypyfile = conf.get("Invenio", "CFG_PYLIBDIR") + \ os.sep + 'invenio' + os.sep + 'dbquery.py' ## backup current dbquery.py file: if os.path.exists(dbquerypyfile): shutil.copy(dbquerypyfile, dbquerypyfile + '.OLD') ## replace db parameters: out = '' for line in open(dbquerypyfile, 'r').readlines(): match = re.search(r'^CFG_DATABASE_(HOST|PORT|NAME|USER|PASS)(\s*=\s*)\'.*\'$', line) if match: dbparam = 'CFG_DATABASE_' + match.group(1) out += "%s%s'%s'\n" % (dbparam, match.group(2), conf.get('Invenio', dbparam)) else: out += line fdesc = open(dbquerypyfile, 'w') fdesc.write(out) fdesc.close() print "You may want to restart Apache now." print ">>> dbquery.py updated successfully." def cli_cmd_update_dbexec(conf): """ Update bin/dbexec file with DB parameters read from conf file. Note: this edits dbexec in situ, taking a backup first. Use only when you know what you are doing. """ print ">>> Going to update dbexec..." ## location where dbexec is: dbexecfile = conf.get("Invenio", "CFG_BINDIR") + \ os.sep + 'dbexec' ## backup current dbexec file: if os.path.exists(dbexecfile): shutil.copy(dbexecfile, dbexecfile + '.OLD') ## replace db parameters via sed: out = '' for line in open(dbexecfile, 'r').readlines(): match = re.search(r'^CFG_DATABASE_(HOST|PORT|NAME|USER|PASS)(\s*=\s*)\'.*\'$', line) if match: dbparam = 'CFG_DATABASE_' + match.group(1) out += "%s%s'%s'\n" % (dbparam, match.group(2), conf.get("Invenio", dbparam)) else: out += line fdesc = open(dbexecfile, 'w') fdesc.write(out) fdesc.close() print ">>> dbexec updated successfully." 
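## Editor's aside -- an illustrative sketch, not part of this patch: the
## cli_cmd_update_* helpers above share one in-situ editing pattern,
## namely "back up the target file, then rewrite matching lines via a
## regexp". Reduced to its essentials (target, pattern and substitute()
## are placeholders):
##
##     shutil.copy(target, target + '.OLD')
##     out = ''
##     for line in open(target, 'r').readlines():
##         match = re.search(pattern, line)
##         if match:
##             out += substitute(match)
##         else:
##             out += line
##     fdesc = open(target, 'w')
##     fdesc.write(out)
##     fdesc.close()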
def cli_cmd_update_bibconvert_tpl(conf): """ Update bibconvert/config/*.tpl files looking for 856 http://.../record/ lines, replacing URL with CFG_SITE_URL taken from conf file. Note: this edits tpl files in situ, taking a backup first. Use only when you know what you are doing. """ print ">>> Going to update bibconvert templates..." ## location where bibconvert/config/*.tpl are: tpldir = conf.get("Invenio", 'CFG_ETCDIR') + \ os.sep + 'bibconvert' + os.sep + 'config' ## find all *.tpl files: for tplfilename in os.listdir(tpldir): if tplfilename.endswith(".tpl"): ## change tpl file: tplfile = tpldir + os.sep + tplfilename shutil.copy(tplfile, tplfile + '.OLD') out = '' for line in open(tplfile, 'r').readlines(): match = re.search(r'^(.*)http://.*?/record/(.*)$', line) if match: out += "%s%s/record/%s\n" % (match.group(1), conf.get("Invenio", 'CFG_SITE_URL'), match.group(2)) else: out += line fdesc = open(tplfile, 'w') fdesc.write(out) fdesc.close() print ">>> bibconvert templates updated successfully." def cli_cmd_update_web_tests(conf): """ Update web test cases lib/webtest/test_*.html looking for hardcoded http://... URLs and /opt/cds-invenio paths, replacing them with CFG_SITE_URL and CFG_PREFIX taken from conf file. Note: this edits test files in situ, taking a backup first. Use only when you know what you are doing. """ print ">>> Going to update web tests..." ## location where test_*.html files are: testdir = conf.get("Invenio", 'CFG_PREFIX') + os.sep + \ 'lib' + os.sep + 'webtest' + os.sep + 'invenio' ## find all test_*.html files: for testfilename in os.listdir(testdir): if testfilename.startswith("test_") and \ testfilename.endswith(".html"): ## change test file: testfile = testdir + os.sep + testfilename shutil.copy(testfile, testfile + '.OLD') out = '' for line in open(testfile, 'r').readlines(): match = re.search(r'^(.*)http://.+?([)/opt/cds-invenio(.*)$', line) if match: out += "%s%s%s\n" % (match.group(1), conf.get("Invenio", 'CFG_PREFIX'), match.group(2)) else: out += line fdesc = open(testfile, 'w') fdesc.write(out) fdesc.close() print ">>> web tests updated successfully." def cli_cmd_reset_sitename(conf): """ Reset collection-related tables with new CFG_SITE_NAME and CFG_SITE_NAME_INTL* read from conf files. """ print ">>> Going to reset CFG_SITE_NAME and CFG_SITE_NAME_INTL..." from invenio.dbquery import run_sql, IntegrityError # reset CFG_SITE_NAME: sitename = conf.get("Invenio", "CFG_SITE_NAME") try: run_sql("""INSERT INTO collection (id, name, dbquery, reclist) VALUES (1,%s,NULL,NULL)""", (sitename,)) except IntegrityError: run_sql("""UPDATE collection SET name=%s WHERE id=1""", (sitename,)) # reset CFG_SITE_NAME_INTL: for lang in conf.get("Invenio", "CFG_SITE_LANGS").split(","): sitename_lang = conf.get("Invenio", "CFG_SITE_NAME_INTL_" + lang) try: run_sql("""INSERT INTO collectionname (id_collection, ln, type, value) VALUES (%s,%s,%s,%s)""", (1, lang, 'ln', sitename_lang)) except IntegrityError: run_sql("""UPDATE collectionname SET value=%s WHERE ln=%s AND id_collection=1 AND type='ln'""", (sitename_lang, lang)) print "You may want to restart Apache now." print ">>> CFG_SITE_NAME and CFG_SITE_NAME_INTL* reset successfully."
def cli_cmd_reset_recstruct_cache(conf): """If CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE is changed, this function will adapt the database to either store or not store the recstruct format.""" from invenio.intbitset import intbitset from invenio.dbquery import run_sql from invenio.search_engine import get_record from invenio.bibsched import server_pid, pidfile enable_recstruct_cache = conf.get("Invenio", "CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE") enable_recstruct_cache = enable_recstruct_cache in ('True', '1') pid = server_pid(ping_the_process=False) if pid: print >> sys.stderr, "ERROR: bibsched seems to run with pid %d, according to %s." % (pid, pidfile) print >> sys.stderr, " Please stop bibsched before running this procedure." sys.exit(1) if enable_recstruct_cache: print ">>> Searching records which need recstruct cache resetting; this may take a while..." all_recids = intbitset(run_sql("SELECT id FROM bibrec")) good_recids = intbitset(run_sql("SELECT bibrec.id FROM bibrec JOIN bibfmt ON bibrec.id = bibfmt.id_bibrec WHERE format='recstruct' AND modification_date < last_updated")) recids = all_recids - good_recids print ">>> Generating recstruct cache..." tot = len(recids) count = 0 for recid in recids: value = zlib.compress(marshal.dumps(get_record(recid))) run_sql("DELETE FROM bibfmt WHERE id_bibrec=%s AND format='recstruct'", (recid, )) run_sql("INSERT INTO bibfmt(id_bibrec, format, last_updated, value) VALUES(%s, 'recstruct', NOW(), %s)", (recid, value)) count += 1 if count % 1000 == 0: print " ... done records %s/%s" % (count, tot) if count % 1000 != 0: print " ... done records %s/%s" % (count, tot) print ">>> recstruct cache generated successfully." else: print ">>> Cleaning recstruct cache..." run_sql("DELETE FROM bibfmt WHERE format='recstruct'") def cli_cmd_reset_siteadminemail(conf): """ Reset user-related tables with new CFG_SITE_ADMIN_EMAIL read from conf files. """ print ">>> Going to reset CFG_SITE_ADMIN_EMAIL..." from invenio.dbquery import run_sql siteadminemail = conf.get("Invenio", "CFG_SITE_ADMIN_EMAIL") run_sql("DELETE FROM user WHERE id=1") run_sql("""INSERT INTO user (id, email, password, note, nickname) VALUES (1, %s, AES_ENCRYPT(email, ''), 1, 'admin')""", (siteadminemail,)) print "You may want to restart Apache now." print ">>> CFG_SITE_ADMIN_EMAIL reset successfully." def cli_cmd_reset_fieldnames(conf): """ Reset I18N field names such as author, title, etc and other I18N ranking method names such as word similarity. Their translations are taken from the PO files. """ print ">>> Going to reset I18N field names..." 
from invenio.messages import gettext_set_language, language_list_long from invenio.dbquery import run_sql, IntegrityError ## get field id and name list: field_id_name_list = run_sql("SELECT id, name FROM field") ## get rankmethod id and name list: rankmethod_id_name_list = run_sql("SELECT id, name FROM rnkMETHOD") ## update names for every language: for lang, dummy in language_list_long(): _ = gettext_set_language(lang) ## this list is put here in order for PO system to pick names ## suitable for translation field_name_names = {"any field": _("any field"), "title": _("title"), "author": _("author"), "abstract": _("abstract"), "keyword": _("keyword"), "report number": _("report number"), "subject": _("subject"), "reference": _("reference"), "fulltext": _("fulltext"), "collection": _("collection"), "division": _("division"), "year": _("year"), "journal": _("journal"), "experiment": _("experiment"), "record ID": _("record ID")} ## update I18N names for every language: for (field_id, field_name) in field_id_name_list: if field_name_names.has_key(field_name): try: run_sql("""INSERT INTO fieldname (id_field,ln,type,value) VALUES (%s,%s,%s,%s)""", (field_id, lang, 'ln', field_name_names[field_name])) except IntegrityError: run_sql("""UPDATE fieldname SET value=%s WHERE id_field=%s AND ln=%s AND type=%s""", (field_name_names[field_name], field_id, lang, 'ln',)) ## ditto for rank methods: rankmethod_name_names = {"wrd": _("word similarity"), "demo_jif": _("journal impact factor"), "citation": _("times cited"),} for (rankmethod_id, rankmethod_name) in rankmethod_id_name_list: try: run_sql("""INSERT INTO rnkMETHODNAME (id_rnkMETHOD,ln,type,value) VALUES (%s,%s,%s,%s)""", (rankmethod_id, lang, 'ln', rankmethod_name_names[rankmethod_name])) except IntegrityError: run_sql("""UPDATE rnkMETHODNAME SET value=%s WHERE id_rnkMETHOD=%s AND ln=%s AND type=%s""", (rankmethod_name_names[rankmethod_name], rankmethod_id, lang, 'ln',)) print ">>> I18N field names reset successfully." +def cli_check_openoffice_dir(conf): + """ + If OpenOffice.org integration is enabled, checks whether the system is + properly configured. + """ + from invenio.textutils import wrap_text_in_a_box + from invenio.websubmit_file_converter import check_openoffice_tmpdir, \ + InvenioWebSubmitFileConverterError, CFG_OPENOFFICE_TMPDIR + from invenio.config import CFG_OPENOFFICE_USER, \ + CFG_PATH_OPENOFFICE_PYTHON, \ + CFG_OPENOFFICE_SERVER_HOST, \ + CFG_BIBSCHED_PROCESS_USER + from invenio.bibtask import guess_apache_process_user, \ + check_running_process_user + check_running_process_user() + print ">>> Checking if OpenOffice is correctly integrated...", + if CFG_OPENOFFICE_SERVER_HOST: + try: + check_openoffice_tmpdir() + except InvenioWebSubmitFileConverterError, err: + print wrap_text_in_a_box("""\ +OpenOffice.org can't properly create files in the OpenOffice.org temporary +directory %(tmpdir)s, as the user %(nobody)s (as configured in the +CFG_OPENOFFICE_USER variable of invenio(-local).conf): %(err)s.
+ + +In your /etc/sudoers file, you should authorize the %(apache)s user to run + %(python)s as %(nobody)s user as in: + + +%(apache)s localhost=(%(nobody)s) NOPASSWD: %(python)s + + +You should then run the following commands: + + +$ sudo mkdir -p %(tmpdir)s + +$ sudo chown %(nobody)s %(tmpdir)s + +$ sudo chmod 755 %(tmpdir)s""" % { + 'tmpdir' : CFG_OPENOFFICE_TMPDIR, + 'nobody' : CFG_OPENOFFICE_USER, + 'err' : err, + 'apache' : CFG_BIBSCHED_PROCESS_USER or guess_apache_process_user(), + 'python' : CFG_PATH_OPENOFFICE_PYTHON + }) + sys.exit(1) + print "ok" + else: + print "OpenOffice.org integration not enabled" + def test_db_connection(): """ Test DB connection, and if fails, advise user how to set it up. Useful to be called during table creation. """ print "Testing DB connection...", from invenio.textutils import wrap_text_in_a_box from invenio.dbquery import run_sql, Error ## first, test connection to the DB server: try: run_sql("SHOW TABLES") except Error, err: from invenio.dbquery import CFG_DATABASE_HOST, CFG_DATABASE_PORT, \ CFG_DATABASE_NAME, CFG_DATABASE_USER, CFG_DATABASE_PASS print wrap_text_in_a_box("""\ DATABASE CONNECTIVITY ERROR %(errno)d: %(errmsg)s.\n Perhaps you need to set up database and connection rights? If yes, then please login as MySQL admin user and run the following commands now: $ mysql -h %(dbhost)s -P %(dbport)s -u root -p mysql mysql> CREATE DATABASE %(dbname)s DEFAULT CHARACTER SET utf8; mysql> GRANT ALL PRIVILEGES ON %(dbname)s.* TO %(dbuser)s@%(webhost)s IDENTIFIED BY '%(dbpass)s'; mysql> QUIT The values printed above were detected from your configuration. If they are not right, then please edit your invenio-local.conf file and rerun 'inveniocfg --update-all' first. If the problem is of different nature, then please inspect the above error message and fix the problem before continuing.""" % \ {'errno': err.args[0], 'errmsg': err.args[1], 'dbname': CFG_DATABASE_NAME, 'dbhost': CFG_DATABASE_HOST, 'dbport': CFG_DATABASE_PORT, 'dbuser': CFG_DATABASE_USER, 'dbpass': CFG_DATABASE_PASS, 'webhost': CFG_DATABASE_HOST == 'localhost' and 'localhost' or os.popen('hostname -f', 'r').read().strip(), }) sys.exit(1) print "ok" ## second, test insert/select of a Unicode string to detect ## possible Python/MySQL/MySQLdb mis-setup: print "Testing Python/MySQL/MySQLdb UTF-8 chain...", try: beta_in_utf8 = "β" # Greek beta in UTF-8 is 0xCEB2 run_sql("CREATE TEMPORARY TABLE test__invenio__utf8 (x char(1), y varbinary(2)) DEFAULT CHARACTER SET utf8") run_sql("INSERT INTO test__invenio__utf8 (x, y) VALUES (%s, %s)", (beta_in_utf8, beta_in_utf8)) res = run_sql("SELECT x,y,HEX(x),HEX(y),LENGTH(x),LENGTH(y),CHAR_LENGTH(x),CHAR_LENGTH(y) FROM test__invenio__utf8") assert res[0] == ('\xce\xb2', '\xce\xb2', 'CEB2', 'CEB2', 2L, 2L, 1L, 2L) run_sql("DROP TEMPORARY TABLE test__invenio__utf8") except Exception, err: print wrap_text_in_a_box("""\ DATABASE RELATED ERROR %s\n A problem was detected with the UTF-8 treatment in the chain between the Python application, the MySQLdb connector, and the MySQL database. You may perhaps have installed older versions of some prerequisite packages?\n Please check the INSTALL file and please fix this problem before continuing.""" % err) sys.exit(1) print "ok" def cli_cmd_create_tables(conf): """Create and fill Invenio DB tables. Useful for the installation process.""" print ">>> Going to create and fill tables..." 
from invenio.config import CFG_PREFIX test_db_connection() for cmd in ["%s/bin/dbexec < %s/lib/sql/invenio/tabcreate.sql" % (CFG_PREFIX, CFG_PREFIX), "%s/bin/dbexec < %s/lib/sql/invenio/tabfill.sql" % (CFG_PREFIX, CFG_PREFIX)]: if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) cli_cmd_reset_sitename(conf) cli_cmd_reset_siteadminemail(conf) cli_cmd_reset_fieldnames(conf) for cmd in ["%s/bin/webaccessadmin -u admin -c -a" % CFG_PREFIX]: if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) print ">>> Tables created and filled successfully." def cli_cmd_load_webstat_conf(conf): print ">>> Going to load WebStat config..." from invenio.config import CFG_PREFIX cmd = "%s/bin/webstatadmin --load-config" % CFG_PREFIX if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) print ">>> WebStat config load successfully." def cli_cmd_drop_tables(conf): """Drop Invenio DB tables. Useful for the uninstallation process.""" print ">>> Going to drop tables..." from invenio.config import CFG_PREFIX from invenio.textutils import wrap_text_in_a_box, wait_for_user wait_for_user(wrap_text_in_a_box("""WARNING: You are going to destroy your database tables!""")) cmd = "%s/bin/dbexec < %s/lib/sql/invenio/tabdrop.sql" % (CFG_PREFIX, CFG_PREFIX) if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) print ">>> Tables dropped successfully." def cli_cmd_create_demo_site(conf): """Create demo site. Useful for testing purposes.""" print ">>> Going to create demo site..." from invenio.config import CFG_PREFIX from invenio.dbquery import run_sql run_sql("TRUNCATE schTASK") run_sql("TRUNCATE session") run_sql("DELETE FROM user WHERE email=''") for cmd in ["%s/bin/dbexec < %s/lib/sql/invenio/democfgdata.sql" % \ (CFG_PREFIX, CFG_PREFIX),]: if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) cli_cmd_reset_fieldnames(conf) # needed for I18N demo ranking method names for cmd in ["%s/bin/webaccessadmin -u admin -c -r -D" % CFG_PREFIX, "%s/bin/webcoll -u admin" % CFG_PREFIX, "%s/bin/webcoll 1" % CFG_PREFIX,]: if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) print ">>> Demo site created successfully." def cli_cmd_load_demo_records(conf): """Load demo records. Useful for testing purposes.""" from invenio.config import CFG_PREFIX from invenio.dbquery import run_sql print ">>> Going to load demo records..." run_sql("TRUNCATE schTASK") for cmd in ["%s/bin/bibupload -u admin -i %s/var/tmp/demobibdata.xml" % (CFG_PREFIX, CFG_PREFIX), "%s/bin/bibupload 1" % CFG_PREFIX, + "%s/bin/bibdocfile --textify --with-ocr --recid 97" % CFG_PREFIX, + "%s/bin/bibdocfile --textify --all" % CFG_PREFIX, "%s/bin/bibindex -u admin" % CFG_PREFIX, "%s/bin/bibindex 2" % CFG_PREFIX, "%s/bin/bibreformat -u admin -o HB" % CFG_PREFIX, "%s/bin/bibreformat 3" % CFG_PREFIX, "%s/bin/webcoll -u admin" % CFG_PREFIX, "%s/bin/webcoll 4" % CFG_PREFIX, "%s/bin/bibrank -u admin" % CFG_PREFIX, "%s/bin/bibrank 5" % CFG_PREFIX,]: if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) print ">>> Demo records loaded successfully." def cli_cmd_remove_demo_records(conf): """Remove demo records. Useful when you are finished testing.""" print ">>> Going to remove demo records..." 
from invenio.config import CFG_PREFIX from invenio.dbquery import run_sql from invenio.textutils import wrap_text_in_a_box, wait_for_user wait_for_user(wrap_text_in_a_box("""WARNING: You are going to destroy your records and documents!""")) if os.path.exists(CFG_PREFIX + os.sep + 'var' + os.sep + 'data'): shutil.rmtree(CFG_PREFIX + os.sep + 'var' + os.sep + 'data') run_sql("TRUNCATE schTASK") for cmd in ["%s/bin/dbexec < %s/lib/sql/invenio/tabbibclean.sql" % (CFG_PREFIX, CFG_PREFIX), "%s/bin/webcoll -u admin" % CFG_PREFIX, "%s/bin/webcoll 1" % CFG_PREFIX,]: if os.system(cmd): print "ERROR: failed execution of", cmd sys.exit(1) print ">>> Demo records removed successfully." def cli_cmd_drop_demo_site(conf): """Drop demo site completely. Useful when you are finished testing.""" print ">>> Going to drop demo site..." from invenio.textutils import wrap_text_in_a_box, wait_for_user wait_for_user(wrap_text_in_a_box("""WARNING: You are going to destroy your site and documents!""")) cli_cmd_drop_tables(conf) cli_cmd_create_tables(conf) cli_cmd_remove_demo_records(conf) print ">>> Demo site dropped successfully." def cli_cmd_run_unit_tests(conf): """Run unit tests, usually on the working demo site.""" from invenio.testutils import build_and_run_unit_test_suite build_and_run_unit_test_suite() def cli_cmd_run_regression_tests(conf): """Run regression tests, usually on the working demo site.""" from invenio.testutils import build_and_run_regression_test_suite build_and_run_regression_test_suite() def cli_cmd_run_web_tests(conf): """Run web tests in a browser. Requires Firefox with Selenium IDE extension.""" from invenio.testutils import build_and_run_web_test_suite build_and_run_web_test_suite() def _detect_ip_address(): """Detect IP address of this computer. Useful for creating Apache vhost conf snippet on RHEL like machines. @return: IP address, or '*' if cannot detect @rtype: string @note: creates socket for real in order to detect real IP address, not the loopback one. """ try: s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s.connect(('cdsware.cern.ch', 0)) return s.getsockname()[0] except: return '*' def cli_cmd_create_apache_conf(conf): """ Create Apache conf files for this site, keeping previous files in a backup copy. """ print ">>> Going to create Apache conf files..." 
from invenio.textutils import wrap_text_in_a_box apache_conf_dir = conf.get("Invenio", 'CFG_ETCDIR') + \ os.sep + 'apache' xsendfile_directive_needed = int(conf.get("Invenio", 'CFG_BIBDOCFILE_USE_XSENDFILE')) != 0 ## Apache vhost conf file is distro specific, so analyze needs: # Gentoo (and generic defaults): listen_directive_needed = True ssl_pem_directive_needed = False ssl_pem_path = '/etc/apache2/ssl/apache.pem' ssl_crt_path = '/etc/apache2/ssl/server.crt' ssl_key_path = '/etc/apache2/ssl/server.key' vhost_ip_address_needed = False wsgi_socket_directive_needed = False # Debian: if os.path.exists(os.path.sep + 'etc' + os.path.sep + 'debian_version'): listen_directive_needed = False ssl_pem_directive_needed = True # RHEL/SLC: if os.path.exists(os.path.sep + 'etc' + os.path.sep + 'redhat-release'): listen_directive_needed = False ssl_crt_path = '/etc/pki/tls/certs/localhost.crt' ssl_key_path = '/etc/pki/tls/private/localhost.key' vhost_ip_address_needed = True wsgi_socket_directive_needed = True ## okay, let's create Apache vhost files: if not os.path.exists(apache_conf_dir): os.mkdir(apache_conf_dir) apache_vhost_file = apache_conf_dir + os.sep + \ 'invenio-apache-vhost.conf' apache_vhost_ssl_file = apache_conf_dir + os.sep + \ 'invenio-apache-vhost-ssl.conf' apache_vhost_body = """\ AddDefaultCharset UTF-8 ServerSignature Off ServerTokens Prod NameVirtualHost %(vhost_ip_address)s:80 %(listen_directive)s %(wsgi_socket_directive)s %(xsendfile_directive)s WSGIRestrictStdout Off WSGIImportScript %(wsgidir)s/invenio.wsgi process-group=invenio application-group=%%{GLOBAL} deny from all deny from all ServerName %(servername)s ServerAlias %(serveralias)s ServerAdmin %(serveradmin)s DocumentRoot %(webdir)s Options FollowSymLinks MultiViews AllowOverride None Order allow,deny Allow from all ErrorLog %(logdir)s/apache.err LogLevel warn CustomLog %(logdir)s/apache.log combined DirectoryIndex index.en.html index.html Alias /img/ %(webdir)s/img/ Alias /js/ %(webdir)s/js/ Alias /export/ %(webdir)s/export/ Alias /jsMath/ %(webdir)s/jsMath/ Alias /jsCalendar/ %(webdir)s/jsCalendar/ Alias /fckeditor/ %(webdir)s/fckeditor/ AliasMatch /sitemap-(.*) %(webdir)s/sitemap-$1 Alias /robots.txt %(webdir)s/robots.txt Alias /favicon.ico %(webdir)s/favicon.ico WSGIDaemonProcess invenio processes=5 threads=1 display-name=%%{GROUP} inactivity-timeout=3600 maximum-requests=10000 WSGIScriptAlias / %(wsgidir)s/invenio.wsgi WSGIPassAuthorization On WSGIProcessGroup invenio WSGIApplicationGroup %%{GLOBAL} Options FollowSymLinks MultiViews AllowOverride None Order allow,deny Allow from all """ % {'servername': conf.get('Invenio', 'CFG_SITE_URL').replace("http://", ""), 'serveralias': conf.get('Invenio', 'CFG_SITE_URL').replace("http://", "").split('.')[0], 'serveradmin': conf.get('Invenio', 'CFG_SITE_ADMIN_EMAIL'), 'webdir': conf.get('Invenio', 'CFG_WEBDIR'), 'logdir': conf.get('Invenio', 'CFG_LOGDIR'), 'libdir' : conf.get('Invenio', 'CFG_PYLIBDIR'), 'wsgidir': os.path.join(conf.get('Invenio', 'CFG_PREFIX'), 'var', 'www-wsgi'), 'vhost_ip_address': vhost_ip_address_needed and _detect_ip_address() or '*', 'listen_directive': listen_directive_needed and 'Listen 80' or '#Listen 80', 'wsgi_socket_directive': (wsgi_socket_directive_needed and \ 'WSGISocketPrefix ' or '#WSGISocketPrefix ') + \ conf.get('Invenio', 'CFG_PREFIX') + os.sep + 'var' + os.sep + 'run', 'xsendfile_directive' : xsendfile_directive_needed and \ "XSendFile On\nXSendFileAllowAbove On" or \ "#XSendFile On\n#XSendFileAllowAbove On", } apache_vhost_ssl_body = 
"""\ ServerSignature Off ServerTokens Prod %(listen_directive)s NameVirtualHost %(vhost_ip_address)s:443 %(ssl_pem_directive)s %(ssl_crt_directive)s %(ssl_key_directive)s %(xsendfile_directive)s WSGIRestrictStdout Off deny from all deny from all ServerName %(servername)s ServerAlias %(serveralias)s ServerAdmin %(serveradmin)s SSLEngine on DocumentRoot %(webdir)s Options FollowSymLinks MultiViews AllowOverride None Order allow,deny Allow from all ErrorLog %(logdir)s/apache-ssl.err LogLevel warn CustomLog %(logdir)s/apache-ssl.log combined DirectoryIndex index.en.html index.html Alias /img/ %(webdir)s/img/ Alias /js/ %(webdir)s/js/ Alias /export/ %(webdir)s/export/ Alias /jsMath/ %(webdir)s/jsMath/ Alias /jsCalendar/ %(webdir)s/jsCalendar/ Alias /fckeditor/ %(webdir)s/fckeditor/ AliasMatch /sitemap-(.*) %(webdir)s/sitemap-$1 Alias /robots.txt %(webdir)s/robots.txt Alias /favicon.ico %(webdir)s/favicon.ico WSGIScriptAlias / %(wsgidir)s/invenio.wsgi WSGIPassAuthorization On WSGIProcessGroup invenio WSGIApplicationGroup %%{GLOBAL} Options FollowSymLinks MultiViews AllowOverride None Order allow,deny Allow from all """ % {'servername': conf.get('Invenio', 'CFG_SITE_SECURE_URL').replace("https://", ""), 'serveralias': conf.get('Invenio', 'CFG_SITE_SECURE_URL').replace("https://", "").split('.')[0], 'serveradmin': conf.get('Invenio', 'CFG_SITE_ADMIN_EMAIL'), 'webdir': conf.get('Invenio', 'CFG_WEBDIR'), 'logdir': conf.get('Invenio', 'CFG_LOGDIR'), 'libdir' : conf.get('Invenio', 'CFG_PYLIBDIR'), 'wsgidir' : os.path.join(conf.get('Invenio', 'CFG_PREFIX'), 'var', 'www-wsgi'), 'vhost_ip_address': vhost_ip_address_needed and _detect_ip_address() or '*', 'listen_directive' : listen_directive_needed and 'Listen 443' or '#Listen 443', 'ssl_pem_directive': ssl_pem_directive_needed and \ 'SSLCertificateFile %s' % ssl_pem_path or \ '#SSLCertificateFile %s' % ssl_pem_path, 'ssl_crt_directive': ssl_pem_directive_needed and \ '#SSLCertificateFile %s' % ssl_crt_path or \ 'SSLCertificateFile %s' % ssl_crt_path, 'ssl_key_directive': ssl_pem_directive_needed and \ '#SSLCertificateKeyFile %s' % ssl_key_path or \ 'SSLCertificateKeyFile %s' % ssl_key_path, 'xsendfile_directive' : xsendfile_directive_needed and \ "XSendFile On\nXSendFileAllowAbove On" or \ "#XSendFile On\n#XSendFileAllowAbove On", } # write HTTP vhost snippet: if os.path.exists(apache_vhost_file): shutil.copy(apache_vhost_file, apache_vhost_file + '.OLD') fdesc = open(apache_vhost_file, 'w') fdesc.write(apache_vhost_body) fdesc.close() print print "Created file", apache_vhost_file # write HTTPS vhost snippet: vhost_ssl_created = False if conf.get('Invenio', 'CFG_SITE_SECURE_URL') != \ conf.get('Invenio', 'CFG_SITE_URL'): if os.path.exists(apache_vhost_ssl_file): shutil.copy(apache_vhost_ssl_file, apache_vhost_ssl_file + '.OLD') fdesc = open(apache_vhost_ssl_file, 'w') fdesc.write(apache_vhost_ssl_body) fdesc.close() vhost_ssl_created = True print "Created file", apache_vhost_ssl_file print wrap_text_in_a_box("""\ Apache virtual host configuration file(s) for your Invenio site was(were) created. Please check created file(s) and activate virtual host(s). For example, you can put the following include statements in your httpd.conf:\n Include %s %s Please see the INSTALL file for more details. """ % (apache_vhost_file, (vhost_ssl_created and 'Include ' or '#Include ') + apache_vhost_ssl_file)) print ">>> Apache conf files created." def cli_cmd_get(conf, varname): """ Return value of VARNAME read from CONF files. 
Useful for third-party programs to access values of conf options such as CFG_PREFIX. Return None if VARNAME is not found. """ # do not pay attention to upper/lower case: varname = varname.lower() # do not pay attention to section names yet: all_options = {} for section in conf.sections(): for option in conf.options(section): all_options[option] = conf.get(section, option) return all_options.get(varname, None) def cli_cmd_list(conf): """ Print a list of all conf options and values from CONF. """ sections = conf.sections() sections.sort() for section in sections: options = conf.options(section) options.sort() for option in options: print option.upper(), '=', conf.get(section, option) def _grep_version_from_executable(path_to_exec, version_regexp): """ Try to detect a program version by digging into its binary PATH_TO_EXEC and looking for VERSION_REGEXP. Return program version as a string. Return empty string if not succeeded. """ from invenio.shellutils import run_shell_command exec_version = "" if os.path.exists(path_to_exec): dummy1, cmd2_out, dummy2 = run_shell_command("strings %s | grep %s", (path_to_exec, version_regexp)) if cmd2_out: for cmd2_out_line in cmd2_out.split("\n"): if len(cmd2_out_line) > len(exec_version): # the longest the better exec_version = cmd2_out_line return exec_version def detect_apache_version(): """ Try to detect Apache version by localizing httpd or apache executables and grepping inside binaries. Return list of all found Apache versions and paths. (For a given executable, the returned format is 'apache_version [apache_path]'.) Return empty list if no success. """ from invenio.shellutils import run_shell_command out = [] dummy1, cmd_out, dummy2 = run_shell_command("locate bin/httpd bin/apache") for apache in cmd_out.split("\n"): apache_version = _grep_version_from_executable(apache, '^Apache\/') if apache_version: out.append("%s [%s]" % (apache_version, apache)) return out def cli_cmd_detect_system_details(conf): """ Detect and print system details such as Apache/Python/MySQL versions etc. Useful for debugging problems on various OS. """ import MySQLdb print ">>> Going to detect system details..." print "* Hostname: " + socket.gethostname() print "* Invenio version: " + conf.get("Invenio", "CFG_VERSION") print "* Python version: " + sys.version.replace("\n", " ") print "* Apache version: " + ";\n ".join(detect_apache_version()) print "* MySQLdb version: " + MySQLdb.__version__ try: from invenio.dbquery import run_sql print "* MySQL version:" for key, val in run_sql("SHOW VARIABLES LIKE 'version%'") + \ run_sql("SHOW VARIABLES LIKE 'charact%'") + \ run_sql("SHOW VARIABLES LIKE 'collat%'"): if False: print " - %s: %s" % (key, val) elif key in ['version', 'character_set_client', 'character_set_connection', 'character_set_database', 'character_set_results', 'character_set_server', 'character_set_system', 'collation_connection', 'collation_database', 'collation_server']: print " - %s: %s" % (key, val) except ImportError: print "* ERROR: cannot import dbquery" print ">>> System details detected successfully." def main(): """Main entry point.""" conf = ConfigParser() if '--help' in sys.argv or \ '-h' in sys.argv: print_usage() elif '--version' in sys.argv or \ '-V' in sys.argv: print_version() else: confdir = None if '--conf-dir' in sys.argv: try: confdir = sys.argv[sys.argv.index('--conf-dir') + 1] except IndexError: pass # missing --conf-dir argument value if not os.path.exists(confdir): print "ERROR: bad or missing --conf-dir option value." 
def cli_cmd_detect_system_details(conf):
    """
    Detect and print system details such as Apache/Python/MySQL
    versions etc.  Useful for debugging problems on various
    operating systems.
    """
    import MySQLdb
    print ">>> Going to detect system details..."
    print "* Hostname: " + socket.gethostname()
    print "* Invenio version: " + conf.get("Invenio", "CFG_VERSION")
    print "* Python version: " + sys.version.replace("\n", " ")
    print "* Apache version: " + ";\n ".join(detect_apache_version())
    print "* MySQLdb version: " + MySQLdb.__version__
    try:
        from invenio.dbquery import run_sql
        print "* MySQL version:"
        for key, val in run_sql("SHOW VARIABLES LIKE 'version%'") + \
                run_sql("SHOW VARIABLES LIKE 'charact%'") + \
                run_sql("SHOW VARIABLES LIKE 'collat%'"):
            if key in ['version',
                       'character_set_client',
                       'character_set_connection',
                       'character_set_database',
                       'character_set_results',
                       'character_set_server',
                       'character_set_system',
                       'collation_connection',
                       'collation_database',
                       'collation_server']:
                print "  - %s: %s" % (key, val)
    except ImportError:
        print "* ERROR: cannot import dbquery"
    print ">>> System details detected successfully."

def main():
    """Main entry point."""
    conf = ConfigParser()
    if '--help' in sys.argv or \
       '-h' in sys.argv:
        print_usage()
    elif '--version' in sys.argv or \
         '-V' in sys.argv:
        print_version()
    else:
        confdir = None
        if '--conf-dir' in sys.argv:
            try:
                confdir = sys.argv[sys.argv.index('--conf-dir') + 1]
            except IndexError:
                pass # missing --conf-dir argument value
            if not confdir or not os.path.exists(confdir):
                print "ERROR: bad or missing --conf-dir option value."
                sys.exit(1)
        else:
            ## try to detect path to conf dir (relative to this bin dir):
            confdir = re.sub(r'/bin$', '/etc', sys.path[0])
        ## read conf files:
        for conffile in [confdir + os.sep + 'invenio.conf',
                         confdir + os.sep + 'invenio-autotools.conf',
                         confdir + os.sep + 'invenio-local.conf',]:
            if os.path.exists(conffile):
                conf.read(conffile)
            else:
                if not conffile.endswith("invenio-local.conf"):
                    # invenio-local.conf is optional, otherwise stop
                    print "ERROR: Badly guessed conf file location", conffile
                    print "(Please use --conf-dir option.)"
                    sys.exit(1)
        ## decide what to do:
        done = False
        for opt_idx in range(0, len(sys.argv)):
            opt = sys.argv[opt_idx]
            if opt == '--conf-dir':
                # already treated before, so skip silently:
                pass
            elif opt == '--get':
                try:
                    varname = sys.argv[opt_idx + 1]
                except IndexError:
                    print "ERROR: bad or missing --get option value."
                    sys.exit(1)
                if varname.startswith('-'):
                    print "ERROR: bad or missing --get option value."
                    sys.exit(1)
                varvalue = cli_cmd_get(conf, varname)
                if varvalue is not None:
                    print varvalue
                else:
                    sys.exit(1)
                done = True
            elif opt == '--list':
                cli_cmd_list(conf)
                done = True
            elif opt == '--detect-system-details':
                cli_cmd_detect_system_details(conf)
                done = True
            elif opt == '--create-tables':
                cli_cmd_create_tables(conf)
                done = True
            elif opt == '--load-webstat-conf':
                cli_cmd_load_webstat_conf(conf)
                done = True
            elif opt == '--drop-tables':
                cli_cmd_drop_tables(conf)
                done = True
+           elif opt == '--check-openoffice-dir':
+               cli_check_openoffice_dir(conf)
+               done = True
            elif opt == '--create-demo-site':
                cli_cmd_create_demo_site(conf)
                done = True
            elif opt == '--load-demo-records':
                cli_cmd_load_demo_records(conf)
                done = True
            elif opt == '--remove-demo-records':
                cli_cmd_remove_demo_records(conf)
                done = True
            elif opt == '--drop-demo-site':
                cli_cmd_drop_demo_site(conf)
                done = True
            elif opt == '--run-unit-tests':
                cli_cmd_run_unit_tests(conf)
                done = True
            elif opt == '--run-regression-tests':
                cli_cmd_run_regression_tests(conf)
                done = True
            elif opt == '--run-web-tests':
                cli_cmd_run_web_tests(conf)
                done = True
            elif opt == '--update-all':
                cli_cmd_update_config_py(conf)
                cli_cmd_update_dbquery_py(conf)
                cli_cmd_update_dbexec(conf)
                cli_cmd_update_bibconvert_tpl(conf)
                cli_cmd_update_web_tests(conf)
                done = True
            elif opt == '--update-config-py':
                cli_cmd_update_config_py(conf)
                done = True
            elif opt == '--update-dbquery-py':
                cli_cmd_update_dbquery_py(conf)
                done = True
            elif opt == '--update-dbexec':
                cli_cmd_update_dbexec(conf)
                done = True
            elif opt == '--update-bibconvert-tpl':
                cli_cmd_update_bibconvert_tpl(conf)
                done = True
            elif opt == '--update-web-tests':
                cli_cmd_update_web_tests(conf)
                done = True
            elif opt == '--reset-all':
                cli_cmd_reset_sitename(conf)
                cli_cmd_reset_siteadminemail(conf)
                cli_cmd_reset_fieldnames(conf)
                cli_cmd_reset_recstruct_cache(conf)
                done = True
            elif opt == '--reset-sitename':
                cli_cmd_reset_sitename(conf)
                done = True
            elif opt == '--reset-siteadminemail':
                cli_cmd_reset_siteadminemail(conf)
                done = True
            elif opt == '--reset-fieldnames':
                cli_cmd_reset_fieldnames(conf)
                done = True
            elif opt == '--reset-recstruct-cache':
                cli_cmd_reset_recstruct_cache(conf)
                done = True
            elif opt == '--create-apache-conf':
                cli_cmd_create_apache_conf(conf)
                done = True
            elif opt.startswith("-") and opt != '--yes-i-know':
                print "ERROR: unknown option", opt
                sys.exit(1)
        if not done:
            print """ERROR: Please specify a command.  Please see '--help'."""
            sys.exit(1)

if __name__ == '__main__':
    main()
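For third-party programs that prefer not to shell out to "inveniocfg
--get", the same lookup can be reproduced in a few lines; a minimal
sketch, assuming a standard /opt/cds-invenio installation prefix:

    from ConfigParser import ConfigParser
    conf = ConfigParser()
    conf.read(['/opt/cds-invenio/etc/invenio.conf',
               '/opt/cds-invenio/etc/invenio-local.conf'])
    # mimic cli_cmd_get(): ignore sections and option case
    for section in conf.sections():
        for option in conf.options(section):
            if option == 'cfg_prefix':  # ConfigParser lower-cases option names
                print conf.get(section, option)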
diff --git a/modules/miscutil/lib/shellutils.py b/modules/miscutil/lib/shellutils.py
index fcb40dc9c..14e88718e 100644
--- a/modules/miscutil/lib/shellutils.py
+++ b/modules/miscutil/lib/shellutils.py
@@ -1,140 +1,259 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""
The shellutils module contains helper functions useful for interacting
with the operating system shell.

The main API functions are:
   - run_shell_command()
"""

import os
import tempfile
+import time
+import signal
+
+try:
+    import subprocess
+    from invenio.asyncproc import Timeout, with_timeout, Process
+    CFG_HAS_SUBPROCESS = True
+except ImportError:
+    CFG_HAS_SUBPROCESS = False
+
+from invenio.config import CFG_MISCUTIL_DEFAULT_PROCESS_TIMEOUT
+
+__all__ = ['run_shell_command', 'run_process_with_timeout', 'Timeout']
+
+"""
+This module implements two functions:
+  - L{run_shell_command}
+  - L{run_process_with_timeout}
+
+L{run_shell_command} will run a command through a shell, capturing its
+standard output and standard error.
+
+L{run_process_with_timeout} will run a process on its own, allowing one
+to specify an input file, capturing the standard output and standard
+error, and killing the process after a given timeout.
+"""
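# A combined usage sketch for the two entry points (command and timeout
# value made up for illustration):
#
#   from invenio.shellutils import run_shell_command, \
#        run_process_with_timeout, Timeout
#   # shell route: single string, arguments escaped for you:
#   exit_code, out, err = run_shell_command("ls -l %s", ('/tmp',))
#   # no-shell route: argument sequence, killed after 10 seconds:
#   try:
#       exit_code, out, err = run_process_with_timeout(('ls', '-l', '/tmp'),
#                                                      timeout=10)
#   except Timeout:
#       pass # the process overran the 10 seconds and was killed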
def run_shell_command(cmd, args=None, filename_out=None, filename_err=None):
    """Run operating system command cmd with arguments from the args
    tuple in a sub-shell and return tuple (exit status code, stdout
    info, stderr info).

    @param cmd: Command to execute in a shell; may contain %s
        placeholders for arguments that will be expanded from the args
        tuple.  Example: cmd='echo %s', args = ('hello',).
    @type cmd: string

    @param args: Arguments to be escaped and substituted for %s
        placeholders in cmd.
    @type args: tuple of strings

    @param filename_out: Desired filename for stdout output
        (optional; see below).
    @type filename_out: string

    @param filename_err: Desired filename for stderr output
        (optional; see below).
    @type filename_err: string

    @return: Tuple (exit code, string containing stdout output buffer,
        string containing stderr output buffer).

        However, if either filename_out or filename_err are defined,
        then the output buffers are not passed back but rather written
        into filename_out/filename_err pathnames.  This is useful for
        commands that produce big files, for which it is not practical
        to pass results back to the callers in a Python text buffer.
        Note that it is the client's responsibility to name these
        files in the proper fashion (e.g. to be unique) and to close
        these files after use.
    @rtype: (number, string, string)

    @raise TypeError: if the number of args does not correspond to the
        number of placeholders in cmd.

    @note: Uses temporary files to store out/err output, not pipes due
        to potential pipe race condition on some systems.  If either
        filename_out or filename_err are defined, then do not create
        temporary files, but store stdout or stderr output directly in
        these files instead, and do not delete them after execution.
    """
    # wash args value:
    if args:
        args = tuple(args)
    else:
        args = ()
    # construct command with argument substitution:
    try:
        cmd = cmd % tuple([escape_shell_arg(x) for x in args])
    except TypeError:
        # there were problems with %s and args substitution, so raise an error:
        raise
    cmd_out = ''
    cmd_err = ''
    # create files:
    if filename_out:
        cmd_out_fd = os.open(filename_out, os.O_CREAT, 0644)
        file_cmd_out = filename_out
    else:
        cmd_out_fd, file_cmd_out = \
                    tempfile.mkstemp("invenio-shellutils-cmd-out")
    if filename_err:
        cmd_err_fd = os.open(filename_err, os.O_CREAT, 0644)
        file_cmd_err = filename_err
    else:
        cmd_err_fd, file_cmd_err = \
                    tempfile.mkstemp("invenio-shellutils-cmd-err")
    # run command:
    cmd_exit_code = os.system("%s > %s 2> %s" % (cmd, file_cmd_out, file_cmd_err))
    # delete temporary files: (if applicable)
    if not filename_out:
        if os.path.exists(file_cmd_out):
            cmd_out_fo = open(file_cmd_out)
            cmd_out = cmd_out_fo.read()
            cmd_out_fo.close()
            os.remove(file_cmd_out)
    if not filename_err:
        if os.path.exists(file_cmd_err):
            cmd_err_fo = open(file_cmd_err)
            cmd_err = cmd_err_fo.read()
            cmd_err_fo.close()
            os.remove(file_cmd_err)
    os.close(cmd_out_fd)
    os.close(cmd_err_fd)
    # return results:
    return cmd_exit_code, cmd_out, cmd_err

+def run_process_with_timeout(args, filename_in=None, filename_out=None, filename_err=None, cwd=None, timeout=CFG_MISCUTIL_DEFAULT_PROCESS_TIMEOUT):
+    """
+    Run a process, capturing its output and killing it after a given
+    timeout.
+
+    @param args: should be a string, or a sequence of program
+        arguments.  The program to execute is the first item in the
+        args sequence, or the string itself if a string is given.
+    @type args: string/sequence
+    @param filename_in: the path of a file to be used as standard
+        input to the process.  If None, the process will receive no
+        standard input.
+    @type filename_in: string
+    @param filename_out: the path of a file to be used as standard
+        output from the process.  If None, the process standard output
+        will still be captured and returned.
+    @type filename_out: string
+    @param filename_err: the path of a file to be used as standard
+        error from the process.  If None, the process standard error
+        will still be captured and returned.
+    @type filename_err: string
+    @param timeout: the number of seconds after which the process is
+        killed.
+    @type timeout: int
+    @param cwd: the current working directory in which to execute the
+        process.
+    @type cwd: string
+    @return: a tuple containing the exit status, the captured output
+        and the captured error.
+    @rtype: tuple
+    @raise Timeout: in case the process is still in execution after
+        the specified timeout.
+    @note: if the C{Timeout} exception is raised and filename_out/
+        filename_err have been specified, they will probably be
+        partially filled.
+    @warning: in case Python 2.3 is used and the subprocess module is
+        not available, this function will try to fall back on
+        L{run_shell_command}, provided that no C{filename_in}
+        parameter is filled.
+ """ + def call_the_process(the_process, stdout, stderr): + cmd_out = '' + cmd_err = '' + while True: + time.sleep(1) + poll = the_process.wait(os.WNOHANG) + tmp_cmd_out, tmp_cmd_err = the_process.readboth() + if stdout: + stdout.write(tmp_cmd_out) + if stderr: + stderr.write(tmp_cmd_err) + cmd_out += tmp_cmd_out + cmd_err += tmp_cmd_err + if poll != None: + break + return poll, cmd_out, cmd_err + + if not CFG_HAS_SUBPROCESS: + ## Let's fall back on run_shell_command. + if filename_in is not None: + raise ImportError, "Failed to import subprocess module and " \ + "run_process_with_timeout with cmd_in_file set, thus can not " \ + "fall back on run_shell_command." + if cwd: + cwd_str = "cd %s; " % escape_shell_arg(cwd) + else: + cwd_str = '' + return run_shell_command(cwd_str + ('%s ' * len(args))[:-1], args, filename_out=filename_out, filename_err=filename_err) + + if filename_in is not None: + stdin = open(filename_in) + else: + stdin = None + if filename_out is not None: + stdout = open(filename_out, 'w') + else: + stdout = None + if filename_err is not None: + stderr = open(filename_err, 'w') + else: + stderr = None + the_process = Process(args, stdin=stdin, cwd=cwd) + try: + return with_timeout(timeout, call_the_process, the_process, stdout, stderr) + except Timeout: + ## the_process.terminate() + ## FIXME: the_process.terminate() would rather be a better + ## solution, but apparently it does not work. When signal.SIGTERM + ## is sent to the process the wait operation down there does + ## not respect any timeout and will wait until the very end + ## of the process. So the afterwards SIGKILL will not find any + ## process... So let's send SIGTERM/SIGKILL directly here without + ## waiting anything. Anyway we are not interested in the outcome + ## of a timeouted process, we just want to kill it!! + the_process.kill(signal.SIGTERM) + time.sleep(1) + the_process.kill(signal.SIGKILL) + raise + def escape_shell_arg(shell_arg): """Escape shell argument shell_arg by placing it within single-quotes. Any single quotes found within the shell argument string will be escaped. @param shell_arg: The shell argument to be escaped. @type shell_arg: string @return: The single-quote-escaped value of the shell argument. @rtype: string @raise TypeError: if shell_arg is not a string. @see: U{http://mail.python.org/pipermail/python-list/2005-October/346957.html} """ if type(shell_arg) is not str: msg = "ERROR: escape_shell_arg() expected string argument but " \ "got '%s' of type '%s'." % (repr(shell_arg), type(shell_arg)) raise TypeError(msg) return "'%s'" % shell_arg.replace("'", r"'\''") diff --git a/modules/miscutil/lib/shellutils_tests.py b/modules/miscutil/lib/shellutils_tests.py index 36e3df7f2..f0ce79e15 100644 --- a/modules/miscutil/lib/shellutils_tests.py +++ b/modules/miscutil/lib/shellutils_tests.py @@ -1,116 +1,149 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
diff --git a/modules/miscutil/lib/shellutils_tests.py b/modules/miscutil/lib/shellutils_tests.py
index 36e3df7f2..f0ce79e15 100644
--- a/modules/miscutil/lib/shellutils_tests.py
+++ b/modules/miscutil/lib/shellutils_tests.py
@@ -1,116 +1,149 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""Unit tests for shellutils library."""

__revision__ = "$Id$"

import unittest
+import time
+import os

-from invenio.shellutils import escape_shell_arg, run_shell_command
+from invenio.config import CFG_TMPDIR
+
+from invenio.shellutils import escape_shell_arg, run_shell_command, \
+     run_process_with_timeout, Timeout
from invenio.testutils import make_test_suite, run_test_suite

class EscapeShellArgTest(unittest.TestCase):
    """Testing of escaping shell arguments."""

    def test_escape_simple(self):
        """shellutils - escaping simple strings"""
        self.assertEqual("'hello'",
                         escape_shell_arg("hello"))

    def test_escape_backtick(self):
        """shellutils - escaping strings containing backticks"""
        self.assertEqual(r"'hello `world`'",
                         escape_shell_arg(r'hello `world`'))

    def test_escape_quoted(self):
        """shellutils - escaping strings containing single quotes"""
        self.assertEqual("'hello'\\''world'",
                         escape_shell_arg("hello'world"))

    def test_escape_double_quoted(self):
        """shellutils - escaping strings containing double-quotes"""
        self.assertEqual("""'"hello world"'""",
                         escape_shell_arg('"hello world"'))

    def test_escape_complex_quoted(self):
        """shellutils - escaping strings containing complex quoting"""
        self.assertEqual(r"""'"Who is this `Eve'\'', Bob?", asked Alice.'""",
                         escape_shell_arg(r""""Who is this `Eve', Bob?", asked Alice."""))

    def test_escape_windows_style_path(self):
        """shellutils - escaping strings containing windows-style file paths"""
        self.assertEqual(r"'C:\Users\Test User\My Documents" \
                         r"\funny file name (for testing).pdf'",
                         escape_shell_arg(r'C:\Users\Test User\My Documents' \
                                          r'\funny file name (for testing).pdf'))

    def test_escape_unix_style_path(self):
        """shellutils - escaping strings containing unix-style file paths"""
        self.assertEqual(r"'/tmp/z_temp.txt'",
                         escape_shell_arg(r'/tmp/z_temp.txt'))

    def test_escape_number_sign(self):
        """shellutils - escaping strings containing the number sign"""
        self.assertEqual(r"'Python comments start with #.'",
                         escape_shell_arg(r'Python comments start with #.'))

    def test_escape_ampersand(self):
        """shellutils - escaping strings containing ampersand"""
        self.assertEqual(r"'Today the weather is hot & sunny'",
                         escape_shell_arg(r'Today the weather is hot & sunny'))

    def test_escape_greater_than(self):
        """shellutils - escaping strings containing the greater-than sign"""
        self.assertEqual(r"'10 > 5'",
                         escape_shell_arg(r'10 > 5'))

    def test_escape_less_than(self):
        """shellutils - escaping strings containing the less-than sign"""
        self.assertEqual(r"'5 < 10'",
                         escape_shell_arg(r'5 < 10'))

class RunShellCommandTest(unittest.TestCase):
    """Testing of running shell commands."""

    def test_run_cmd_hello(self):
        """shellutils - running simple command"""
        self.assertEqual((0, "hello world\n", ''),
                         run_shell_command("echo 'hello world'"))

    def test_run_cmd_hello_args(self):
        """shellutils - running simple command with an argument"""
        self.assertEqual((0, "hello world\n", ''),
                         run_shell_command("echo 'hello %s'", ("world",)))

    def test_run_cmd_hello_quote(self):
        """shellutils - running simple command with an argument with quote"""
        self.assertEqual((0, "hel'lo world\n", ''),
                         run_shell_command("echo %s %s", ("hel'lo", "world",)))

    def test_run_cmd_erroneous(self):
        """shellutils - running wrong command should raise an exception"""
        self.assertRaises(TypeError, run_shell_command,
                          "echo %s %s %s", ("hello", "world",))

+class RunProcessWithTimeoutTest(unittest.TestCase):
+    """Testing of running a process with timeout."""
+
+    def setUp(self):
+        self.script_path = os.path.join(CFG_TMPDIR, 'test_sleeping.sh')
+        script = open(self.script_path, 'w')
+        print >> script, "#!/bin/sh"
+        print >> script, "date"
+        print >> script, "sleep $1"
+        print >> script, "date"
+        script.close()
+        os.chmod(self.script_path, 0700)
+
+    def tearDown(self):
+        os.remove(self.script_path)
+
+    def test_run_cmd_timeout(self):
+        """shellutils - running simple command with expiring timeout"""
+        t1 = time.time()
+        self.assertRaises(Timeout, run_process_with_timeout,
+                          (self.script_path, '15'), timeout=5)
+        self.failUnless(time.time() - t1 < 7)
+
+    def test_run_cmd_no_timeout(self):
+        """shellutils - running simple command with non-expiring timeout"""
+        t1 = time.time()
+        self.assertEqual(3, len(run_process_with_timeout(
+            (self.script_path, '5'), timeout=15)[1].split('\n')))
+        self.failUnless(time.time() - t1 < 7)
+
TEST_SUITE = make_test_suite(EscapeShellArgTest,
-                             RunShellCommandTest,)
+                             RunShellCommandTest,
+                             RunProcessWithTimeoutTest)

if __name__ == "__main__":
    run_test_suite(TEST_SUITE)
""" __revision__ = "$Id$" import sys import re import textwrap +import invenio.template CFG_WRAP_TEXT_IN_A_BOX_STYLES = { '__DEFAULT' : { 'horiz_sep' : '*', 'max_col' : 72, 'min_col' : 40, 'tab_str' : ' ', 'tab_num' : 0, 'border' : ('**', '*', '**', '** ', ' **', '**', '*', '**'), 'prefix' : '\n', 'suffix' : '\n', 'break_long' : False, 'force_horiz' : False, }, 'squared' : { 'horiz_sep' : '-', 'border' : ('+', '-', '+', '| ', ' |', '+', '-', '+') }, 'double_sharp' : { 'horiz_sep' : '#', 'border' : ('##', '#', '##', '## ', ' ##', '##', '#', '##') }, 'single_sharp' : { 'horiz_sep' : '#', 'border' : ('#', '#', '#', '# ', ' #', '#', '#', '#') }, 'single_star' : { 'border' : ('*', '*', '*', '* ', ' *', '*', '*', '*',) }, 'double_star' : { }, 'no_border' : { 'horiz_sep' : '', 'border' : ('', '', '', '', '', '', '', ''), 'prefix' : '', 'suffix' : '' }, 'conclusion' : { 'border' : ('', '', '', '', '', '', '', ''), 'prefix' : '', 'horiz_sep' : '-', 'force_horiz' : True, }, 'important' : { 'tab_num' : 1, }, } def indent_text(text, nb_tabs=0, tab_str=" ", linebreak_input="\n", linebreak_output="\n", wrap=False): """ add tabs to each line of text @param text: the text to indent @param nb_tabs: number of tabs to add @param tab_str: type of tab (could be, for example "\t", default: 2 spaces @param linebreak_input: linebreak on input @param linebreak_output: linebreak on output @param wrap: wethever to apply smart text wrapping. (by means of wrap_text_in_a_box) @return: indented text as string """ if not wrap: lines = text.split(linebreak_input) tabs = nb_tabs*tab_str output = "" for line in lines: output += tabs + line + linebreak_output return output else: return wrap_text_in_a_box(body=text, style='no_border', tab_str=tab_str, tab_num=nb_tabs) _RE_BEGINNING_SPACES = re.compile(r'^\s*') _RE_NEWLINES_CLEANER = re.compile(r'\n+') _RE_LONELY_NEWLINES = re.compile(r'\b\n\b') def wrap_text_in_a_box(body='', title='', style='double_star', **args): """Return a nicely formatted text box: e.g. ****************** ** title ** **--------------** ** body ** ****************** Indentation and newline are respected. @param body: the main text @param title: an optional title @param style: the name of one of the style in CFG_WRAP_STYLES. By default the double_star style is used. You can further tune the desired style by setting various optional parameters: @param horiz_sep: a string that is repeated in order to produce a separator row between the title and the body (if needed) @param max_col: the maximum number of coulmns used by the box (including indentation) @param min_col: the symmetrical minimum number of columns @param tab_str: a string to represent indentation @param tab_num: the number of leveles of indentations @param border: a tuple of 8 element in the form (tl, t, tr, l, r, bl, b, br) of strings that represent the different corners and sides of the box @param prefix: a prefix string added before the box @param suffix: a suffix string added after the box @param break_long: wethever to break long words in order to respect max_col @param force_horiz: True in order to print the horizontal line even when there is no title e.g.: print wrap_text_in_a_box(title='prova', body=' 123 prova.\n Vediamo come si indenta', horiz_sep='-', style='no_border', max_col=20, tab_num=1) prova ---------------- 123 prova. 
Vediamo come si indenta """ def _wrap_row(row, max_col, break_long): """Wrap a single row""" spaces = _RE_BEGINNING_SPACES.match(row).group() row = row[len(spaces):] spaces = spaces.expandtabs() return textwrap.wrap(row, initial_indent=spaces, subsequent_indent=spaces, width=max_col, break_long_words=break_long) def _clean_newlines(text): text = _RE_LONELY_NEWLINES.sub(' \n', text) return _RE_NEWLINES_CLEANER.sub(lambda x: x.group()[:-1], text) body = unicode(body, 'utf-8') title = unicode(title, 'utf-8') astyle = dict(CFG_WRAP_TEXT_IN_A_BOX_STYLES['__DEFAULT']) if CFG_WRAP_TEXT_IN_A_BOX_STYLES.has_key(style): astyle.update(CFG_WRAP_TEXT_IN_A_BOX_STYLES[style]) astyle.update(args) horiz_sep = astyle['horiz_sep'] border = astyle['border'] tab_str = astyle['tab_str'] * astyle['tab_num'] max_col = max(astyle['max_col'] \ - len(border[3]) - len(border[4]) - len(tab_str), 1) min_col = astyle['min_col'] prefix = astyle['prefix'] suffix = astyle['suffix'] force_horiz = astyle['force_horiz'] break_long = astyle['break_long'] body = _clean_newlines(body) tmp_rows = [_wrap_row(row, max_col, break_long) for row in body.split('\n')] body_rows = [] for rows in tmp_rows: if rows: body_rows += rows else: body_rows.append('') if not ''.join(body_rows).strip(): # Concrete empty body body_rows = [] title = _clean_newlines(title) tmp_rows = [_wrap_row(row, max_col, break_long) for row in title.split('\n')] title_rows = [] for rows in tmp_rows: if rows: title_rows += rows else: title_rows.append('') if not ''.join(title_rows).strip(): # Concrete empty title title_rows = [] max_col = max([len(row) for row in body_rows + title_rows] + [min_col]) mid_top_border_len = max_col \ + len(border[3]) + len(border[4]) - len(border[0]) - len(border[2]) mid_bottom_border_len = max_col \ + len(border[3]) + len(border[4]) - len(border[5]) - len(border[7]) top_border = border[0] \ + (border[1] * mid_top_border_len)[:mid_top_border_len] + border[2] bottom_border = border[5] \ + (border[6] * mid_bottom_border_len)[:mid_bottom_border_len] \ + border[7] horiz_line = border[3] + (horiz_sep * max_col)[:max_col] + border[4] title_rows = [tab_str + border[3] + row + ' ' * (max_col - len(row)) + border[4] for row in title_rows] body_rows = [tab_str + border[3] + row + ' ' * (max_col - len(row)) + border[4] for row in body_rows] ret = [] if top_border: ret += [tab_str + top_border] ret += title_rows if title_rows or force_horiz: ret += [tab_str + horiz_line] ret += body_rows if bottom_border: ret += [tab_str + bottom_border] return (prefix + '\n'.join(ret) + suffix).encode('utf-8') def wait_for_user(msg=""): """ Print MSG and a confirmation prompt, waiting for user's confirmation, unless silent '--yes-i-know' command line option was used, in which case the function returns immediately without printing anything. """ if '--yes-i-know' in sys.argv: return print msg try: answer = raw_input("Please confirm by typing 'Yes, I know!': ") except KeyboardInterrupt: print answer = '' if answer != 'Yes, I know!': sys.stderr.write("ERROR: Aborted.\n") sys.exit(1) return def guess_minimum_encoding(text, charsets=('ascii', 'latin1', 'utf8')): """Try to guess the minimum charset that is able to represent the given text using the provided charsets. text is supposed to be encoded in utf8. Returns (encoded_text, charset) where charset is the first charset in the sequence being able to encode text. Returns (text_in_utf8, 'utf8') in case no charset is able to encode text. 
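# A quick usage sketch for callers outside this module (message text
# made up for illustration; 'important' is one of the styles defined
# in CFG_WRAP_TEXT_IN_A_BOX_STYLES above):
#
#   from invenio.textutils import wrap_text_in_a_box
#   print wrap_text_in_a_box(title='Upgrade note',
#                            body='Remember to run inveniocfg --update-all.',
#                            style='important')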
def guess_minimum_encoding(text, charsets=('ascii', 'latin1', 'utf8')):
    """Try to guess the minimum charset that is able to represent the
    given text using the provided charsets.  text is supposed to be
    encoded in utf8.  Returns (encoded_text, charset) where charset is
    the first charset in the sequence being able to encode text.
    Returns (text_in_utf8, 'utf8') in case no charset is able to
    encode text.

    @note: If the input text is not in strict UTF-8, then replace any
        non-UTF-8 chars inside it.
    """
    text_in_unicode = text.decode('utf8', 'replace')
    for charset in charsets:
        try:
            return (text_in_unicode.encode(charset), charset)
        except (UnicodeEncodeError, UnicodeDecodeError):
            pass
    return (text_in_unicode.encode('utf8'), 'utf8')

def encode_for_xml(text, wash=False, xml_version='1.0'):
    """Encodes special characters in a text so that it would be
    XML-compliant.

    @param text: text to encode
    @return: an encoded text"""
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    if wash:
        text = wash_for_xml(text, xml_version=xml_version)
    return text

try:
    unichr(0x100000)
    RE_ALLOWED_XML_1_0_CHARS = re.compile(u'[^\U00000009\U0000000A\U0000000D\U00000020-\U0000D7FF\U0000E000-\U0000FFFD\U00010000-\U0010FFFF]')
    RE_ALLOWED_XML_1_1_CHARS = re.compile(u'[^\U00000001-\U0000D7FF\U0000E000-\U0000FFFD\U00010000-\U0010FFFF]')
except ValueError:
    # oops, we are running on a narrow UTF/UCS Python build,
    # so we have to limit the UTF/UCS char range:
    RE_ALLOWED_XML_1_0_CHARS = re.compile(u'[^\U00000009\U0000000A\U0000000D\U00000020-\U0000D7FF\U0000E000-\U0000FFFD]')
    RE_ALLOWED_XML_1_1_CHARS = re.compile(u'[^\U00000001-\U0000D7FF\U0000E000-\U0000FFFD]')

def wash_for_xml(text, xml_version='1.0'):
    """
    Remove any character which is not in the range of allowed
    characters for XML.  The allowed characters depend on the version
    of XML.

        - XML 1.0: http://www.w3.org/TR/REC-xml/#charsets
        - XML 1.1: http://www.w3.org/TR/xml11/#charsets

    @param text: input string to wash.
    @param xml_version: version of the XML for which we wash the
        input.  Value for this parameter can be '1.0' or '1.1'
    """
    if xml_version == '1.0':
        return RE_ALLOWED_XML_1_0_CHARS.sub('',
            unicode(text, 'utf-8')).encode('utf-8')
    else:
        return RE_ALLOWED_XML_1_1_CHARS.sub('',
            unicode(text, 'utf-8')).encode('utf-8')
+
+def nice_size(size):
+    """
+    @param size: the size, in bytes.
+    @type size: int
+    @return: a nicely printed size.
+    @rtype: string
+    """
+    websearch_templates = invenio.template.load('websearch')
+    unit = 'B'
+    if size > 1024:
+        size /= 1024.0
+        unit = 'KB'
+    if size > 1024:
+        size /= 1024.0
+        unit = 'MB'
+    if size > 1024:
+        size /= 1024.0
+        unit = 'GB'
+    return '%s %s' % (websearch_templates.tmpl_nice_number(size,
+        max_ndigits_after_dot=2), unit)
+
diff --git a/modules/miscutil/sql/tabcreate.sql b/modules/miscutil/sql/tabcreate.sql
index 656c49605..cce131c80 100644
--- a/modules/miscutil/sql/tabcreate.sql
+++ b/modules/miscutil/sql/tabcreate.sql
@@ -1,3441 +1,3652 @@
-- $Id$
-- This file is part of CDS Invenio.
-- Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
--
-- CDS Invenio is free software; you can redistribute it and/or
-- modify it under the terms of the GNU General Public License as
-- published by the Free Software Foundation; either version 2 of the
-- License, or (at your option) any later version.
--
-- CDS Invenio is distributed in the hope that it will be useful, but
-- WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-- General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
-- 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
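-- A usage sketch for orientation (record id assumed; not part of the
-- schema): the bibXXx tables below hold MARC (tag, value) pairs, and
-- the bibrec_bibXXx tables link them N:M to bibliographic records in
-- bibrec.  E.g. the MARC 245 (title) fields of record 1 live in bib24x
-- and could be fetched with:
--
--   SELECT b.tag, b.value
--     FROM bibrec_bib24x AS rb
--     JOIN bib24x AS b ON b.id = rb.id_bibxxx
--    WHERE rb.id_bibrec = 1 AND b.tag LIKE '245%';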
-- tables for bibliographic records: CREATE TABLE IF NOT EXISTS bibrec ( id mediumint(8) unsigned NOT NULL auto_increment, creation_date datetime NOT NULL default '0000-00-00', modification_date datetime NOT NULL default '0000-00-00', PRIMARY KEY (id), KEY creation_date (creation_date), KEY modification_date (modification_date) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib00x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib01x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib02x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib03x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib04x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib05x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib06x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib07x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib08x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib09x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib10x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib11x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib12x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib13x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib14x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib15x ( id mediumint(8) unsigned NOT NULL 
auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib16x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib17x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib18x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib19x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib20x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib21x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib22x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib23x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib24x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib25x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib26x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib27x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib28x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib29x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib30x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib31x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib32x ( id mediumint(8) unsigned NOT NULL 
auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib33x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib34x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib35x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib36x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib37x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib38x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib39x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib40x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib41x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib42x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib43x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib44x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib45x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib46x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib47x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib48x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib49x ( id mediumint(8) unsigned NOT NULL 
auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib50x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib51x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib52x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib53x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib54x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib55x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib56x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib57x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib58x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib59x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib60x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib61x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib62x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib63x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib64x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib65x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib66x ( id mediumint(8) unsigned NOT NULL 
auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib67x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib68x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib69x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib70x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib71x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib72x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib73x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib74x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib75x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib76x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib77x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib78x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib79x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib80x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib81x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib82x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib83x ( id mediumint(8) unsigned NOT NULL 
auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib84x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib85x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(100)) -- URLs need usually a larger index for speedy lookups ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib86x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib87x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib88x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib89x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib90x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib91x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib92x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib93x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib94x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib95x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib96x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib97x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib98x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS bib99x ( id mediumint(8) unsigned NOT NULL auto_increment, tag varchar(6) NOT NULL default '', value text NOT NULL, PRIMARY KEY (id), KEY kt (tag), KEY kv (value(35)) ) TYPE=MyISAM; CREATE TABLE IF 
NOT EXISTS bibrec_bib00x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib01x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib02x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib03x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib04x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib05x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib06x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib07x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib08x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib09x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib10x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib11x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib12x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib13x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib14x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib15x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib16x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib17x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib18x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib19x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib20x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib21x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib22x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib23x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib24x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib25x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib26x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib27x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib28x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib29x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib30x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib31x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib32x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib33x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib34x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib35x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib36x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib37x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib38x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib39x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib40x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib41x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib42x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib43x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib44x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib45x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib46x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib47x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib48x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib49x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib50x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib51x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib52x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib53x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib54x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib55x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib56x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib57x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib58x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib59x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib60x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib61x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib62x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib63x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib64x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib65x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib66x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib67x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib68x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib69x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib70x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib71x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib72x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib73x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib74x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib75x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib76x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib77x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib78x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib79x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib80x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib81x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib82x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib83x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib84x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib85x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib86x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib87x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib88x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib89x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib90x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib91x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib92x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib93x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib94x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib95x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib96x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib97x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib98x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bib99x ( id_bibrec mediumint(8) unsigned NOT NULL default '0', id_bibxxx mediumint(8) unsigned NOT NULL default '0', field_number smallint(5) unsigned default NULL, KEY id_bibxxx (id_bibxxx), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
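-- (all hundred bibrec_bibNNx link tables above share the same layout:
-- each joins a bibliographic record, id_bibrec, to rows of the matching
-- bibNNx metadata table, id_bibxxx, with one link table per MARC tag
-- group 00x-99x.)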
-- tables for bibliographic records formatted:
CREATE TABLE IF NOT EXISTS bibfmt ( id mediumint(8) unsigned NOT NULL auto_increment, id_bibrec int(8) unsigned NOT NULL default '0', format varchar(10) NOT NULL default '', last_updated datetime NOT NULL default '0000-00-00', value longblob, PRIMARY KEY (id), KEY id_bibrec (id_bibrec), KEY format (format) ) TYPE=MyISAM;
-- tables for index files:
CREATE TABLE IF NOT EXISTS idxINDEX ( id mediumint(9) unsigned NOT NULL, name varchar(50) NOT NULL default '', description varchar(255) NOT NULL default '', last_updated datetime NOT NULL default '0000-00-00 00:00:00', stemming_language varchar(10) NOT NULL default '', PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxINDEXNAME ( id_idxINDEX mediumint(9) unsigned NOT NULL, ln char(5) NOT NULL default '', type char(3) NOT NULL default 'sn', value varchar(255) NOT NULL, PRIMARY KEY (id_idxINDEX,ln,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxINDEX_field ( id_idxINDEX mediumint(9) unsigned NOT NULL, id_field mediumint(9) unsigned NOT NULL, regexp_punctuation varchar(255) NOT NULL default "[\.\,\:\;\?\!\"]", regexp_alphanumeric_separators varchar(255) NOT NULL default "[\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]", PRIMARY KEY (id_idxINDEX,id_field) ) TYPE=MyISAM;
-- this comment line here is just to fix the SQL display mode in Emacs '
CREATE TABLE IF NOT EXISTS idxWORD01F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD01R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD02F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD02R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD03F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD03R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD04F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD04R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD05F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD05R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD06F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD06R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD07F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD07R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD08F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD08R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD09F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD09R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD10F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD10R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD11F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD11R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD12F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD12R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD13F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD13R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD14F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxWORD14R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
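-- (note the naming scheme of the index tables: each index is a pair of
-- tables, a forward table ...F mapping an indexed term to a serialized
-- hitlist of records, and a reverse table ...R mapping a record to its
-- list of terms.  The idxPAIR tables added below apply the same scheme
-- to two-word pairs, hence term varchar(100) instead of the varchar(50)
-- used for single words.)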
+CREATE TABLE IF NOT EXISTS idxPAIR01F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR01R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR02F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR02R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR03F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR03R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR04F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR04R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR05F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR05R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR06F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR06R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR07F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR07R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR08F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR08R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR09F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR09R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR10F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR10R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR11F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR11R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR12F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR12R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR13F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR13R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR14F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
+CREATE TABLE IF NOT EXISTS idxPAIR14R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE01F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE01R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE02F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE02R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE03F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE03R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE04F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE04R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE05F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE05R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE06F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE06R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE07F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE07R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE08F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE08R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE09F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE09R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE10F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE10R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE11F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE11R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE12F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE12R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE13F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE13R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE14F ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS idxPHRASE14R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
-- tables for ranking:
CREATE TABLE IF NOT EXISTS rnkMETHOD ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(20) NOT NULL default '', last_updated datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkMETHODNAME ( id_rnkMETHOD mediumint(9) unsigned NOT NULL, ln char(5) NOT NULL default '', type char(3) NOT NULL default 'sn', value varchar(255) NOT NULL, PRIMARY KEY (id_rnkMETHOD,ln,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkMETHODDATA ( id_rnkMETHOD mediumint(9) unsigned NOT NULL, relevance_data longblob, PRIMARY KEY (id_rnkMETHOD) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS collection_rnkMETHOD ( id_collection mediumint(9) unsigned NOT NULL, id_rnkMETHOD mediumint(9) unsigned NOT NULL, score tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_collection,id_rnkMETHOD) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkWORD01F ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkWORD01R ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkAUTHORDATA ( aterm varchar(50) default NULL, hitlist longblob, UNIQUE KEY aterm (aterm) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkPAGEVIEWS ( id_bibrec mediumint(8) unsigned default NULL, id_user int(15) unsigned default '0', client_host int(10) unsigned default NULL, view_time datetime default '0000-00-00 00:00:00', KEY view_time (view_time), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS rnkDOWNLOADS ( id_bibrec mediumint(8) unsigned default NULL, download_time datetime default '0000-00-00 00:00:00', client_host int(10) unsigned default NULL, id_user int(15) unsigned default NULL, id_bibdoc mediumint(9) unsigned default NULL, file_version smallint(2) unsigned default NULL, file_format varchar(10) NULL default NULL, KEY download_time (download_time), KEY id_bibrec (id_bibrec) ) TYPE=MyISAM;
-- a table for citations (record-cites-record):
CREATE TABLE IF NOT EXISTS rnkCITATIONDATA ( id mediumint(8) unsigned NOT NULL auto_increment, object_name varchar(255) NOT NULL, object_value longblob, last_updated datetime NOT NULL default '0000-00-00', PRIMARY KEY id (id), UNIQUE KEY object_name (object_name) ) TYPE=MyISAM;
-- a table for missing citations. This should be scanned by a program
-- occasionally to check if some publication has been cited more than
-- 50 times (or such), and alert cataloguers to create a record for that
-- external citation
--
-- id_bibrec is the id of the record. extcitepubinfo is publication info
-- that looks in general like hep-th/0112088
CREATE TABLE IF NOT EXISTS rnkCITATIONDATAEXT ( id_bibrec int(8) unsigned, extcitepubinfo varchar(255) NOT NULL, PRIMARY KEY (id_bibrec, extcitepubinfo), KEY extcitepubinfo (extcitepubinfo) ) TYPE=MyISAM;
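-- for instance, such a periodic check could run a query along these
-- lines to list external publications cited more than 50 times:
--
--   SELECT extcitepubinfo, COUNT(*) AS nb_cites
--     FROM rnkCITATIONDATAEXT
--    GROUP BY extcitepubinfo
--   HAVING nb_cites > 50
--    ORDER BY nb_cites DESC;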
-- tables for collections and collection tree:
CREATE TABLE IF NOT EXISTS collection ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL, dbquery text, nbrecs int(10) unsigned default '0', reclist longblob, PRIMARY KEY (id), UNIQUE KEY name (name), KEY dbquery (dbquery(50)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS collectionname ( id_collection mediumint(9) unsigned NOT NULL, ln char(5) NOT NULL default '', type char(3) NOT NULL default 'sn', value varchar(255) NOT NULL, PRIMARY KEY (id_collection,ln,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS collection_collection ( id_dad mediumint(9) unsigned NOT NULL, id_son mediumint(9) unsigned NOT NULL, type char(1) NOT NULL default 'r', score tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_dad,id_son) ) TYPE=MyISAM;
-- tables for OAI sets:
CREATE TABLE IF NOT EXISTS oaiREPOSITORY ( id mediumint(9) unsigned NOT NULL auto_increment, setName varchar(255) NOT NULL default '', setSpec varchar(255) NOT NULL default '', setCollection varchar(255) NOT NULL default '', setDescription text NOT NULL default '', setDefinition text NOT NULL default '', setRecList longblob, p1 text NOT NULL default '', f1 text NOT NULL default '', m1 text NOT NULL default '', p2 text NOT NULL default '', f2 text NOT NULL default '', m2 text NOT NULL default '', p3 text NOT NULL default '', f3 text NOT NULL default '', m3 text NOT NULL default '', PRIMARY KEY (id) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS oaiHARVEST ( id mediumint(9) unsigned NOT NULL auto_increment, baseurl varchar(255) NOT NULL default '', metadataprefix varchar(255) NOT NULL default 'oai_dc', arguments text, comment text, bibconvertcfgfile varchar(255), name varchar(255) NOT NULL, lastrun datetime, frequency mediumint(12) NOT NULL default '0', postprocess varchar(20) NOT NULL default 'h', bibfilterprogram varchar(255) NOT NULL default '', setspecs text NOT NULL default '', PRIMARY KEY (id) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS oaiHARVESTLOG (
  id_oaiHARVEST mediumint(9) unsigned NOT NULL REFERENCES oaiHARVEST, -- source we harvest from
  id_bibrec mediumint(8) unsigned NOT NULL default '0', -- internal record id ( filled by bibupload )
  bibupload_task_id int NOT NULL default 0, -- bib upload task number
  oai_id varchar(40) NOT NULL default "", -- OAI record identifier we harvested
  date_harvested datetime NOT NULL default '0000-00-00', -- when we harvested
  date_inserted datetime NOT NULL default '0000-00-00', -- when it was inserted
  inserted_to_db char(1) NOT NULL default 'P', -- where it was inserted (P=prod, H=holding-pen, etc)
  PRIMARY KEY (bibupload_task_id, oai_id, date_harvested)
) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibHOLDINGPEN (
  changeset_id INT NOT NULL AUTO_INCREMENT, -- the identifier of the changeset stored in the holding pen
  changeset_date datetime NOT NULL DEFAULT '0000-00-00 00:00:00', -- when was the changeset inserted
  changeset_xml TEXT NOT NULL DEFAULT '',
  oai_id varchar(40) NOT NULL DEFAULT '', -- OAI identifier of concerned record
  id_bibrec mediumint(8) unsigned NOT NULL default '0', -- record ID of concerned record (filled by bibupload)
  PRIMARY KEY (changeset_id),
  KEY changeset_date (changeset_date),
  KEY id_bibrec (id_bibrec)
) TYPE=MyISAM;
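-- (as the inserted_to_db comment above indicates, a harvested record is
-- either uploaded straight to production, 'P', or stored as a changeset
-- in the bibHOLDINGPEN table, 'H', until it is inspected.)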
-- tables for portal elements:
CREATE TABLE IF NOT EXISTS collection_portalbox ( id_collection mediumint(9) unsigned NOT NULL, id_portalbox mediumint(9) unsigned NOT NULL, ln char(5) NOT NULL default '', position char(3) NOT NULL default 'top', score tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_collection,id_portalbox,ln) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS portalbox ( id mediumint(9) unsigned NOT NULL auto_increment, title text NOT NULL, body text NOT NULL, UNIQUE KEY id (id) ) TYPE=MyISAM;
-- tables for search examples:
CREATE TABLE IF NOT EXISTS collection_example ( id_collection mediumint(9) unsigned NOT NULL, id_example mediumint(9) unsigned NOT NULL, score tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_collection,id_example) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS example ( id mediumint(9) unsigned NOT NULL auto_increment, type text NOT NULL default '', body text NOT NULL, PRIMARY KEY (id) ) TYPE=MyISAM;
-- tables for collection formats:
CREATE TABLE IF NOT EXISTS collection_format ( id_collection mediumint(9) unsigned NOT NULL, id_format mediumint(9) unsigned NOT NULL, score tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_collection,id_format) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS format ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL, code varchar(6) NOT NULL, description varchar(255) default '', content_type varchar(255) default '', visibility tinyint NOT NULL default '1', PRIMARY KEY (id), UNIQUE KEY code (code) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS formatname ( id_format mediumint(9) unsigned NOT NULL, ln char(5) NOT NULL default '', type char(3) NOT NULL default 'sn', value varchar(255) NOT NULL, PRIMARY KEY (id_format,ln,type) ) TYPE=MyISAM;
-- tables for collection detailed page options
CREATE TABLE IF NOT EXISTS collectiondetailedrecordpagetabs ( id_collection mediumint(9) unsigned NOT NULL, tabs varchar(255) NOT NULL default '', PRIMARY KEY (id_collection) ) TYPE=MyISAM;
-- tables for search options and MARC tags:
CREATE TABLE IF NOT EXISTS collection_field_fieldvalue ( id_collection mediumint(9) unsigned NOT NULL, id_field mediumint(9) unsigned NOT NULL, id_fieldvalue mediumint(9) unsigned, type char(3) NOT NULL default 'src', score tinyint(4) unsigned NOT NULL default '0', score_fieldvalue tinyint(4) unsigned NOT NULL default '0', KEY id_collection (id_collection), KEY id_field (id_field), KEY id_fieldvalue (id_fieldvalue) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS field ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL, code varchar(255) NOT NULL, PRIMARY KEY (id), UNIQUE KEY code (code) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS fieldname ( id_field mediumint(9) unsigned NOT NULL, ln char(5) NOT NULL default '', type char(3) NOT NULL default 'sn', value varchar(255) NOT NULL, PRIMARY KEY (id_field,ln,type) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS fieldvalue ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL, value text NOT NULL, PRIMARY KEY (id) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS field_tag ( id_field mediumint(9) unsigned NOT NULL, id_tag mediumint(9) unsigned NOT NULL, score tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_field,id_tag) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS tag ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL, value char(6) NOT NULL, PRIMARY KEY (id) ) TYPE=MyISAM;
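-- (the field/tag tables above map logical search fields to physical
-- MARC tags: fieldname gives a field its translated names, field_tag
-- ties a field to one or more tags, and score orders the tags within
-- a field.)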
-- tables for file management
CREATE TABLE IF NOT EXISTS bibdoc (
  id mediumint(9) unsigned NOT NULL auto_increment,
  status varchar(50) NOT NULL default '',
  docname varchar(250) COLLATE utf8_bin NOT NULL default 'file',
  creation_date datetime NOT NULL default '0000-00-00',
  modification_date datetime NOT NULL default '0000-00-00',
+ text_extraction_date datetime NOT NULL default '0000-00-00',
  more_info mediumblob NULL default NULL,
  PRIMARY KEY (id),
  KEY docname (docname),
  KEY creation_date (creation_date),
  KEY modification_date (modification_date)
) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibrec_bibdoc ( id_bibrec mediumint(9) unsigned NOT NULL default '0', id_bibdoc mediumint(9) unsigned NOT NULL default '0', type varchar(255), KEY (id_bibrec), KEY (id_bibdoc) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bibdoc_bibdoc ( id_bibdoc1 mediumint(9) unsigned NOT NULL, id_bibdoc2 mediumint(9) unsigned NOT NULL, type varchar(255), KEY (id_bibdoc1), KEY (id_bibdoc2) ) TYPE=MyISAM;
-- tables for publication requests:
CREATE TABLE IF NOT EXISTS publreq ( id int(11) NOT NULL auto_increment, host varchar(255) NOT NULL default '', date varchar(255) NOT NULL default '', name varchar(255) NOT NULL default '', email varchar(255) NOT NULL default '', address text NOT NULL, publication text NOT NULL, PRIMARY KEY (id) ) TYPE=MyISAM;
-- table for sessions and users:
CREATE TABLE IF NOT EXISTS session ( session_key varchar(32) NOT NULL default '', session_expiry int(11) unsigned NOT NULL default '0', session_object blob, uid int(15) unsigned NOT NULL, UNIQUE KEY session_key (session_key), KEY uid (uid) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS user ( id int(15) unsigned NOT NULL auto_increment, email varchar(255) NOT NULL default '', password blob NOT NULL, note varchar(255) default NULL, settings blob default NULL, nickname varchar(255) NOT NULL default '', last_login datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY id (id), KEY email (email), KEY nickname (nickname) ) TYPE=MyISAM;
-- tables for usergroups
CREATE TABLE IF NOT EXISTS usergroup ( id int(15) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL default '', description text default '', join_policy char(2) NOT NULL default '', login_method varchar(255) NOT NULL default 'INTERNAL', PRIMARY KEY (id), UNIQUE KEY login_method_name (login_method(70), name), KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS user_usergroup ( id_user int(15) unsigned NOT NULL default '0', id_usergroup int(15) unsigned NOT NULL default '0', user_status char(1) NOT NULL default '', user_status_date datetime NOT NULL default '0000-00-00 00:00:00', KEY id_user (id_user), KEY id_usergroup (id_usergroup) ) TYPE=MyISAM;
-- tables for access control engine
CREATE TABLE IF NOT EXISTS accROLE ( id int(15) unsigned NOT NULL auto_increment, name varchar(32), description varchar(255), firerole_def_ser blob NULL, firerole_def_src text NULL, PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS user_accROLE ( id_user int(15) unsigned NOT NULL, id_accROLE int(15) unsigned NOT NULL, expiration datetime NOT NULL default '9999-12-31 23:59:59', PRIMARY KEY (id_user, id_accROLE) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS accMAILCOOKIE ( id int(15) unsigned NOT NULL auto_increment, data blob NOT NULL, expiration datetime NOT NULL default '9999-12-31 23:59:59', kind varchar(32) NOT NULL, onetime boolean NOT NULL default 0, status char(1) NOT NULL default 'W', PRIMARY KEY (id) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS accACTION ( id int(15) unsigned NOT NULL auto_increment, name varchar(32), description varchar(255), allowedkeywords varchar(255), optional ENUM ('yes', 'no') NOT NULL default 'no', PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS accARGUMENT ( id int(15) unsigned NOT NULL auto_increment, keyword varchar(32), value varchar(255), PRIMARY KEY (id), KEY KEYVAL (keyword, value) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS accROLE_accACTION_accARGUMENT ( id_accROLE int(15), id_accACTION int(15), id_accARGUMENT int(15), argumentlistid mediumint(8), KEY id_accROLE (id_accROLE), KEY id_accACTION (id_accACTION), KEY id_accARGUMENT (id_accARGUMENT) ) TYPE=MyISAM;
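-- (together the acc* tables form the role-based access control model:
-- users hold roles via user_accROLE, and accROLE_accACTION_accARGUMENT
-- grants a role an action together with keyword=value pairs from
-- accARGUMENT, grouped into argument lists by argumentlistid.)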
-- tables for personal/collaborative features (baskets, alerts, searches, messages, usergroups):
CREATE TABLE IF NOT EXISTS user_query ( id_user int(15) unsigned NOT NULL default '0', id_query int(15) unsigned NOT NULL default '0', hostname varchar(50) default 'unknown host', date datetime default NULL, KEY id_user (id_user,id_query) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS query ( id int(15) unsigned NOT NULL auto_increment, type char(1) NOT NULL default 'r', urlargs text NOT NULL, PRIMARY KEY (id), KEY urlargs (urlargs(100)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS user_query_basket ( id_user int(15) unsigned NOT NULL default '0', id_query int(15) unsigned NOT NULL default '0', id_basket int(15) unsigned NOT NULL default '0', frequency varchar(5) NOT NULL default '', date_creation date default NULL, date_lastrun date default '0000-00-00', alert_name varchar(30) NOT NULL default '', notification char(1) NOT NULL default 'y', PRIMARY KEY (id_user,id_query,frequency,id_basket), KEY alert_name (alert_name) ) TYPE=MyISAM;
-- baskets
CREATE TABLE IF NOT EXISTS bskBASKET ( id int(15) unsigned NOT NULL auto_increment, id_owner int(15) unsigned NOT NULL default '0', name varchar(50) NOT NULL default '', date_modification datetime NOT NULL default '0000-00-00 00:00:00', nb_views int(15) NOT NULL default '0', PRIMARY KEY (id), KEY id_owner (id_owner), KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bskREC ( id_bibrec_or_bskEXTREC int(16) NOT NULL default '0', id_bskBASKET int(15) unsigned NOT NULL default '0', id_user_who_added_item int(15) NOT NULL default '0', score int(15) NOT NULL default '0', date_added datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (id_bibrec_or_bskEXTREC,id_bskBASKET), KEY id_bibrec_or_bskEXTREC (id_bibrec_or_bskEXTREC), KEY id_bskBASKET (id_bskBASKET), KEY score (score), KEY date_added (date_added) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bskEXTREC ( id int(15) unsigned NOT NULL auto_increment, external_id int(15) NOT NULL default '0', collection_id int(15) unsigned NOT NULL default '0', original_url text, creation_date datetime NOT NULL default '0000-00-00 00:00:00', modification_date datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (id) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bskEXTFMT ( id int(15) unsigned NOT NULL auto_increment, id_bskEXTREC int(15) unsigned NOT NULL default '0', format varchar(10) NOT NULL default '', last_updated datetime NOT NULL default '0000-00-00 00:00:00', value longblob, PRIMARY KEY (id), KEY id_bskEXTREC (id_bskEXTREC), KEY format (format) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS user_bskBASKET ( id_user int(15) unsigned NOT NULL default '0', id_bskBASKET int(15) unsigned NOT NULL default '0', topic varchar(50) NOT NULL default '', PRIMARY KEY (id_user,id_bskBASKET), KEY id_user (id_user), KEY id_bskBASKET (id_bskBASKET) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS usergroup_bskBASKET ( id_usergroup int(15) unsigned NOT NULL default '0', id_bskBASKET int(15) unsigned NOT NULL default '0', topic varchar(50) NOT NULL default '', date_shared datetime NOT NULL default '0000-00-00 00:00:00', share_level char(2) NOT NULL default '', PRIMARY KEY (id_usergroup,id_bskBASKET), KEY id_usergroup (id_usergroup), KEY id_bskBASKET (id_bskBASKET) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS bskRECORDCOMMENT ( id int(15) unsigned NOT NULL auto_increment, id_bibrec_or_bskEXTREC int(16) NOT NULL default '0', id_bskBASKET int(15) unsigned NOT NULL default '0', id_user int(15) unsigned NOT NULL default '0', title varchar(255) NOT NULL default '', body text NOT NULL, date_creation datetime NOT NULL default '0000-00-00 00:00:00', priority int(15) NOT NULL default '0', PRIMARY KEY (id), KEY id_bskBASKET (id_bskBASKET), KEY id_bibrec_or_bskEXTREC (id_bibrec_or_bskEXTREC), KEY date_creation (date_creation) ) TYPE=MyISAM;
-- tables for messaging system
CREATE TABLE IF NOT EXISTS msgMESSAGE ( id int(15) unsigned NOT NULL auto_increment, id_user_from int(15) unsigned NOT NULL default '0', sent_to_user_nicks text NOT NULL default '', sent_to_group_names text NOT NULL default '', subject text NOT NULL default '', body text default NULL, sent_date datetime NOT NULL default '0000-00-00 00:00:00', received_date datetime NULL default '0000-00-00 00:00:00', PRIMARY KEY id (id), KEY id_user_from (id_user_from) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS user_msgMESSAGE ( id_user_to int(15) unsigned NOT NULL default '0', id_msgMESSAGE int(15) unsigned NOT NULL default '0', status char(1) NOT NULL default 'N', PRIMARY KEY id (id_user_to, id_msgMESSAGE), KEY id_user_to (id_user_to), KEY id_msgMESSAGE (id_msgMESSAGE) ) TYPE=MyISAM;
-- tables for WebComment
CREATE TABLE IF NOT EXISTS cmtRECORDCOMMENT ( id int(15) unsigned NOT NULL auto_increment, id_bibrec int(15) unsigned NOT NULL default '0', id_user int(15) unsigned NOT NULL default '0', title varchar(255) NOT NULL default '', body text NOT NULL default '', date_creation datetime NOT NULL default '0000-00-00 00:00:00', star_score tinyint(5) unsigned NOT NULL default '0', nb_votes_yes int(10) NOT NULL default '0', nb_votes_total int(10) unsigned NOT NULL default '0', nb_abuse_reports int(10) NOT NULL default '0', status char(2) NOT NULL default 'ok', PRIMARY KEY (id), KEY id_bibrec (id_bibrec), KEY id_user (id_user), KEY status (status) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS cmtACTIONHISTORY ( id_cmtRECORDCOMMENT int(15) unsigned NULL, id_bibrec int(15) unsigned NULL, id_user int(15) unsigned NULL default NULL, client_host int(10) unsigned default NULL, action_time datetime NOT NULL default '0000-00-00 00:00:00', action_code char(1) NOT NULL, KEY id_cmtRECORDCOMMENT (id_cmtRECORDCOMMENT), KEY client_host (client_host), KEY id_user (id_user), KEY action_code (action_code) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS cmtSUBSCRIPTION ( id_bibrec mediumint(8) unsigned NOT NULL, id_user int(15) unsigned NOT NULL, creation_time datetime NOT NULL default '0000-00-00 00:00:00', KEY id_user (id_bibrec, id_user) ) TYPE=MyISAM;
-- tables for BibKnowledge:
CREATE TABLE IF NOT EXISTS knwKB ( id mediumint(8) unsigned NOT NULL auto_increment, name varchar(255) default '', description text default '', kbtype char default NULL, PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS knwKBRVAL ( id mediumint(8) unsigned NOT NULL auto_increment, m_key varchar(255) NOT NULL default '', m_value text NOT NULL default '', id_knwKB mediumint(8) NOT NULL default '0', PRIMARY KEY (id), KEY id_knwKB (id_knwKB), KEY m_key (m_key(30)), KEY m_value (m_value(30)) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS knwKBDDEF ( id_knwKB mediumint(8) unsigned NOT NULL, id_collection mediumint(9), output_tag text default '', search_expression text default '', PRIMARY KEY (id_knwKB) ) TYPE=MyISAM;
-- tables for WebSubmit:
CREATE TABLE IF NOT EXISTS sbmACTION ( lactname text, sactname char(3) NOT NULL default '', dir text, cd date default NULL, md date default NULL, actionbutton text, statustext text, PRIMARY KEY (sactname) ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmALLFUNCDESCR ( function varchar(40) NOT NULL default '', description tinytext, PRIMARY KEY (function) ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmAPPROVAL ( doctype varchar(10) NOT NULL default '', categ varchar(50) NOT NULL default '', rn varchar(50) NOT NULL default '', status varchar(10) NOT NULL default '', dFirstReq datetime NOT NULL default '0000-00-00 00:00:00', dLastReq datetime NOT NULL default '0000-00-00 00:00:00', dAction datetime NOT NULL default '0000-00-00 00:00:00', access varchar(20) NOT NULL default '0', note text NOT NULL default '', PRIMARY KEY (rn) ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmCPLXAPPROVAL ( doctype varchar(10) NOT NULL default '', categ varchar(50) NOT NULL default '', rn varchar(50) NOT NULL default '', type varchar(10) NOT NULL, status varchar(10) NOT NULL, id_group int(15) unsigned NOT NULL default '0', id_bskBASKET int(15) unsigned NOT NULL default '0', id_EdBoardGroup int(15) unsigned NOT NULL default '0', dFirstReq datetime NOT NULL default '0000-00-00 00:00:00', dLastReq datetime NOT NULL default '0000-00-00 00:00:00', dEdBoardSel datetime NOT NULL default '0000-00-00 00:00:00', dRefereeSel datetime NOT NULL default '0000-00-00 00:00:00', dRefereeRecom datetime NOT NULL default '0000-00-00 00:00:00', dEdBoardRecom datetime NOT NULL default '0000-00-00 00:00:00', dPubComRecom datetime NOT NULL default '0000-00-00 00:00:00', dProjectLeaderAction datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (rn, type) ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmCOLLECTION ( id int(11) NOT NULL auto_increment, name varchar(100) NOT NULL default '', PRIMARY KEY (id) ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS sbmCOLLECTION_sbmCOLLECTION ( id_father int(11) NOT NULL default '0', id_son int(11) NOT NULL default '0', catalogue_order int(11) NOT NULL default '0' ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS sbmCOLLECTION_sbmDOCTYPE ( id_father int(11) NOT NULL default '0', id_son char(10) NOT NULL default '0', catalogue_order int(11) NOT NULL default '0' ) TYPE=MyISAM;
CREATE TABLE IF NOT EXISTS sbmCATEGORIES ( doctype varchar(10) NOT NULL default '', sname varchar(75) NOT NULL default '', lname varchar(75) NOT NULL default '', score tinyint unsigned NOT NULL default 0, PRIMARY KEY (doctype, sname), KEY doctype (doctype), KEY sname (sname) ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmCHECKS ( chname varchar(15) NOT NULL default '', chdesc text, cd date default NULL, md date default NULL, chefi1 text, chefi2 text, PRIMARY KEY (chname) ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmDOCTYPE ( ldocname text, sdocname varchar(10) default NULL, cd date default NULL, md date default NULL, description text ) TYPE=MyISAM PACK_KEYS=1;
CREATE TABLE IF NOT EXISTS sbmFIELD ( subname varchar(13) default NULL, pagenb int(11) default NULL, fieldnb int(11) default NULL, fidesc varchar(15) default NULL, fitext text, level char(1) default NULL, sdesc text, checkn text, cd date default NULL, md date default
NULL, fiefi1 text, fiefi2 text ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmFIELDDESC ( name varchar(15) NOT NULL default '', alephcode varchar(50) default NULL, marccode varchar(50) NOT NULL default '', type char(1) default NULL, size int(11) default NULL, rows int(11) default NULL, cols int(11) default NULL, maxlength int(11) default NULL, val text, fidesc text, cd date default NULL, md date default NULL, modifytext text, fddfi2 text, cookie int(11) default '0', PRIMARY KEY (name) ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmFORMATEXTENSION ( FILE_FORMAT text NOT NULL, FILE_EXTENSION text NOT NULL ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmFUNCTIONS ( action varchar(10) NOT NULL default '', doctype varchar(10) NOT NULL default '', function varchar(40) NOT NULL default '', score int(11) NOT NULL default '0', step tinyint(4) NOT NULL default '1' ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmFUNDESC ( function varchar(40) NOT NULL default '', param varchar(40) default NULL ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmGFILERESULT ( FORMAT text NOT NULL, RESULT text NOT NULL ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmIMPLEMENT ( docname varchar(10) default NULL, actname char(3) default NULL, displayed char(1) default NULL, subname varchar(13) default NULL, nbpg int(11) default NULL, cd date default NULL, md date default NULL, buttonorder int(11) default NULL, statustext text, level char(1) NOT NULL default '', score int(11) NOT NULL default '0', stpage int(11) NOT NULL default '0', endtxt varchar(100) NOT NULL default '' ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmPARAMETERS ( doctype varchar(10) NOT NULL default '', name varchar(40) NOT NULL default '', value text NOT NULL default '', PRIMARY KEY (doctype,name) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS sbmPUBLICATION ( doctype varchar(10) NOT NULL default '', categ varchar(50) NOT NULL default '', rn varchar(50) NOT NULL default '', status varchar(10) NOT NULL default '', dFirstReq datetime NOT NULL default '0000-00-00 00:00:00', dLastReq datetime NOT NULL default '0000-00-00 00:00:00', dAction datetime NOT NULL default '0000-00-00 00:00:00', accessref varchar(20) NOT NULL default '', accessedi varchar(20) NOT NULL default '', access varchar(20) NOT NULL default '', referees varchar(50) NOT NULL default '', authoremail varchar(50) NOT NULL default '', dRefSelection datetime NOT NULL default '0000-00-00 00:00:00', dRefRec datetime NOT NULL default '0000-00-00 00:00:00', dEdiRec datetime NOT NULL default '0000-00-00 00:00:00', accessspo varchar(20) NOT NULL default '', journal varchar(100) default NULL, PRIMARY KEY (doctype,categ,rn) ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmPUBLICATIONCOMM ( id int(11) NOT NULL auto_increment, id_parent int(11) default '0', rn varchar(100) NOT NULL default '', firstname varchar(100) default NULL, secondname varchar(100) default NULL, email varchar(100) default NULL, date varchar(40) NOT NULL default '', synopsis varchar(255) NOT NULL default '', commentfulltext text, PRIMARY KEY (id) ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmPUBLICATIONDATA ( doctype varchar(10) NOT NULL default '', editoboard varchar(250) NOT NULL default '', base varchar(10) NOT NULL default '', logicalbase varchar(10) NOT NULL default '', spokesperson varchar(50) NOT NULL default '', PRIMARY KEY (doctype) ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmREFEREES ( doctype varchar(10) NOT NULL default '', categ 
varchar(10) NOT NULL default '', name varchar(50) NOT NULL default '', address varchar(50) NOT NULL default '', rid int(11) NOT NULL auto_increment, PRIMARY KEY (rid) ) TYPE=MyISAM PACK_KEYS=1; CREATE TABLE IF NOT EXISTS sbmSUBMISSIONS ( email varchar(50) NOT NULL default '', doctype varchar(10) NOT NULL default '', action varchar(10) NOT NULL default '', status varchar(10) NOT NULL default '', id varchar(30) NOT NULL default '', reference varchar(40) NOT NULL default '', cd datetime NOT NULL default '0000-00-00 00:00:00', md datetime NOT NULL default '0000-00-00 00:00:00' ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS sbmCOOKIES ( id int(15) unsigned NOT NULL auto_increment, name varchar(100) NOT NULL, value text, uid int(15) NOT NULL, PRIMARY KEY (id) ) TYPE=MyISAM; -- Scheduler tables CREATE TABLE IF NOT EXISTS schTASK ( id int(15) unsigned NOT NULL auto_increment, proc varchar(20) NOT NULL, host varchar(255) NOT NULL default '', user varchar(50) NOT NULL, runtime datetime NOT NULL, sleeptime varchar(20), arguments mediumblob, status varchar(50), progress varchar(255), priority tinyint(4) NOT NULL default 0, PRIMARY KEY (id), KEY status (status), KEY runtime (runtime), KEY priority (priority) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS hstTASK ( id int(15) unsigned NOT NULL, proc varchar(20) NOT NULL, host varchar(255) NOT NULL default '', user varchar(50) NOT NULL, runtime datetime NOT NULL, sleeptime varchar(20), arguments mediumblob, status varchar(50), progress varchar(255), priority tinyint(4) NOT NULL default 0, PRIMARY KEY (id), KEY status (status), KEY runtime (runtime), KEY priority (priority) ) TYPE=MyISAM; -- External collections CREATE TABLE IF NOT EXISTS collection_externalcollection ( id_collection mediumint(9) unsigned NOT NULL default '0', id_externalcollection mediumint(9) unsigned NOT NULL default '0', type tinyint(4) unsigned NOT NULL default '0', PRIMARY KEY (id_collection, id_externalcollection) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS externalcollection ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL default '', PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM; -- WebStat tables: CREATE TABLE IF NOT EXISTS staEVENT ( id varchar(255) NOT NULL, number smallint(2) unsigned ZEROFILL NOT NULL auto_increment, name varchar(255), creation_time TIMESTAMP DEFAULT NOW(), cols varchar(255), PRIMARY KEY (id), UNIQUE KEY number (number) ) TYPE=MyISAM; -- BibClassify tables: CREATE TABLE IF NOT EXISTS clsMETHOD ( id mediumint(9) unsigned NOT NULL, name varchar(50) NOT NULL default '', location varchar(255) NOT NULL default '', description varchar(255) NOT NULL default '', last_updated datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS collection_clsMETHOD ( id_collection mediumint(9) unsigned NOT NULL, id_clsMETHOD mediumint(9) unsigned NOT NULL, PRIMARY KEY (id_collection, id_clsMETHOD) ) TYPE=MyISAM; -- WebJournal tables: CREATE TABLE IF NOT EXISTS jrnJOURNAL ( id mediumint(9) unsigned NOT NULL auto_increment, name varchar(50) NOT NULL default '', PRIMARY KEY (id), UNIQUE KEY name (name) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS jrnISSUE ( id_jrnJOURNAL mediumint(9) unsigned NOT NULL, issue_number varchar(50) NOT NULL default '', issue_display varchar(50) NOT NULL default '', date_released datetime NOT NULL default '0000-00-00 00:00:00', date_announced datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (id_jrnJOURNAL,issue_number) ) TYPE=MyISAM; -- tables 
recording history of record's metadata and fulltext documents: CREATE TABLE IF NOT EXISTS hstRECORD ( id_bibrec mediumint(8) unsigned NOT NULL, marcxml blob NOT NULL, job_id mediumint(15) unsigned NOT NULL, job_name varchar(255) NOT NULL, job_person varchar(255) NOT NULL, job_date datetime NOT NULL, job_details blob NOT NULL, KEY (id_bibrec), KEY (job_id), KEY (job_name), KEY (job_person), KEY (job_date) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS hstDOCUMENT ( id_bibdoc mediumint(9) unsigned NOT NULL, docname varchar(250) NOT NULL, docformat varchar(50) NOT NULL, docversion tinyint(4) unsigned NOT NULL, docsize bigint(15) unsigned NOT NULL, docchecksum char(32) NOT NULL, doctimestamp datetime NOT NULL, action varchar(50) NOT NULL, job_id mediumint(15) unsigned NULL default NULL, job_name varchar(255) NULL default NULL, job_person varchar(255) NULL default NULL, job_date datetime NULL default NULL, job_details blob NULL default NULL, KEY (action), KEY (id_bibdoc), KEY (docname), KEY (docformat), KEY (doctimestamp), KEY (job_id), KEY (job_name), KEY (job_person), KEY (job_date) ) TYPE=MyISAM; -- BibCirculation tables: CREATE TABLE IF NOT EXISTS crcBORROWER ( id int(15) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL default '', email varchar(255) NOT NULL default '', phone varchar(60) default NULL, address varchar(60) default NULL, mailbox varchar(30) default NULL, borrower_since datetime NOT NULL default '0000-00-00 00:00:00', borrower_until datetime NOT NULL default '0000-00-00 00:00:00', notes text, PRIMARY KEY (id) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS crcILLREQUEST ( id int(15) unsigned NOT NULL auto_increment, id_crcBORROWER int(15) unsigned NOT NULL default '0', barcode varchar(30) NOT NULL default '', period_of_interest_from datetime NOT NULL default '0000-00-00 00:00:00', period_of_interest_to datetime NOT NULL default '0000-00-00 00:00:00', id_crcLIBRARY int(15) unsigned NOT NULL default '0', request_date datetime NOT NULL default '0000-00-00 00:00:00', expected_date datetime NOT NULL default '0000-00-00 00:00:00', arrival_date datetime NOT NULL default '0000-00-00 00:00:00', due_date datetime NOT NULL default '0000-00-00 00:00:00', return_date datetime NOT NULL default '0000-00-00 00:00:00', status varchar(20) NOT NULL default '', cost varchar(30) NOT NULL default '', book_info text, borrower_comments text, only_this_edition varchar(10) NOT NULL default '', library_notes text, PRIMARY KEY (id), KEY id_crcborrower (id_crcBORROWER), KEY id_crclibrary (id_crcLIBRARY) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS crcITEM ( barcode varchar(30) NOT NULL default '', id_bibrec int(15) unsigned NOT NULL default '0', id_crcLIBRARY int(15) unsigned NOT NULL default '0', collection varchar(60) default NULL, location varchar(60) default NULL, description varchar(60) default NULL, loan_period varchar(30) NOT NULL default '', status varchar(20) NOT NULL default '', creation_date datetime NOT NULL default '0000-00-00 00:00:00', modification_date datetime NOT NULL default '0000-00-00 00:00:00', number_of_requests int(3) unsigned NOT NULL default '0', PRIMARY KEY (barcode), KEY id_bibrec (id_bibrec), KEY id_crclibrary (id_crcLIBRARY) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS crcLIBRARY ( id int(15) unsigned NOT NULL auto_increment, name varchar(80) NOT NULL default '', address varchar(255) NOT NULL default '', email varchar(255) NOT NULL default '', phone varchar(30) NOT NULL default '', type varchar(30) default NULL, notes text, PRIMARY KEY (id) ) TYPE=MyISAM; CREATE TABLE IF NOT 
EXISTS crcLOAN ( id int(15) unsigned NOT NULL auto_increment, id_crcBORROWER int(15) unsigned NOT NULL default '0', id_bibrec int(15) unsigned NOT NULL default '0', barcode varchar(30) NOT NULL default '', loaned_on datetime NOT NULL default '0000-00-00 00:00:00', returned_on date NOT NULL default '0000-00-00', due_date datetime NOT NULL default '0000-00-00 00:00:00', number_of_renewals int(3) unsigned NOT NULL default '0', overdue_letter_number int(3) unsigned NOT NULL default '0', overdue_letter_date datetime NOT NULL default '0000-00-00 00:00:00', status varchar(20) NOT NULL default '', type varchar(20) NOT NULL default '', notes text, PRIMARY KEY (id), KEY id_crcborrower (id_crcBORROWER), KEY id_bibrec (id_bibrec), KEY barcode (barcode) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS crcLOANREQUEST ( id int(15) unsigned NOT NULL auto_increment, id_crcBORROWER int(15) unsigned NOT NULL default '0', id_bibrec int(15) unsigned NOT NULL default '0', barcode varchar(30) NOT NULL default '', period_of_interest_from datetime NOT NULL default '0000-00-00 00:00:00', period_of_interest_to datetime NOT NULL default '0000-00-00 00:00:00', status varchar(20) NOT NULL default '', notes text, request_date datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (id), KEY id_crcborrower (id_crcBORROWER), KEY id_bibrec (id_bibrec), KEY barcode (barcode) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS crcPURCHASE ( id int(15) unsigned NOT NULL auto_increment, id_bibrec int(15) unsigned NOT NULL default '0', id_crcVENDOR int(15) unsigned NOT NULL default '0', ordered_date datetime NOT NULL default '0000-00-00 00:00:00', expected_date datetime NOT NULL default '0000-00-00 00:00:00', price varchar(20) NOT NULL default '0', status varchar(20) NOT NULL default '', notes text, PRIMARY KEY (id), KEY id_bibrec (id_bibrec), KEY id_crcVENDOR (id_crcVENDOR) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS crcVENDOR ( id int(15) unsigned NOT NULL auto_increment, name varchar(80) NOT NULL default '', address varchar(255) NOT NULL default '', email varchar(255) NOT NULL default '', phone varchar(30) NOT NULL default '', notes text, PRIMARY KEY (id) ) TYPE=MyISAM; -- BibExport tables: CREATE TABLE IF NOT EXISTS expJOB ( id int(15) unsigned NOT NULL auto_increment, jobname varchar(50) NOT NULL default '', jobfreq mediumint(12) NOT NULL default '0', output_format mediumint(12) NOT NULL default '0', deleted mediumint(12) NOT NULL default '0', lastrun datetime NOT NULL default '0000-00-00 00:00:00', output_directory text, PRIMARY KEY (id), UNIQUE KEY jobname (jobname) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS expQUERY ( id int(15) unsigned NOT NULL auto_increment, name varchar(255) NOT NULL, search_criteria text NOT NULL, output_fields text NOT NULL, notes text, deleted mediumint(12) NOT NULL default '0', PRIMARY KEY (id) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS expJOB_expQUERY ( id_expJOB int(15) NOT NULL, id_expQUERY int(15) NOT NULL, PRIMARY KEY (id_expJOB,id_expQUERY), KEY id_expJOB (id_expJOB), KEY id_expQUERY (id_expQUERY) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS expQUERYRESULT ( id int(15) unsigned NOT NULL auto_increment, id_expQUERY int(15) NOT NULL, result text NOT NULL, status mediumint(12) NOT NULL default '0', status_message text NOT NULL, PRIMARY KEY (id) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS expJOBRESULT ( id int(15) unsigned NOT NULL auto_increment, id_expJOB int(15) NOT NULL, execution_time datetime NOT NULL default '0000-00-00 00:00:00', status mediumint(12) NOT NULL default '0', status_message text NOT NULL, 
PRIMARY KEY (id) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS expJOBRESULT_expQUERYRESULT ( id_expJOBRESULT int(15) NOT NULL, id_expQUERYRESULT int(15) NOT NULL, PRIMARY KEY (id_expJOBRESULT, id_expQUERYRESULT), KEY id_expJOBRESULT (id_expJOBRESULT), KEY id_expQUERYRESULT (id_expQUERYRESULT) ) TYPE=MyISAM; CREATE TABLE IF NOT EXISTS user_expJOB ( id_user int(15) NOT NULL, id_expJOB int(15) NOT NULL, PRIMARY KEY (id_user, id_expJOB), KEY id_user (id_user), KEY id_expJOB (id_expJOB) ) TYPE=MyISAM; -- end of file diff --git a/modules/miscutil/sql/tabdrop.sql b/modules/miscutil/sql/tabdrop.sql index f664a4dba..c1c9ddbf6 100644 --- a/modules/miscutil/sql/tabdrop.sql +++ b/modules/miscutil/sql/tabdrop.sql @@ -1,401 +1,429 @@ -- $Id$ -- This file is part of CDS Invenio. -- Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. -- -- CDS Invenio is free software; you can redistribute it and/or -- modify it under the terms of the GNU General Public License as -- published by the Free Software Foundation; either version 2 of the -- License, or (at your option) any later version. -- -- CDS Invenio is distributed in the hope that it will be useful, but -- WITHOUT ANY WARRANTY; without even the implied warranty of -- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -- General Public License for more details. -- -- You should have received a copy of the GNU General Public License -- along with CDS Invenio; if not, write to the Free Software Foundation, Inc., -- 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. DROP TABLE IF EXISTS bibrec; DROP TABLE IF EXISTS bib00x; DROP TABLE IF EXISTS bib01x; DROP TABLE IF EXISTS bib02x; DROP TABLE IF EXISTS bib03x; DROP TABLE IF EXISTS bib04x; DROP TABLE IF EXISTS bib05x; DROP TABLE IF EXISTS bib06x; DROP TABLE IF EXISTS bib07x; DROP TABLE IF EXISTS bib08x; DROP TABLE IF EXISTS bib09x; DROP TABLE IF EXISTS bib10x; DROP TABLE IF EXISTS bib11x; DROP TABLE IF EXISTS bib12x; DROP TABLE IF EXISTS bib13x; DROP TABLE IF EXISTS bib14x; DROP TABLE IF EXISTS bib15x; DROP TABLE IF EXISTS bib16x; DROP TABLE IF EXISTS bib17x; DROP TABLE IF EXISTS bib18x; DROP TABLE IF EXISTS bib19x; DROP TABLE IF EXISTS bib20x; DROP TABLE IF EXISTS bib21x; DROP TABLE IF EXISTS bib22x; DROP TABLE IF EXISTS bib23x; DROP TABLE IF EXISTS bib24x; DROP TABLE IF EXISTS bib25x; DROP TABLE IF EXISTS bib26x; DROP TABLE IF EXISTS bib27x; DROP TABLE IF EXISTS bib28x; DROP TABLE IF EXISTS bib29x; DROP TABLE IF EXISTS bib30x; DROP TABLE IF EXISTS bib31x; DROP TABLE IF EXISTS bib32x; DROP TABLE IF EXISTS bib33x; DROP TABLE IF EXISTS bib34x; DROP TABLE IF EXISTS bib35x; DROP TABLE IF EXISTS bib36x; DROP TABLE IF EXISTS bib37x; DROP TABLE IF EXISTS bib38x; DROP TABLE IF EXISTS bib39x; DROP TABLE IF EXISTS bib40x; DROP TABLE IF EXISTS bib41x; DROP TABLE IF EXISTS bib42x; DROP TABLE IF EXISTS bib43x; DROP TABLE IF EXISTS bib44x; DROP TABLE IF EXISTS bib45x; DROP TABLE IF EXISTS bib46x; DROP TABLE IF EXISTS bib47x; DROP TABLE IF EXISTS bib48x; DROP TABLE IF EXISTS bib49x; DROP TABLE IF EXISTS bib50x; DROP TABLE IF EXISTS bib51x; DROP TABLE IF EXISTS bib52x; DROP TABLE IF EXISTS bib53x; DROP TABLE IF EXISTS bib54x; DROP TABLE IF EXISTS bib55x; DROP TABLE IF EXISTS bib56x; DROP TABLE IF EXISTS bib57x; DROP TABLE IF EXISTS bib58x; DROP TABLE IF EXISTS bib59x; DROP TABLE IF EXISTS bib60x; DROP TABLE IF EXISTS bib61x; DROP TABLE IF EXISTS bib62x; DROP TABLE IF EXISTS bib63x; DROP TABLE IF EXISTS bib64x; DROP TABLE IF EXISTS bib65x; DROP TABLE IF EXISTS bib66x; DROP TABLE IF EXISTS bib67x; DROP 
TABLE IF EXISTS bib68x; DROP TABLE IF EXISTS bib69x; DROP TABLE IF EXISTS bib70x; DROP TABLE IF EXISTS bib71x; DROP TABLE IF EXISTS bib72x; DROP TABLE IF EXISTS bib73x; DROP TABLE IF EXISTS bib74x; DROP TABLE IF EXISTS bib75x; DROP TABLE IF EXISTS bib76x; DROP TABLE IF EXISTS bib77x; DROP TABLE IF EXISTS bib78x; DROP TABLE IF EXISTS bib79x; DROP TABLE IF EXISTS bib80x; DROP TABLE IF EXISTS bib81x; DROP TABLE IF EXISTS bib82x; DROP TABLE IF EXISTS bib83x; DROP TABLE IF EXISTS bib84x; DROP TABLE IF EXISTS bib85x; DROP TABLE IF EXISTS bib86x; DROP TABLE IF EXISTS bib87x; DROP TABLE IF EXISTS bib88x; DROP TABLE IF EXISTS bib89x; DROP TABLE IF EXISTS bib90x; DROP TABLE IF EXISTS bib91x; DROP TABLE IF EXISTS bib92x; DROP TABLE IF EXISTS bib93x; DROP TABLE IF EXISTS bib94x; DROP TABLE IF EXISTS bib95x; DROP TABLE IF EXISTS bib96x; DROP TABLE IF EXISTS bib97x; DROP TABLE IF EXISTS bib98x; DROP TABLE IF EXISTS bib99x; DROP TABLE IF EXISTS bibrec_bib00x; DROP TABLE IF EXISTS bibrec_bib01x; DROP TABLE IF EXISTS bibrec_bib02x; DROP TABLE IF EXISTS bibrec_bib03x; DROP TABLE IF EXISTS bibrec_bib04x; DROP TABLE IF EXISTS bibrec_bib05x; DROP TABLE IF EXISTS bibrec_bib06x; DROP TABLE IF EXISTS bibrec_bib07x; DROP TABLE IF EXISTS bibrec_bib08x; DROP TABLE IF EXISTS bibrec_bib09x; DROP TABLE IF EXISTS bibrec_bib10x; DROP TABLE IF EXISTS bibrec_bib11x; DROP TABLE IF EXISTS bibrec_bib12x; DROP TABLE IF EXISTS bibrec_bib13x; DROP TABLE IF EXISTS bibrec_bib14x; DROP TABLE IF EXISTS bibrec_bib15x; DROP TABLE IF EXISTS bibrec_bib16x; DROP TABLE IF EXISTS bibrec_bib17x; DROP TABLE IF EXISTS bibrec_bib18x; DROP TABLE IF EXISTS bibrec_bib19x; DROP TABLE IF EXISTS bibrec_bib20x; DROP TABLE IF EXISTS bibrec_bib21x; DROP TABLE IF EXISTS bibrec_bib22x; DROP TABLE IF EXISTS bibrec_bib23x; DROP TABLE IF EXISTS bibrec_bib24x; DROP TABLE IF EXISTS bibrec_bib25x; DROP TABLE IF EXISTS bibrec_bib26x; DROP TABLE IF EXISTS bibrec_bib27x; DROP TABLE IF EXISTS bibrec_bib28x; DROP TABLE IF EXISTS bibrec_bib29x; DROP TABLE IF EXISTS bibrec_bib30x; DROP TABLE IF EXISTS bibrec_bib31x; DROP TABLE IF EXISTS bibrec_bib32x; DROP TABLE IF EXISTS bibrec_bib33x; DROP TABLE IF EXISTS bibrec_bib34x; DROP TABLE IF EXISTS bibrec_bib35x; DROP TABLE IF EXISTS bibrec_bib36x; DROP TABLE IF EXISTS bibrec_bib37x; DROP TABLE IF EXISTS bibrec_bib38x; DROP TABLE IF EXISTS bibrec_bib39x; DROP TABLE IF EXISTS bibrec_bib40x; DROP TABLE IF EXISTS bibrec_bib41x; DROP TABLE IF EXISTS bibrec_bib42x; DROP TABLE IF EXISTS bibrec_bib43x; DROP TABLE IF EXISTS bibrec_bib44x; DROP TABLE IF EXISTS bibrec_bib45x; DROP TABLE IF EXISTS bibrec_bib46x; DROP TABLE IF EXISTS bibrec_bib47x; DROP TABLE IF EXISTS bibrec_bib48x; DROP TABLE IF EXISTS bibrec_bib49x; DROP TABLE IF EXISTS bibrec_bib50x; DROP TABLE IF EXISTS bibrec_bib51x; DROP TABLE IF EXISTS bibrec_bib52x; DROP TABLE IF EXISTS bibrec_bib53x; DROP TABLE IF EXISTS bibrec_bib54x; DROP TABLE IF EXISTS bibrec_bib55x; DROP TABLE IF EXISTS bibrec_bib56x; DROP TABLE IF EXISTS bibrec_bib57x; DROP TABLE IF EXISTS bibrec_bib58x; DROP TABLE IF EXISTS bibrec_bib59x; DROP TABLE IF EXISTS bibrec_bib60x; DROP TABLE IF EXISTS bibrec_bib61x; DROP TABLE IF EXISTS bibrec_bib62x; DROP TABLE IF EXISTS bibrec_bib63x; DROP TABLE IF EXISTS bibrec_bib64x; DROP TABLE IF EXISTS bibrec_bib65x; DROP TABLE IF EXISTS bibrec_bib66x; DROP TABLE IF EXISTS bibrec_bib67x; DROP TABLE IF EXISTS bibrec_bib68x; DROP TABLE IF EXISTS bibrec_bib69x; DROP TABLE IF EXISTS bibrec_bib70x; DROP TABLE IF EXISTS bibrec_bib71x; DROP TABLE IF EXISTS bibrec_bib72x; 
DROP TABLE IF EXISTS bibrec_bib73x; DROP TABLE IF EXISTS bibrec_bib74x; DROP TABLE IF EXISTS bibrec_bib75x; DROP TABLE IF EXISTS bibrec_bib76x; DROP TABLE IF EXISTS bibrec_bib77x; DROP TABLE IF EXISTS bibrec_bib78x; DROP TABLE IF EXISTS bibrec_bib79x; DROP TABLE IF EXISTS bibrec_bib80x; DROP TABLE IF EXISTS bibrec_bib81x; DROP TABLE IF EXISTS bibrec_bib82x; DROP TABLE IF EXISTS bibrec_bib83x; DROP TABLE IF EXISTS bibrec_bib84x; DROP TABLE IF EXISTS bibrec_bib85x; DROP TABLE IF EXISTS bibrec_bib86x; DROP TABLE IF EXISTS bibrec_bib87x; DROP TABLE IF EXISTS bibrec_bib88x; DROP TABLE IF EXISTS bibrec_bib89x; DROP TABLE IF EXISTS bibrec_bib90x; DROP TABLE IF EXISTS bibrec_bib91x; DROP TABLE IF EXISTS bibrec_bib92x; DROP TABLE IF EXISTS bibrec_bib93x; DROP TABLE IF EXISTS bibrec_bib94x; DROP TABLE IF EXISTS bibrec_bib95x; DROP TABLE IF EXISTS bibrec_bib96x; DROP TABLE IF EXISTS bibrec_bib97x; DROP TABLE IF EXISTS bibrec_bib98x; DROP TABLE IF EXISTS bibrec_bib99x; DROP TABLE IF EXISTS bibfmt; DROP TABLE IF EXISTS idxINDEX; DROP TABLE IF EXISTS idxINDEXNAME; DROP TABLE IF EXISTS idxINDEX_field; DROP TABLE IF EXISTS idxWORD01F; DROP TABLE IF EXISTS idxWORD02F; DROP TABLE IF EXISTS idxWORD03F; DROP TABLE IF EXISTS idxWORD04F; DROP TABLE IF EXISTS idxWORD05F; DROP TABLE IF EXISTS idxWORD06F; DROP TABLE IF EXISTS idxWORD07F; DROP TABLE IF EXISTS idxWORD08F; DROP TABLE IF EXISTS idxWORD09F; DROP TABLE IF EXISTS idxWORD10F; DROP TABLE IF EXISTS idxWORD11F; DROP TABLE IF EXISTS idxWORD12F; DROP TABLE IF EXISTS idxWORD13F; DROP TABLE IF EXISTS idxWORD14F; DROP TABLE IF EXISTS idxWORD01R; DROP TABLE IF EXISTS idxWORD02R; DROP TABLE IF EXISTS idxWORD03R; DROP TABLE IF EXISTS idxWORD04R; DROP TABLE IF EXISTS idxWORD05R; DROP TABLE IF EXISTS idxWORD06R; DROP TABLE IF EXISTS idxWORD07R; DROP TABLE IF EXISTS idxWORD08R; DROP TABLE IF EXISTS idxWORD09R; DROP TABLE IF EXISTS idxWORD10R; DROP TABLE IF EXISTS idxWORD11R; DROP TABLE IF EXISTS idxWORD12R; DROP TABLE IF EXISTS idxWORD13R; DROP TABLE IF EXISTS idxWORD14R; +DROP TABLE IF EXISTS idxPAIR01F; +DROP TABLE IF EXISTS idxPAIR02F; +DROP TABLE IF EXISTS idxPAIR03F; +DROP TABLE IF EXISTS idxPAIR04F; +DROP TABLE IF EXISTS idxPAIR05F; +DROP TABLE IF EXISTS idxPAIR06F; +DROP TABLE IF EXISTS idxPAIR07F; +DROP TABLE IF EXISTS idxPAIR08F; +DROP TABLE IF EXISTS idxPAIR09F; +DROP TABLE IF EXISTS idxPAIR10F; +DROP TABLE IF EXISTS idxPAIR11F; +DROP TABLE IF EXISTS idxPAIR12F; +DROP TABLE IF EXISTS idxPAIR13F; +DROP TABLE IF EXISTS idxPAIR14F; +DROP TABLE IF EXISTS idxPAIR01R; +DROP TABLE IF EXISTS idxPAIR02R; +DROP TABLE IF EXISTS idxPAIR03R; +DROP TABLE IF EXISTS idxPAIR04R; +DROP TABLE IF EXISTS idxPAIR05R; +DROP TABLE IF EXISTS idxPAIR06R; +DROP TABLE IF EXISTS idxPAIR07R; +DROP TABLE IF EXISTS idxPAIR08R; +DROP TABLE IF EXISTS idxPAIR09R; +DROP TABLE IF EXISTS idxPAIR10R; +DROP TABLE IF EXISTS idxPAIR11R; +DROP TABLE IF EXISTS idxPAIR12R; +DROP TABLE IF EXISTS idxPAIR13R; +DROP TABLE IF EXISTS idxPAIR14R; DROP TABLE IF EXISTS idxPHRASE01F; DROP TABLE IF EXISTS idxPHRASE02F; DROP TABLE IF EXISTS idxPHRASE03F; DROP TABLE IF EXISTS idxPHRASE04F; DROP TABLE IF EXISTS idxPHRASE05F; DROP TABLE IF EXISTS idxPHRASE06F; DROP TABLE IF EXISTS idxPHRASE07F; DROP TABLE IF EXISTS idxPHRASE08F; DROP TABLE IF EXISTS idxPHRASE09F; DROP TABLE IF EXISTS idxPHRASE10F; DROP TABLE IF EXISTS idxPHRASE11F; DROP TABLE IF EXISTS idxPHRASE12F; DROP TABLE IF EXISTS idxPHRASE13F; DROP TABLE IF EXISTS idxPHRASE14F; DROP TABLE IF EXISTS idxPHRASE01R; DROP TABLE IF EXISTS idxPHRASE02R; DROP TABLE IF 
EXISTS idxPHRASE03R; DROP TABLE IF EXISTS idxPHRASE04R; DROP TABLE IF EXISTS idxPHRASE05R; DROP TABLE IF EXISTS idxPHRASE06R; DROP TABLE IF EXISTS idxPHRASE07R; DROP TABLE IF EXISTS idxPHRASE08R; DROP TABLE IF EXISTS idxPHRASE09R; DROP TABLE IF EXISTS idxPHRASE10R; DROP TABLE IF EXISTS idxPHRASE11R; DROP TABLE IF EXISTS idxPHRASE12R; DROP TABLE IF EXISTS idxPHRASE13R; DROP TABLE IF EXISTS idxPHRASE14R; DROP TABLE IF EXISTS rnkMETHOD; DROP TABLE IF EXISTS rnkMETHODNAME; DROP TABLE IF EXISTS rnkMETHODDATA; DROP TABLE IF EXISTS rnkWORD01F; DROP TABLE IF EXISTS rnkWORD01R; DROP TABLE IF EXISTS rnkPAGEVIEWS; DROP TABLE IF EXISTS rnkDOWNLOADS; DROP TABLE IF EXISTS rnkCITATIONDATA; DROP TABLE IF EXISTS rnkCITATIONDATAEXT; DROP TABLE IF EXISTS rnkAUTHORDATA; DROP TABLE IF EXISTS collection_rnkMETHOD; DROP TABLE IF EXISTS collection; DROP TABLE IF EXISTS collectionname; DROP TABLE IF EXISTS oaiREPOSITORY; DROP TABLE IF EXISTS oaiHARVEST; DROP TABLE IF EXISTS oaiHARVESTLOG; DROP TABLE IF EXISTS bibHOLDINGPEN; DROP TABLE IF EXISTS collection_collection; DROP TABLE IF EXISTS collection_portalbox; DROP TABLE IF EXISTS portalbox; DROP TABLE IF EXISTS collection_example; DROP TABLE IF EXISTS example; DROP TABLE IF EXISTS collection_format; DROP TABLE IF EXISTS format; DROP TABLE IF EXISTS formatname; DROP TABLE IF EXISTS collection_field_fieldvalue; DROP TABLE IF EXISTS field; DROP TABLE IF EXISTS fieldname; DROP TABLE IF EXISTS fieldvalue; DROP TABLE IF EXISTS field_tag; DROP TABLE IF EXISTS tag; DROP TABLE IF EXISTS publreq; DROP TABLE IF EXISTS session; DROP TABLE IF EXISTS user; DROP TABLE IF EXISTS accROLE; DROP TABLE IF EXISTS accMAILCOOKIE; DROP TABLE IF EXISTS user_accROLE; DROP TABLE IF EXISTS accACTION; DROP TABLE IF EXISTS accARGUMENT; DROP TABLE IF EXISTS accROLE_accACTION_accARGUMENT; DROP TABLE IF EXISTS user_query; DROP TABLE IF EXISTS query; DROP TABLE IF EXISTS user_basket; DROP TABLE IF EXISTS basket; DROP TABLE IF EXISTS basket_record; DROP TABLE IF EXISTS record; DROP TABLE IF EXISTS user_query_basket; DROP TABLE IF EXISTS cmtRECORDCOMMENT; DROP TABLE IF EXISTS knwKB; DROP TABLE IF EXISTS knwKBRVAL; DROP TABLE IF EXISTS knwKBDDEF; DROP TABLE IF EXISTS sbmACTION; DROP TABLE IF EXISTS sbmALLFUNCDESCR; DROP TABLE IF EXISTS sbmAPPROVAL; DROP TABLE IF EXISTS sbmCPLXAPPROVAL; DROP TABLE IF EXISTS sbmCOLLECTION; DROP TABLE IF EXISTS sbmCOLLECTION_sbmCOLLECTION; DROP TABLE IF EXISTS sbmCOLLECTION_sbmDOCTYPE; DROP TABLE IF EXISTS sbmCATEGORIES; DROP TABLE IF EXISTS sbmCHECKS; DROP TABLE IF EXISTS sbmCOOKIES; DROP TABLE IF EXISTS sbmDOCTYPE; DROP TABLE IF EXISTS sbmFIELD; DROP TABLE IF EXISTS sbmFIELDDESC; DROP TABLE IF EXISTS sbmFORMATEXTENSION; DROP TABLE IF EXISTS sbmFUNCTIONS; DROP TABLE IF EXISTS sbmFUNDESC; DROP TABLE IF EXISTS sbmGFILERESULT; DROP TABLE IF EXISTS sbmIMPLEMENT; DROP TABLE IF EXISTS sbmPARAMETERS; DROP TABLE IF EXISTS sbmPUBLICATION; DROP TABLE IF EXISTS sbmPUBLICATIONCOMM; DROP TABLE IF EXISTS sbmPUBLICATIONDATA; DROP TABLE IF EXISTS sbmREFEREES; DROP TABLE IF EXISTS sbmSUBMISSIONS; DROP TABLE IF EXISTS schTASK; DROP TABLE IF EXISTS bibdoc; DROP TABLE IF EXISTS bibdoc_bibdoc; DROP TABLE IF EXISTS bibrec_bibdoc; DROP TABLE IF EXISTS usergroup; DROP TABLE IF EXISTS user_usergroup; DROP TABLE IF EXISTS user_basket; DROP TABLE IF EXISTS msgMESSAGE; DROP TABLE IF EXISTS user_msgMESSAGE; DROP TABLE IF EXISTS bskBASKET; DROP TABLE IF EXISTS bskEXTREC; DROP TABLE IF EXISTS bskEXTFMT; DROP TABLE IF EXISTS bskREC; DROP TABLE IF EXISTS bskRECORDCOMMENT; DROP TABLE IF EXISTS 
cmtACTIONHISTORY; DROP TABLE IF EXISTS cmtSUBSCRIPTION; DROP TABLE IF EXISTS user_bskBASKET; DROP TABLE IF EXISTS usergroup_bskBASKET; DROP TABLE IF EXISTS collection_externalcollection; DROP TABLE IF EXISTS externalcollection; DROP TABLE IF EXISTS collectiondetailedrecordpagetabs; DROP TABLE IF EXISTS staEVENT; DROP TABLE IF EXISTS clsMETHOD; DROP TABLE IF EXISTS collection_clsMETHOD; DROP TABLE IF EXISTS jrnJOURNAL; DROP TABLE IF EXISTS jrnISSUE; DROP TABLE IF EXISTS hstRECORD; DROP TABLE IF EXISTS hstDOCUMENT; DROP TABLE IF EXISTS hstTASK; DROP TABLE IF EXISTS crcBORROWER; DROP TABLE IF EXISTS crcILLREQUEST; DROP TABLE IF EXISTS crcITEM; DROP TABLE IF EXISTS crcLIBRARY; DROP TABLE IF EXISTS crcLOAN; DROP TABLE IF EXISTS crcLOANREQUEST; DROP TABLE IF EXISTS crcPURCHASE; DROP TABLE IF EXISTS crcVENDOR; DROP TABLE IF EXISTS expJOB; DROP TABLE IF EXISTS expQUERY; DROP TABLE IF EXISTS expJOB_expQUERY; DROP TABLE IF EXISTS expQUERYRESULT; DROP TABLE IF EXISTS expJOBRESULT; DROP TABLE IF EXISTS expJOBRESULT_expQUERYRESULT; DROP TABLE IF EXISTS user_expJOB; -- end of file diff --git a/modules/miscutil/sql/tabfill.sql b/modules/miscutil/sql/tabfill.sql index f30c0e479..ec6791786 100644 --- a/modules/miscutil/sql/tabfill.sql +++ b/modules/miscutil/sql/tabfill.sql @@ -1,585 +1,580 @@ -- $Id$ -- This file is part of CDS Invenio. -- Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. -- -- CDS Invenio is free software; you can redistribute it and/or -- modify it under the terms of the GNU General Public License as -- published by the Free Software Foundation; either version 2 of the -- License, or (at your option) any later version. -- -- CDS Invenio is distributed in the hope that it will be useful, but -- WITHOUT ANY WARRANTY; without even the implied warranty of -- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -- General Public License for more details. -- -- You should have received a copy of the GNU General Public License -- along with CDS Invenio; if not, write to the Free Software Foundation, Inc., -- 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -- Fill Invenio configuration tables with defaults suitable for any site. 
INSERT INTO rnkMETHOD (id,name,last_updated) VALUES (1,'wrd','0000-00-00 00:00:00');
INSERT INTO collection_rnkMETHOD (id_collection,id_rnkMETHOD,score) VALUES (1,1,100);
INSERT INTO rnkCITATIONDATA VALUES (1,'citationdict',NULL,'0000-00-00');
INSERT INTO rnkCITATIONDATA VALUES (2,'reversedict',NULL,'0000-00-00');
INSERT INTO rnkCITATIONDATA VALUES (3,'selfcitdict',NULL,'0000-00-00');
INSERT INTO rnkCITATIONDATA VALUES (4,'selfcitedbydict',NULL,'0000-00-00');
INSERT INTO field VALUES (1,'any field','anyfield');
INSERT INTO field VALUES (2,'title','title');
INSERT INTO field VALUES (3,'author','author');
INSERT INTO field VALUES (4,'abstract','abstract');
INSERT INTO field VALUES (5,'keyword','keyword');
INSERT INTO field VALUES (6,'report number','reportnumber');
INSERT INTO field VALUES (7,'subject','subject');
INSERT INTO field VALUES (8,'reference','reference');
INSERT INTO field VALUES (9,'fulltext','fulltext');
INSERT INTO field VALUES (10,'collection','collection');
INSERT INTO field VALUES (11,'division','division');
INSERT INTO field VALUES (12,'year','year');
INSERT INTO field VALUES (13,'experiment','experiment');
INSERT INTO field VALUES (14,'record ID','recid');
INSERT INTO field VALUES (15,'isbn','isbn');
INSERT INTO field VALUES (16,'issn','issn');
INSERT INTO field VALUES (17,'coden','coden');
-- INSERT INTO field VALUES (18,'doi','doi');
INSERT INTO field VALUES (19,'journal','journal');
INSERT INTO field VALUES (20,'collaboration','collaboration');
INSERT INTO field VALUES (21,'affiliation','affiliation');
INSERT INTO field VALUES (22,'exact author','exactauthor');
INSERT INTO field VALUES (23,'date created','datecreated');
INSERT INTO field VALUES (24,'date modified','datemodified');
INSERT INTO field_tag VALUES (1,100,10);
INSERT INTO field_tag VALUES (1,102,10);
INSERT INTO field_tag VALUES (1,103,10);
INSERT INTO field_tag VALUES (1,104,10);
INSERT INTO field_tag VALUES (1,105,10);
INSERT INTO field_tag VALUES (1,106,10);
INSERT INTO field_tag VALUES (1,107,10);
INSERT INTO field_tag VALUES (1,108,10);
INSERT INTO field_tag VALUES (1,109,10);
INSERT INTO field_tag VALUES (1,110,10);
INSERT INTO field_tag VALUES (1,111,10);
INSERT INTO field_tag VALUES (1,112,10);
INSERT INTO field_tag VALUES (1,113,10);
INSERT INTO field_tag VALUES (1,114,10);
INSERT INTO field_tag VALUES (1,16,10);
INSERT INTO field_tag VALUES (1,17,10);
INSERT INTO field_tag VALUES (1,18,10);
INSERT INTO field_tag VALUES (1,19,10);
INSERT INTO field_tag VALUES (1,20,10);
INSERT INTO field_tag VALUES (1,21,10);
INSERT INTO field_tag VALUES (1,22,10);
INSERT INTO field_tag VALUES (1,23,10);
INSERT INTO field_tag VALUES (1,24,10);
INSERT INTO field_tag VALUES (1,25,10);
INSERT INTO field_tag VALUES (1,26,10);
INSERT INTO field_tag VALUES (1,27,10);
INSERT INTO field_tag VALUES (1,28,10);
INSERT INTO field_tag VALUES (1,29,10);
INSERT INTO field_tag VALUES (1,30,10);
INSERT INTO field_tag VALUES (1,31,10);
INSERT INTO field_tag VALUES (1,32,10);
INSERT INTO field_tag VALUES (1,33,10);
INSERT INTO field_tag VALUES (1,34,10);
INSERT INTO field_tag VALUES (1,35,10);
INSERT INTO field_tag VALUES (1,36,10);
INSERT INTO field_tag VALUES (1,37,10);
INSERT INTO field_tag VALUES (1,38,10);
INSERT INTO field_tag VALUES (1,39,10);
INSERT INTO field_tag VALUES (1,40,10);
INSERT INTO field_tag VALUES (1,41,10);
INSERT INTO field_tag VALUES (1,42,10);
INSERT INTO field_tag VALUES (1,43,10);
INSERT INTO field_tag VALUES (1,44,10);
INSERT INTO field_tag VALUES (1,45,10);
INSERT INTO field_tag VALUES (1,46,10);
INSERT INTO field_tag VALUES (1,47,10);
INSERT INTO field_tag VALUES (1,48,10);
INSERT INTO field_tag VALUES (1,49,10);
INSERT INTO field_tag VALUES (1,50,10);
INSERT INTO field_tag VALUES (1,51,10);
INSERT INTO field_tag VALUES (1,52,10);
INSERT INTO field_tag VALUES (1,53,10);
INSERT INTO field_tag VALUES (1,54,10);
INSERT INTO field_tag VALUES (1,55,10);
INSERT INTO field_tag VALUES (1,56,10);
INSERT INTO field_tag VALUES (1,57,10);
INSERT INTO field_tag VALUES (1,58,10);
INSERT INTO field_tag VALUES (1,59,10);
INSERT INTO field_tag VALUES (1,60,10);
INSERT INTO field_tag VALUES (1,61,10);
INSERT INTO field_tag VALUES (1,62,10);
INSERT INTO field_tag VALUES (1,63,10);
INSERT INTO field_tag VALUES (1,64,10);
INSERT INTO field_tag VALUES (1,65,10);
INSERT INTO field_tag VALUES (1,66,10);
INSERT INTO field_tag VALUES (1,67,10);
INSERT INTO field_tag VALUES (1,68,10);
INSERT INTO field_tag VALUES (1,69,10);
INSERT INTO field_tag VALUES (1,70,10);
INSERT INTO field_tag VALUES (1,71,10);
INSERT INTO field_tag VALUES (1,72,10);
INSERT INTO field_tag VALUES (1,73,10);
INSERT INTO field_tag VALUES (1,74,10);
INSERT INTO field_tag VALUES (1,75,10);
INSERT INTO field_tag VALUES (1,76,10);
INSERT INTO field_tag VALUES (1,77,10);
INSERT INTO field_tag VALUES (1,78,10);
INSERT INTO field_tag VALUES (1,79,10);
INSERT INTO field_tag VALUES (1,80,10);
INSERT INTO field_tag VALUES (1,81,10);
INSERT INTO field_tag VALUES (1,82,10);
INSERT INTO field_tag VALUES (1,83,10);
INSERT INTO field_tag VALUES (1,84,10);
INSERT INTO field_tag VALUES (1,85,10);
INSERT INTO field_tag VALUES (1,86,10);
INSERT INTO field_tag VALUES (1,87,10);
INSERT INTO field_tag VALUES (1,88,10);
INSERT INTO field_tag VALUES (1,89,10);
INSERT INTO field_tag VALUES (1,90,10);
INSERT INTO field_tag VALUES (1,91,10);
INSERT INTO field_tag VALUES (1,92,10);
INSERT INTO field_tag VALUES (1,93,10);
INSERT INTO field_tag VALUES (1,94,10);
INSERT INTO field_tag VALUES (1,95,10);
INSERT INTO field_tag VALUES (1,96,10);
INSERT INTO field_tag VALUES (1,97,10);
INSERT INTO field_tag VALUES (1,98,10);
INSERT INTO field_tag VALUES (1,99,10);
INSERT INTO field_tag VALUES (1,122,10);
INSERT INTO field_tag VALUES (1,123,10);
INSERT INTO field_tag VALUES (1,124,10);
INSERT INTO field_tag VALUES (1,125,10);
INSERT INTO field_tag VALUES (1,126,10);
INSERT INTO field_tag VALUES (1,127,10);
INSERT INTO field_tag VALUES (1,128,10);
INSERT INTO field_tag VALUES (1,129,10);
INSERT INTO field_tag VALUES (1,130,10);
INSERT INTO field_tag VALUES (10,11,100);
INSERT INTO field_tag VALUES (11,14,100);
INSERT INTO field_tag VALUES (12,15,10);
INSERT INTO field_tag VALUES (13,116,10);
INSERT INTO field_tag VALUES (2,3,100);
INSERT INTO field_tag VALUES (2,4,90);
INSERT INTO field_tag VALUES (3,1,100);
INSERT INTO field_tag VALUES (3,2,90);
INSERT INTO field_tag VALUES (4,5,100);
INSERT INTO field_tag VALUES (5,6,100);
INSERT INTO field_tag VALUES (6,7,30);
INSERT INTO field_tag VALUES (6,8,10);
INSERT INTO field_tag VALUES (6,9,20);
INSERT INTO field_tag VALUES (7,12,100);
INSERT INTO field_tag VALUES (7,13,90);
INSERT INTO field_tag VALUES (8,10,100);
INSERT INTO field_tag VALUES (9,115,100);
INSERT INTO field_tag VALUES (14,117,100);
INSERT INTO field_tag VALUES (15,118,100);
INSERT INTO field_tag VALUES (16,119,100);
INSERT INTO field_tag VALUES (17,120,100);
-- INSERT INTO field_tag VALUES (18,121,100);
INSERT INTO field_tag VALUES (19,131,100);
INSERT INTO field_tag VALUES (20,132,100);
INSERT INTO field_tag VALUES (21,133,100);
INSERT INTO field_tag VALUES (21,134,90);
INSERT INTO field_tag VALUES (22,1,100);
INSERT INTO field_tag VALUES (22,2,90);
INSERT INTO format VALUES (1,'HTML brief','hb', 'HTML brief output format, used for search results pages.', 'text/html', 1);
INSERT INTO format VALUES (2,'HTML detailed','hd', 'HTML detailed output format, used for Detailed record pages.', 'text/html', 1);
INSERT INTO format VALUES (3,'MARC','hm', 'HTML MARC.', 'text/html', 1);
INSERT INTO format VALUES (4,'Dublin Core','xd', 'XML Dublin Core.', 'text/xml', 1);
INSERT INTO format VALUES (5,'MARCXML','xm', 'XML MARC.', 'text/xml', 1);
INSERT INTO format VALUES (6,'portfolio','hp', 'HTML portfolio-style output format for photos.', 'text/html', 1);
INSERT INTO format VALUES (7,'photo captions only','hc', 'HTML caption-only output format for photos.', 'text/html', 1);
INSERT INTO format VALUES (8,'BibTeX','hx', 'BibTeX.', 'text/html', 1);
INSERT INTO format VALUES (9,'EndNote','xe', 'XML EndNote.', 'text/xml', 1);
INSERT INTO format VALUES (10,'NLM','xn', 'XML NLM.', 'text/xml', 1);
INSERT INTO format VALUES (11,'Excel','excel', 'Excel csv output', 'application/ms-excel', 0);
INSERT INTO format VALUES (12,'HTML similarity','hs', 'Very short HTML output for similarity box (people also viewed..).', 'text/html', 0);
INSERT INTO format VALUES (13,'RSS','xr', 'RSS.', 'text/xml', 0);
INSERT INTO format VALUES (14,'OAI DC','xoaidc', 'OAI DC.', 'text/xml', 0);
INSERT INTO format VALUES (15,'File mini-panel', 'hdfile', 'Used to show fulltext files in mini-panel of detailed record pages.', 'text/html', 0);
INSERT INTO format VALUES (16,'Actions mini-panel', 'hdact', 'Used to display actions in mini-panel of detailed record pages.', 'text/html', 0);
INSERT INTO format VALUES (17,'References tab', 'hdref', 'Display record references in References tab.', 'text/html', 0);
INSERT INTO format VALUES (18,'HTML citesummary','hcs', 'HTML cite summary format, used for search results pages.', 'text/html', 1);
INSERT INTO format VALUES (19,'RefWorks','xw', 'RefWorks.', 'text/xml', 1);
INSERT INTO format VALUES (20,'MODS', 'xo', 'Metadata Object Description Schema', 'application/xml', 1);
INSERT INTO tag VALUES (1,'first author name','100__a');
INSERT INTO tag VALUES (2,'additional author name','700__a');
INSERT INTO tag VALUES (3,'main title','245__%');
INSERT INTO tag VALUES (4,'additional title','246__%');
INSERT INTO tag VALUES (5,'abstract','520__%');
INSERT INTO tag VALUES (6,'keyword','6531_a');
INSERT INTO tag VALUES (7,'primary report number','037__a');
INSERT INTO tag VALUES (8,'additional report number','088__a');
INSERT INTO tag VALUES (9,'added report number','909C0r');
INSERT INTO tag VALUES (10,'reference','999C5%');
INSERT INTO tag VALUES (11,'collection identifier','980__%');
INSERT INTO tag VALUES (12,'main subject','65017a');
INSERT INTO tag VALUES (13,'additional subject','65027a');
INSERT INTO tag VALUES (14,'division','909C0p');
INSERT INTO tag VALUES (15,'year','909C0y');
INSERT INTO tag VALUES (16,'00x','00%');
INSERT INTO tag VALUES (17,'01x','01%');
INSERT INTO tag VALUES (18,'02x','02%');
INSERT INTO tag VALUES (19,'03x','03%');
INSERT INTO tag VALUES (20,'lang','04%');
INSERT INTO tag VALUES (21,'05x','05%');
INSERT INTO tag VALUES (22,'06x','06%');
INSERT INTO tag VALUES (23,'07x','07%');
INSERT INTO tag VALUES (24,'08x','08%');
INSERT INTO tag VALUES (25,'09x','09%');
INSERT INTO tag VALUES (26,'10x','10%');
INSERT INTO tag VALUES (27,'11x','11%');
INSERT INTO tag VALUES (28,'12x','12%');
INSERT INTO tag VALUES (29,'13x','13%');
INSERT INTO tag VALUES (30,'14x','14%');
INSERT INTO tag VALUES (31,'15x','15%');
INSERT INTO tag VALUES (32,'16x','16%');
INSERT INTO tag VALUES (33,'17x','17%');
INSERT INTO tag VALUES (34,'18x','18%');
INSERT INTO tag VALUES (35,'19x','19%');
INSERT INTO tag VALUES (36,'20x','20%');
INSERT INTO tag VALUES (37,'21x','21%');
INSERT INTO tag VALUES (38,'22x','22%');
INSERT INTO tag VALUES (39,'23x','23%');
INSERT INTO tag VALUES (40,'24x','24%');
INSERT INTO tag VALUES (41,'25x','25%');
INSERT INTO tag VALUES (42,'internal','26%');
INSERT INTO tag VALUES (43,'27x','27%');
INSERT INTO tag VALUES (44,'28x','28%');
INSERT INTO tag VALUES (45,'29x','29%');
INSERT INTO tag VALUES (46,'pages','30%');
INSERT INTO tag VALUES (47,'31x','31%');
INSERT INTO tag VALUES (48,'32x','32%');
INSERT INTO tag VALUES (49,'33x','33%');
INSERT INTO tag VALUES (50,'34x','34%');
INSERT INTO tag VALUES (51,'35x','35%');
INSERT INTO tag VALUES (52,'36x','36%');
INSERT INTO tag VALUES (53,'37x','37%');
INSERT INTO tag VALUES (54,'38x','38%');
INSERT INTO tag VALUES (55,'39x','39%');
INSERT INTO tag VALUES (56,'40x','40%');
INSERT INTO tag VALUES (57,'41x','41%');
INSERT INTO tag VALUES (58,'42x','42%');
INSERT INTO tag VALUES (59,'43x','43%');
INSERT INTO tag VALUES (60,'44x','44%');
INSERT INTO tag VALUES (61,'45x','45%');
INSERT INTO tag VALUES (62,'46x','46%');
INSERT INTO tag VALUES (63,'47x','47%');
INSERT INTO tag VALUES (64,'48x','48%');
INSERT INTO tag VALUES (65,'series','49%');
INSERT INTO tag VALUES (66,'50x','50%');
INSERT INTO tag VALUES (67,'51x','51%');
INSERT INTO tag VALUES (68,'52x','52%');
INSERT INTO tag VALUES (69,'53x','53%');
INSERT INTO tag VALUES (70,'54x','54%');
INSERT INTO tag VALUES (71,'55x','55%');
INSERT INTO tag VALUES (72,'56x','56%');
INSERT INTO tag VALUES (73,'57x','57%');
INSERT INTO tag VALUES (74,'58x','58%');
INSERT INTO tag VALUES (75,'summary','59%');
INSERT INTO tag VALUES (76,'60x','60%');
INSERT INTO tag VALUES (77,'61x','61%');
INSERT INTO tag VALUES (78,'62x','62%');
INSERT INTO tag VALUES (79,'63x','63%');
INSERT INTO tag VALUES (80,'64x','64%');
INSERT INTO tag VALUES (81,'65x','65%');
INSERT INTO tag VALUES (82,'66x','66%');
INSERT INTO tag VALUES (83,'67x','67%');
INSERT INTO tag VALUES (84,'68x','68%');
INSERT INTO tag VALUES (85,'subject','69%');
INSERT INTO tag VALUES (86,'70x','70%');
INSERT INTO tag VALUES (87,'71x','71%');
INSERT INTO tag VALUES (88,'author-ad','72%');
INSERT INTO tag VALUES (89,'73x','73%');
INSERT INTO tag VALUES (90,'74x','74%');
INSERT INTO tag VALUES (91,'75x','75%');
INSERT INTO tag VALUES (92,'76x','76%');
INSERT INTO tag VALUES (93,'77x','77%');
INSERT INTO tag VALUES (94,'78x','78%');
INSERT INTO tag VALUES (95,'79x','79%');
INSERT INTO tag VALUES (96,'80x','80%');
INSERT INTO tag VALUES (97,'81x','81%');
INSERT INTO tag VALUES (98,'82x','82%');
INSERT INTO tag VALUES (99,'83x','83%');
INSERT INTO tag VALUES (100,'84x','84%');
INSERT INTO tag VALUES (101,'electr','85%');
INSERT INTO tag VALUES (102,'86x','86%');
INSERT INTO tag VALUES (103,'87x','87%');
INSERT INTO tag VALUES (104,'88x','88%');
INSERT INTO tag VALUES (105,'89x','89%');
INSERT INTO tag VALUES (106,'publication','90%');
INSERT INTO tag VALUES (107,'pub-conf-cit','91%');
INSERT INTO tag VALUES (108,'92x','92%');
INSERT INTO tag VALUES (109,'93x','93%');
INSERT INTO tag VALUES (110,'94x','94%');
INSERT INTO tag VALUES (111,'95x','95%');
INSERT INTO tag VALUES (112,'catinfo','96%');
INSERT INTO tag VALUES (113,'97x','97%');
INSERT INTO tag VALUES (114,'98x','98%');
INSERT INTO tag VALUES (115,'url','8564_u');
INSERT INTO tag VALUES (116,'experiment','909C0e');
INSERT INTO tag VALUES (117,'record ID','001');
INSERT INTO tag VALUES (118,'isbn','020__a');
INSERT INTO tag VALUES (119,'issn','022__a');
INSERT INTO tag VALUES (120,'coden','030__a');
-- INSERT INTO tag VALUES (121,'doi','773__a');
INSERT INTO tag VALUES (122,'850x','850%');
INSERT INTO tag VALUES (123,'851x','851%');
INSERT INTO tag VALUES (124,'852x','852%');
INSERT INTO tag VALUES (125,'853x','853%');
INSERT INTO tag VALUES (126,'854x','854%');
INSERT INTO tag VALUES (127,'855x','855%');
INSERT INTO tag VALUES (128,'857x','857%');
INSERT INTO tag VALUES (129,'858x','858%');
INSERT INTO tag VALUES (130,'859x','859%');
INSERT INTO tag VALUES (131,'journal','909C4%');
INSERT INTO tag VALUES (132,'collaboration','710__g');
INSERT INTO tag VALUES (133,'first author affiliation','100__u');
INSERT INTO tag VALUES (134,'additional author affiliation','700__u');
INSERT INTO idxINDEX VALUES (1,'global','This index contains words/phrases from global fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (2,'collection','This index contains words/phrases from collection identifiers fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (3,'abstract','This index contains words/phrases from abstract fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (4,'author','This index contains fuzzy words/phrases from author fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (5,'keyword','This index contains words/phrases from keyword fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (6,'reference','This index contains words/phrases from references fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (7,'reportnumber','This index contains words/phrases from report numbers fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (8,'title','This index contains words/phrases from title fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (9,'fulltext','This index contains words/phrases from fulltext fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (10,'year','This index contains words/phrases from year fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (11,'journal','This index contains words/phrases from journal publication information fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (12,'collaboration','This index contains words/phrases from collaboration name fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (13,'affiliation','This index contains words/phrases from institutional affiliation fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX VALUES (14,'exactauthor','This index contains exact words/phrases from author fields.','0000-00-00 00:00:00', '');
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (1,1);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (2,10);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (3,4);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (4,3);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (5,5);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (6,8);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (7,6);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (8,2);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (9,9);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (10,12);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (11,19);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (12,20);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (13,21);
INSERT INTO idxINDEX_field (id_idxINDEX, id_field) VALUES (14,22);
INSERT INTO sbmACTION VALUES ('Submit New Record','SBI','running','1998-08-17','2001-08-08','','Submit New Record');
INSERT INTO sbmACTION VALUES ('Modify Record','MBI','modify','1998-08-17','2001-11-07','','Modify Record');
INSERT INTO sbmACTION VALUES ('Submit New File','SRV','revise','0000-00-00','2001-11-07','','Submit New File');
INSERT INTO sbmACTION VALUES ('Approve Record','APP','approve','2001-11-08','2002-06-11','','Approve Record');
INSERT INTO sbmALLFUNCDESCR VALUES ('Ask_For_Record_Details_Confirmation','');
INSERT INTO sbmALLFUNCDESCR VALUES ('CaseEDS','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Create_Modify_Interface',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Create_Recid',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Finish_Submission','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Get_Info','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Get_Recid', 'This function gets the recid for a document with a given report-number (as stored in the global variable rn).');
INSERT INTO sbmALLFUNCDESCR VALUES ('Get_Report_Number',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Get_Sysno',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Insert_Modify_Record','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Insert_Record',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Is_Original_Submitter','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Is_Referee','This function checks whether the logged user is a referee for the current document');
INSERT INTO sbmALLFUNCDESCR VALUES ('Mail_Approval_Request_to_Referee',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Mail_Approval_Withdrawn_to_Referee',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Mail_Submitter',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Make_Modify_Record',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Make_Record','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Move_From_Pending','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Move_to_Done',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Move_to_Pending',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Print_Success','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Print_Success_Approval_Request',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Print_Success_APP','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Print_Success_DEL','Prepare a message for the user informing them that their record was successfully deleted.');
INSERT INTO sbmALLFUNCDESCR VALUES ('Print_Success_MBI',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Print_Success_SRV',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Register_Approval_Request',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Register_Referee_Decision',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Withdraw_Approval_Request',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Report_Number_Generation',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Second_Report_Number_Generation','Generate a secondary report number for a document.');
INSERT INTO sbmALLFUNCDESCR VALUES ('Send_Approval_Request',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Send_APP_Mail','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Send_Delete_Mail','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Send_Modify_Mail',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Send_SRV_Mail',NULL);
INSERT INTO sbmALLFUNCDESCR VALUES ('Stamp_Replace_Single_File_Approval','Stamp a single file when a document is approved.');
INSERT INTO sbmALLFUNCDESCR VALUES ('Stamp_Uploaded_Files','Stamp some of the files that were uploaded during a submission.');
INSERT INTO sbmALLFUNCDESCR VALUES ('Test_Status','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Update_Approval_DB',NULL);
-INSERT INTO sbmALLFUNCDESCR VALUES ('Upload_Files','');
INSERT INTO sbmALLFUNCDESCR VALUES ('User_is_Record_Owner_or_Curator','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Move_Files_to_Storage','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Make_Dummy_MARC_XML_Record','');
INSERT INTO sbmALLFUNCDESCR VALUES ('Move_FCKeditor_Files_to_Storage','Transfer files attached to the record with the FCKeditor');
INSERT INTO sbmALLFUNCDESCR VALUES ('Create_Upload_Files_Interface','Display generic interface to add/revise/delete files. To be used before function "Move_Uploaded_Files_to_Storage"');
INSERT INTO sbmALLFUNCDESCR VALUES ('Move_Uploaded_Files_to_Storage','Attach files uploaded with "Create_Upload_Files_Interface"');
-INSERT INTO sbmCHECKS VALUES ('AUCheck','function AUCheck(txt) {\r\n var res=1;\r\n tmp=txt.indexOf(\"\\015\");\r\n while (tmp != -1) {\r\n left=txt.substring(0,tmp);\r\n right=txt.substring(tmp+2,txt.length);\r\n txt=left + \"\\012\" + right;\r\n tmp=txt.indexOf(\"\\015\");\r\n }\r\n tmp=txt.indexOf(\"\\012\");\r\n if (tmp==-1){\r\n line=txt;\r\n txt=\'\';}\r\n else{\r\n line=txt.substring(0,tmp);\r\n txt=txt.substring(tmp+1,txt.length);}\r\n while (line != \"\"){\r\n coma=line.indexOf(\",\");\r\n left=line.substring(0,coma);\r\n right=line.substring(coma+1,line.length);\r\n coma2=right.indexOf(\",\");\r\n space=right.indexOf(\" \");\r\n if ((coma==-1)||(left==\"\")||(right==\"\")||(space!=0)||(coma2!=-1)){\r\n res=0;\r\n error_log=line;\r\n }\r\n tmp=txt.indexOf(\"\\012\");\r\n if (tmp==-1){\r\n line=txt;\r\n txt=\'\';}\r\n else{\r\n line=txt.substring(0,tmp-1);\r\n txt=txt.substring(tmp+1,txt.length);}\r\n }\r\n if (res == 0){\r\n alert(\"This author name cannot be managed \\: \\012\\012\" + error_log + \" \\012\\012It is not in the required format!\\012Put one author per line and a comma (,) between the name and the firstname initial letters. \\012The name is going first, followed by the firstname initial letters.\\012Do not forget the whitespace after the comma!!!\\012\\012Example \\: Put\\012\\012Le Meur, J Y \\012Baron, T \\012\\012for\\012\\012Le Meur Jean-Yves & Baron Thomas.\");\r\n return 0;\r\n } \r\n return 1; \r\n}','1998-08-18','0000-00-00','','');
-INSERT INTO sbmCHECKS VALUES ('DatCheckNew','function DatCheckNew(txt) {\r\n var res=1;\r\n if (txt.length != 10){res=0;}\r\n if (txt.indexOf(\"/\") != 2){res=0;}\r\n if (txt.lastIndexOf(\"/\") != 5){res=0;}\r\n tmp=parseInt(txt.substring(0,2),10);\r\n if ((tmp > 31)||(tmp < 1)||(isNaN(tmp))){res=0;}\r\n tmp=parseInt(txt.substring(3,5),10);\r\n if ((tmp > 12)||(tmp < 1)||(isNaN(tmp))){res=0;}\r\n tmp=parseInt(txt.substring(6,10),10);\r\n if ((tmp < 1)||(isNaN(tmp))){res=0;}\r\n if (txt.length == 0){res=1;}\r\n if (res == 0){\r\n alert(\"Please enter a correct Date \\012Format: dd/mm/yyyy\");\r\n return 0;\r\n }\r\n return 1; \r\n}','0000-00-00','0000-00-00','','');
+INSERT INTO sbmCHECKS VALUES ('AUCheck','function AUCheck(txt) {\r\n var res=1;\r\n tmp=txt.indexOf(\"\\015\");\r\n while (tmp != -1) {\r\n left=txt.substring(0,tmp);\r\n right=txt.substring(tmp+2,txt.length);\r\n txt=left + \"\\012\" + right;\r\n tmp=txt.indexOf(\"\\015\");\r\n }\r\n tmp=txt.indexOf(\"\\012\");\r\n if (tmp==-1){\r\n line=txt;\r\n txt=\'\';}\r\n else{\r\n line=txt.substring(0,tmp);\r\n txt=txt.substring(tmp+1,txt.length);}\r\n while (line != \"\"){\r\n coma=line.indexOf(\",\");\r\n left=line.substring(0,coma);\r\n right=line.substring(coma+1,line.length);\r\n coma2=right.indexOf(\",\");\r\n space=right.indexOf(\" \");\r\n if ((coma==-1)||(left==\"\")||(right==\"\")||(space!=0)||(coma2!=-1)){\r\n res=0;\r\n error_log=line;\r\n }\r\n tmp=txt.indexOf(\"\\012\");\r\n if (tmp==-1){\r\n line=txt;\r\n txt=\'\';}\r\n else{\r\n line=txt.substring(0,tmp-1);\r\n txt=txt.substring(tmp+1,txt.length);}\r\n }\r\n if (res == 0){\r\n alert(\"This author name cannot be managed \\: \\012\\012\" + error_log + \" \\012\\012It is not in the required format!\\012Put one author per line and a comma (,) between the name and the firstname initial letters. \\012The name is going first, followed by the firstname initial letters.\\012Do not forget the whitespace after the comma!!!\\012\\012Example \\: Put\\012\\012Le Meur, J Y \\012Baron, T \\012\\012for\\012\\012Le Meur Jean-Yves & Baron Thomas.\");\r\n return 0;\r\n } \r\n return 1; \r\n}','1998-08-18','0000-00-00','','');
+INSERT INTO sbmCHECKS VALUES ('DatCheckNew','function DatCheckNew(txt) {\r\n var res=1;\r\n if (txt.length != 10){res=0;}\r\n if (txt.indexOf(\"/\") != 2){res=0;}\r\n if (txt.lastIndexOf(\"/\") != 5){res=0;}\r\n tmp=parseInt(txt.substring(0,2),10);\r\n if ((tmp > 31)||(tmp < 1)||(isNaN(tmp))){res=0;}\r\n tmp=parseInt(txt.substring(3,5),10);\r\n if ((tmp > 12)||(tmp < 1)||(isNaN(tmp))){res=0;}\r\n tmp=parseInt(txt.substring(6,10),10);\r\n if ((tmp < 1)||(isNaN(tmp))){res=0;}\r\n if (txt.length == 0){res=1;}\r\n if (res == 0){\r\n alert(\"Please enter a correct Date \\012Format: dd/mm/yyyy\");\r\n return 0;\r\n }\r\n return 1; \r\n}','0000-00-00','0000-00-00','','');
INSERT INTO sbmFORMATEXTENSION VALUES ('WORD','.doc');
INSERT INTO sbmFORMATEXTENSION VALUES ('PostScript','.ps');
INSERT INTO sbmFORMATEXTENSION VALUES ('PDF','.pdf');
INSERT INTO sbmFORMATEXTENSION VALUES ('JPEG','.jpg');
INSERT INTO sbmFORMATEXTENSION VALUES ('JPEG','.jpeg');
INSERT INTO sbmFORMATEXTENSION VALUES ('GIF','.gif');
INSERT INTO sbmFORMATEXTENSION VALUES ('PPT','.ppt');
INSERT INTO sbmFORMATEXTENSION VALUES ('HTML','.htm');
INSERT INTO sbmFORMATEXTENSION VALUES ('HTML','.html');
INSERT INTO sbmFORMATEXTENSION VALUES ('Latex','.tex');
INSERT INTO sbmFORMATEXTENSION VALUES ('Compressed PostScript','.ps.gz');
INSERT INTO sbmFORMATEXTENSION VALUES ('Tarred Tex (.tar)','.tar');
INSERT INTO sbmFORMATEXTENSION VALUES ('Text','.txt');
INSERT INTO sbmFUNDESC VALUES ('Get_Report_Number','edsrn');
INSERT INTO sbmFUNDESC VALUES ('Send_Modify_Mail','addressesMBI');
INSERT INTO sbmFUNDESC VALUES ('Send_Modify_Mail','sourceDoc');
INSERT INTO sbmFUNDESC VALUES ('Register_Approval_Request','categ_file_appreq');
INSERT INTO sbmFUNDESC VALUES ('Register_Approval_Request','categ_rnseek_appreq');
INSERT INTO sbmFUNDESC VALUES ('Register_Approval_Request','note_file_appreq');
INSERT INTO sbmFUNDESC VALUES ('Register_Referee_Decision','decision_file');
INSERT INTO sbmFUNDESC VALUES ('Withdraw_Approval_Request','categ_file_withd');
INSERT INTO sbmFUNDESC VALUES ('Withdraw_Approval_Request','categ_rnseek_withd');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','edsrn');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','autorngen');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','rnin');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','counterpath');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','rnformat');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','yeargen');
INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','nblength');
INSERT INTO sbmFUNDESC VALUES ('Mail_Approval_Request_to_Referee','categ_file_appreq');
INSERT INTO sbmFUNDESC VALUES ('Mail_Approval_Request_to_Referee','categ_rnseek_appreq');
INSERT INTO sbmFUNDESC VALUES ('Mail_Approval_Request_to_Referee','edsrn');
INSERT INTO sbmFUNDESC VALUES ('Mail_Approval_Withdrawn_to_Referee','categ_file_withd');
INSERT INTO sbmFUNDESC VALUES ('Mail_Approval_Withdrawn_to_Referee','categ_rnseek_withd');
INSERT INTO sbmFUNDESC VALUES ('Mail_Submitter','authorfile');
INSERT INTO sbmFUNDESC VALUES ('Mail_Submitter','status');
INSERT INTO sbmFUNDESC VALUES ('Send_Approval_Request','authorfile');
INSERT
INTO sbmFUNDESC VALUES ('Create_Modify_Interface','fieldnameMBI'); INSERT INTO sbmFUNDESC VALUES ('Send_Modify_Mail','fieldnameMBI'); INSERT INTO sbmFUNDESC VALUES ('Update_Approval_DB','categformatDAM'); INSERT INTO sbmFUNDESC VALUES ('Update_Approval_DB','decision_file'); INSERT INTO sbmFUNDESC VALUES ('Send_SRV_Mail','categformatDAM'); INSERT INTO sbmFUNDESC VALUES ('Send_SRV_Mail','addressesSRV'); INSERT INTO sbmFUNDESC VALUES ('Send_Approval_Request','directory'); INSERT INTO sbmFUNDESC VALUES ('Send_Approval_Request','categformatDAM'); INSERT INTO sbmFUNDESC VALUES ('Send_Approval_Request','addressesDAM'); INSERT INTO sbmFUNDESC VALUES ('Send_Approval_Request','titleFile'); INSERT INTO sbmFUNDESC VALUES ('Send_APP_Mail','edsrn'); INSERT INTO sbmFUNDESC VALUES ('Mail_Submitter','titleFile'); INSERT INTO sbmFUNDESC VALUES ('Send_Modify_Mail','emailFile'); INSERT INTO sbmFUNDESC VALUES ('Get_Info','authorFile'); INSERT INTO sbmFUNDESC VALUES ('Get_Info','emailFile'); INSERT INTO sbmFUNDESC VALUES ('Get_Info','titleFile'); INSERT INTO sbmFUNDESC VALUES ('Make_Modify_Record','modifyTemplate'); INSERT INTO sbmFUNDESC VALUES ('Send_APP_Mail','addressesAPP'); INSERT INTO sbmFUNDESC VALUES ('Send_APP_Mail','categformatAPP'); INSERT INTO sbmFUNDESC VALUES ('Send_APP_Mail','newrnin'); INSERT INTO sbmFUNDESC VALUES ('Send_APP_Mail','decision_file'); INSERT INTO sbmFUNDESC VALUES ('Send_APP_Mail','comments_file'); INSERT INTO sbmFUNDESC VALUES ('CaseEDS','casevariable'); INSERT INTO sbmFUNDESC VALUES ('CaseEDS','casevalues'); INSERT INTO sbmFUNDESC VALUES ('CaseEDS','casesteps'); INSERT INTO sbmFUNDESC VALUES ('CaseEDS','casedefault'); INSERT INTO sbmFUNDESC VALUES ('Send_SRV_Mail','noteFile'); INSERT INTO sbmFUNDESC VALUES ('Send_SRV_Mail','emailFile'); INSERT INTO sbmFUNDESC VALUES ('Mail_Submitter','emailFile'); INSERT INTO sbmFUNDESC VALUES ('Mail_Submitter','edsrn'); INSERT INTO sbmFUNDESC VALUES ('Mail_Submitter','newrnin'); -INSERT INTO sbmFUNDESC VALUES ('Upload_Files','maxsize'); -INSERT INTO sbmFUNDESC VALUES ('Upload_Files','minsize'); -INSERT INTO sbmFUNDESC VALUES ('Upload_Files','iconsize'); -INSERT INTO sbmFUNDESC VALUES ('Upload_Files','type'); INSERT INTO sbmFUNDESC VALUES ('Make_Record','sourceTemplate'); INSERT INTO sbmFUNDESC VALUES ('Make_Record','createTemplate'); INSERT INTO sbmFUNDESC VALUES ('Print_Success','edsrn'); INSERT INTO sbmFUNDESC VALUES ('Print_Success','newrnin'); INSERT INTO sbmFUNDESC VALUES ('Print_Success','status'); INSERT INTO sbmFUNDESC VALUES ('Make_Modify_Record','sourceTemplate'); INSERT INTO sbmFUNDESC VALUES ('Move_Files_to_Storage','documenttype'); INSERT INTO sbmFUNDESC VALUES ('Move_Files_to_Storage','iconsize'); INSERT INTO sbmFUNDESC VALUES ('Move_Files_to_Storage','paths_and_suffixes'); INSERT INTO sbmFUNDESC VALUES ('Move_Files_to_Storage','rename'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Uploaded_Files','files_to_be_stamped'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Uploaded_Files','latex_template'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Uploaded_Files','latex_template_vars'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Uploaded_Files','stamp'); INSERT INTO sbmFUNDESC VALUES ('Make_Dummy_MARC_XML_Record','dummyrec_source_tpl'); INSERT INTO sbmFUNDESC VALUES ('Make_Dummy_MARC_XML_Record','dummyrec_create_tpl'); INSERT INTO sbmFUNDESC VALUES ('Print_Success_APP','decision_file'); INSERT INTO sbmFUNDESC VALUES ('Print_Success_APP','newrnin'); INSERT INTO sbmFUNDESC VALUES ('Send_Delete_Mail','edsrn'); INSERT INTO sbmFUNDESC VALUES 
('Send_Delete_Mail','record_managers'); INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_rn_file'); INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_rn_format'); INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_rn_yeargen'); INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_rncateg_file'); INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_counterpath'); INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_nb_length'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','file_to_be_stamped'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','latex_template'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','latex_template_vars'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','new_file_name'); INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','stamp'); INSERT INTO sbmFUNDESC VALUES ('Move_FCKeditor_Files_to_Storage','input_fields'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','maxsize'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','minsize'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','doctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','restrictions'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canDeleteDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canReviseDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canDescribeDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canCommentDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canKeepDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canAddFormatDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canRestrictDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canRenameDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canNameNewFiles'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','createRelatedFormats'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','keepDefault'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','showLinks'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','fileLabel'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','filenameLabel'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','descriptionLabel'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','commentLabel'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','restrictionLabel'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','startDoc'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','endDoc'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','defaultFilenameDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','maxFilesDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Move_Uploaded_Files_to_Storage','iconsize'); INSERT INTO sbmFUNDESC VALUES ('Move_Uploaded_Files_to_Storage','createIconDoctypes'); INSERT INTO sbmFUNDESC VALUES ('Move_Uploaded_Files_to_Storage','forceFileRevision'); INSERT INTO sbmGFILERESULT VALUES ('HTML','HTML document'); INSERT INTO sbmGFILERESULT VALUES ('WORD','data'); INSERT INTO sbmGFILERESULT VALUES ('PDF','PDF document'); INSERT INTO sbmGFILERESULT 
VALUES ('PostScript','PostScript document'); INSERT INTO sbmGFILERESULT VALUES ('PostScript','data '); INSERT INTO sbmGFILERESULT VALUES ('PostScript','HP Printer Job Language data'); INSERT INTO sbmGFILERESULT VALUES ('jpg','JPEG image'); INSERT INTO sbmGFILERESULT VALUES ('Compressed PostScript','gzip compressed data'); INSERT INTO sbmGFILERESULT VALUES ('Tarred Tex (.tar)','tar archive'); INSERT INTO sbmGFILERESULT VALUES ('JPEG','JPEG image'); INSERT INTO sbmGFILERESULT VALUES ('GIF','GIF'); INSERT INTO collectiondetailedrecordpagetabs VALUES (8, 'usage;comments;metadata'); INSERT INTO collectiondetailedrecordpagetabs VALUES (19, 'usage;comments;metadata'); INSERT INTO collectiondetailedrecordpagetabs VALUES (18, 'usage;comments;metadata'); INSERT INTO collectiondetailedrecordpagetabs VALUES (17, 'usage;comments;metadata'); -- end of file diff --git a/modules/webaccess/lib/access_control_config.py b/modules/webaccess/lib/access_control_config.py index b10b367dd..506eb99f1 100644 --- a/modules/webaccess/lib/access_control_config.py +++ b/modules/webaccess/lib/access_control_config.py @@ -1,281 +1,282 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDS Invenio Access Control Config. 
""" __revision__ = \ "$Id$" # pylint: disable-msg=C0301 from invenio.config import CFG_SITE_NAME, CFG_SITE_URL, CFG_SITE_LANG, \ CFG_SITE_SECURE_URL, CFG_SITE_SUPPORT_EMAIL, CFG_CERN_SITE import cPickle from zlib import compress from invenio.messages import gettext_set_language class InvenioWebAccessFireroleError(Exception): """Just an Exception to discover if it's a FireRole problem""" pass # VALUES TO BE EXPORTED # CURRENTLY USED BY THE FILES access_control_engine.py access_control_admin.py webaccessadmin_lib.py # name of the role giving superadmin rights SUPERADMINROLE = 'superadmin' # name of the webaccess webadmin role WEBACCESSADMINROLE = 'webaccessadmin' # name of the action allowing roles to access the web administrator interface WEBACCESSACTION = 'cfgwebaccess' # name of the action allowing roles to access the web administrator interface VIEWRESTRCOLL = 'viewrestrcoll' # name of the action allowing roles to delegate the rights to other roles # ex: libraryadmin to delegate libraryworker DELEGATEADDUSERROLE = 'accdelegaterole' # max number of users to display in the drop down selects MAXSELECTUSERS = 25 # max number of users to display in a page (mainly for user area) MAXPAGEUSERS = 25 # default role definition, source: CFG_ACC_EMPTY_ROLE_DEFINITION_SRC = 'deny all' # default role definition, compiled: CFG_ACC_EMPTY_ROLE_DEFINITION_OBJ = (False, False, ()) # default role definition, compiled and serialized: CFG_ACC_EMPTY_ROLE_DEFINITION_SER = None # List of tags containing (multiple) emails of users who should authorize # to access the corresponding record regardless of collection restrictions. if CFG_CERN_SITE: CFG_ACC_GRANT_AUTHOR_RIGHTS_TO_EMAILS_IN_TAGS = ['859__f', '270__m'] else: CFG_ACC_GRANT_AUTHOR_RIGHTS_TO_EMAILS_IN_TAGS = ['8560_f'] # Use external source for access control? # Atleast one must be added # Adviced not to change the name, since it is used to identify the account # Format is: System name: (System class, Default True/Flase), atleast one # must be default CFG_EXTERNAL_AUTHENTICATION = {"Local" : (None, True)} # Variables to set to the SSO Authentication name if using SSO CFG_EXTERNAL_AUTH_USING_SSO = False CFG_EXTERNAL_AUTH_LOGOUT_SSO = None if CFG_CERN_SITE: if True: import external_authentication_sso as ea_sso CFG_EXTERNAL_AUTH_USING_SSO = "CERN" # Link to reach in order to logout from SSO CFG_EXTERNAL_AUTH_LOGOUT_SSO = 'https://login.cern.ch/adfs/ls/?wa=wsignout1.0' CFG_EXTERNAL_AUTHENTICATION = {CFG_EXTERNAL_AUTH_USING_SSO : (ea_sso.ExternalAuthSSO(), True)} else: import external_authentication_cern as ea_cern CFG_EXTERNAL_AUTHENTICATION = {"Local": (None, False), \ "CERN": (ea_cern.ExternalAuthCern(), True)} # default data for the add_default_settings function # Note: by default the definition is set to deny any. This won't be a problem # because userid directly connected with roles will still be allowed. 
# roles # name description definition DEF_ROLES = ((SUPERADMINROLE, 'superuser with all rights', 'deny any'), (WEBACCESSADMINROLE, 'WebAccess administrator', 'deny any'), ('anyuser', 'Any user', 'allow any'), ('basketusers', 'Users who can use baskets', 'allow any'), ('loanusers', 'Users who can use loans', 'allow any'), ('groupusers', 'Users who can use groups', 'allow any'), ('alertusers', 'Users who can use alerts', 'allow any'), ('messageusers', 'Users who can use messages', 'allow any'), ('holdingsusers', 'Users who can view holdings', 'allow any'), ('statisticsusers', 'Users who can view statistics', 'allow any')) # Demo site roles DEF_DEMO_ROLES = (('photocurator', 'Photo collection curator', 'deny any'), ('thesesviewer', 'Theses viewer', 'allow group "Theses viewers"\nallow apache_group "theses"'), ('thesescurator', 'Theses collection curator', 'deny any'), ('referee_DEMOBOO_*', 'Book collection curator', 'deny any'), ('restrictedpicturesviewer', 'Restricted pictures viewer', 'deny any'), ('curator', 'Curator', 'deny any'), ('basketusers', 'Users who can use baskets', 'deny email "hyde@cds.cern.ch"\nallow any'), ('submit_DEMOJRN_*', 'Users who can submit (and modify) "Atlantis Times" articles', 'deny all'), ('atlantiseditor', 'Users who can configure "Atlantis Times" journal', 'deny all'), ('commentmoderator', 'Users who can moderate comments', 'deny all')) DEF_DEMO_USER_ROLES = (('jekyll@cds.cern.ch', 'thesesviewer'), ('dorian.gray@cds.cern.ch', 'referee_DEMOBOO_*'), ('balthasar.montague@cds.cern.ch', 'curator'), ('romeo.montague@cds.cern.ch', 'restrictedpicturesviewer'), ('romeo.montague@cds.cern.ch', 'thesescurator'), ('juliet.capulet@cds.cern.ch', 'restrictedpicturesviewer'), ('juliet.capulet@cds.cern.ch', 'photocurator'), ('romeo.montague@cds.cern.ch', 'submit_DEMOJRN_*'), ('juliet.capulet@cds.cern.ch', 'submit_DEMOJRN_*'), ('balthasar.montague@cds.cern.ch', 'atlantiseditor')) # users # list of e-mail addresses DEF_USERS = [] # actions # name desc allowedkeywords optional DEF_ACTIONS = ( ('cfgwebsearch', 'configure WebSearch', '', 'no'), ('cfgbibformat', 'configure BibFormat', '', 'no'), ('cfgbibknowledge', 'configure BibKnowledge', '', 'no'), ('cfgwebsubmit', 'configure WebSubmit', '', 'no'), ('cfgbibrank', 'configure BibRank', '', 'no'), ('cfgwebcomment', 'configure WebComment', '', 'no'), ('cfgoaiharvest', 'configure OAI Harvest', '', 'no'), ('cfgoairepository', 'configure OAI Repository', '', 'no'), ('cfgbibindex', 'configure BibIndex', '', 'no'), + ('cfgbibexport', 'configure BibExport', '', 'no'), ('runbibindex', 'run BibIndex', '', 'no'), ('runbibupload', 'run BibUpload', '', 'no'), ('runwebcoll', 'run webcoll', 'collection', 'yes'), ('runbibformat', 'run BibFormat', 'format', 'yes'), ('runbibclassify', 'run BibClassify', 'taxonomy', 'yes'), ('runbibtaskex', 'run BibTaskEx example', '', 'no'), ('runbibrank', 'run BibRank', '', 'no'), ('runoaiharvest', 'run oaiharvest task', '', 'no'), ('runoairepository', 'run oairepositoryupdater task', '', 'no'), ('runbibedit', 'run Record Editor', 'collection', 'yes'), ('runbibeditmulti', 'run Multi-Record Editor', '', 'no'), ('runbibmerge', 'run Record Merger', '', 'no'), ('runwebstatadmin', 'run WebStadAdmin', '', 'no'), ('runinveniogc', 'run InvenioGC', '', 'no'), + ('runbibexport', 'run BibExport', '', 'no'), ('referee', 'referee document type doctype/category categ', 'doctype,categ', 'yes'), ('submit', 'use webSubmit', 'doctype,act,categ', 'yes'), ('viewrestrdoc', 'view restricted document', 'status', 'no'), (WEBACCESSACTION, 
'configure WebAccess', '', 'no'), (DELEGATEADDUSERROLE, 'delegate subroles inside WebAccess', 'role', 'no'), (VIEWRESTRCOLL, 'view restricted collection', 'collection', 'no'), ('cfgwebjournal', 'configure WebJournal', 'name,with_editor_rights', 'no'), ('viewcomment', 'view comments', 'collection', 'no'), ('sendcomment', 'send comments', 'collection', 'no'), ('attachcommentfile', 'attach files to comments', 'collection', 'no'), ('attachsubmissionfile', 'upload files to drop box during submission', '', 'no'), ('cfgbibexport', 'configure BibExport', '', 'no'), ('runbibexport', 'run BibExport', '', 'no'), - ('fulltext', 'administrate Fulltext', '', 'no'), ('usebaskets', 'use baskets', '', 'no'), ('useloans', 'use loans', '', 'no'), ('usegroups', 'use groups', '', 'no'), ('usealerts', 'use alerts', '', 'no'), ('usemessages', 'use messages', '', 'no'), ('viewholdings', 'view holdings', 'collection', 'yes'), ('viewstatistics', 'view statistics', 'collection', 'yes'), ('runbibcirculation', 'run BibCirculation', '', 'no'), ('moderatecomments', 'moderate comments', 'collection', 'no') ) # Default authorizations # role action arguments DEF_AUTHS = (('basketusers', 'usebaskets', {}), ('loanusers', 'useloans', {}), ('groupusers', 'usegroups', {}), ('alertusers', 'usealerts', {}), ('messageusers', 'usemessages', {}), ('holdingsusers', 'viewholdings', {}), ('statisticsusers', 'viewstatistics', {})) # Demo site authorizations # role action arguments DEF_DEMO_AUTHS = ( ('photocurator', 'runwebcoll', {'collection': 'Pictures'}), ('restrictedpicturesviewer', 'viewrestrdoc', {'status': 'restricted_picture'}), ('thesesviewer', VIEWRESTRCOLL, {'collection': 'Theses'}), ('referee_DEMOBOO_*', 'referee', {'doctype': 'DEMOBOO', 'categ': '*'}), ('curator', 'cfgbibknowledge', {}), ('curator', 'runbibedit', {}), ('curator', 'runbibeditmulti', {}), ('curator', 'runbibmerge', {}), ('thesescurator', 'runbibedit', {'collection': 'Theses'}), ('thesescurator', VIEWRESTRCOLL, {'collection': 'Theses'}), ('photocurator', 'runbibedit', {'collection': 'Pictures'}), ('referee_DEMOBOO_*', 'runbibedit', {'collection': 'Books'}), ('submit_DEMOJRN_*', 'submit', {'doctype': 'DEMOJRN', 'act': 'SBI', 'categ': '*'}), ('submit_DEMOJRN_*', 'submit', {'doctype': 'DEMOJRN', 'act': 'MBI', 'categ': '*'}), ('submit_DEMOJRN_*', 'cfgwebjournal', {'name': 'AtlantisTimes', 'with_editor_rights': 'no'}), ('atlantiseditor', 'cfgwebjournal', {'name': 'AtlantisTimes', 'with_editor_rights': 'yes'}) ) _ = gettext_set_language(CFG_SITE_LANG) # Activities (i.e. actions) for which exists an administrative web interface. 
CFG_ACC_ACTIVITIES_URLS = {
    'runbibedit' : (_("Run Record Editor"), "%s/record/edit/?ln=%%s" % CFG_SITE_URL),
    'runbibeditmulti' : (_("Run Multi-Record Editor"), "%s/record/multiedit/?ln=%%s" % CFG_SITE_URL),
    'runbibmerge' : (_("Run Record Merger"), "%s/record/merge/?ln=%%s" % CFG_SITE_URL),
    'cfgbibknowledge' : (_("Configure BibKnowledge"), "%s/kb?ln=%%s" % CFG_SITE_URL),
    'cfgbibformat' : (_("Configure BibFormat"), "%s/admin/bibformat/bibformatadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgoaiharvest' : (_("Configure OAI Harvest"), "%s/admin/bibharvest/oaiharvestadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgoairepository' : (_("Configure OAI Repository"), "%s/admin/bibharvest/oairepositoryadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgbibindex' : (_("Configure BibIndex"), "%s/admin/bibindex/bibindexadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgbibrank' : (_("Configure BibRank"), "%s/admin/bibrank/bibrankadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgwebaccess' : (_("Configure WebAccess"), "%s/admin/webaccess/webaccessadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgwebcomment' : (_("Configure WebComment"), "%s/admin/webcomment/webcommentadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgwebsearch' : (_("Configure WebSearch"), "%s/admin/websearch/websearchadmin.py?ln=%%s" % CFG_SITE_URL),
    'cfgwebsubmit' : (_("Configure WebSubmit"), "%s/admin/websubmit/websubmitadmin.py?ln=%%s" % CFG_SITE_URL),
    'runbibcirculation' : (_("Run BibCirculation"), "%s/admin/bibcirculation/bibcirculationadmin.py?ln=%%s" % CFG_SITE_URL)
    }

CFG_WEBACCESS_MSGS = {
    0: 'Try to <a href="%s/youraccount/login?referer=%%s">login</a> with another account.' % CFG_SITE_SECURE_URL,
    1: '<br />If you think this is not correct, please contact: <a href="mailto:%s">%s</a>' % (CFG_SITE_SUPPORT_EMAIL, CFG_SITE_SUPPORT_EMAIL),
    2: '<br />If you have any questions, please write to <a href="mailto:%s">%s</a>' % (CFG_SITE_SUPPORT_EMAIL, CFG_SITE_SUPPORT_EMAIL),
    3: 'Guest users are not allowed, please <a href="%s/youraccount/login">login</a>.' % CFG_SITE_SECURE_URL,
    4: 'The site is temporarily closed for maintenance. Please come back soon.',
    5: 'Authorization failure',
    6: '%s temporarily closed' % CFG_SITE_NAME,
    7: 'This functionality is temporarily closed due to server maintenance. Please use only the search engine in the meantime.',
    8: 'Functionality temporarily closed'
    }

CFG_WEBACCESS_WARNING_MSGS = {
    0: 'Authorization granted',
-    1: 'Error(1): You are not authorized to perform this action.',
-    2: 'Error(2): You are not authorized to perform any action.',
-    3: 'Error(3): The action %s does not exist.',
-    4: 'Error(4): Unexpected error occurred.',
-    5: 'Error(5): Missing mandatory keyword argument(s) for this action.',
-    6: 'Error(6): Guest accounts are not authorized to perform this action.',
-    7: 'Error(7): Not enough arguments, user ID and action name required.',
-    8: 'Error(8): Incorrect keyword argument(s) for this action.',
-    9: """Error(9): Account '%s' is not yet activated.""",
-    10: """Error(10): You were not authorized by the authentication method '%s'.""",
-    11: """Error(11): The selected login method '%s' is not the default method for this account, please try another one.""",
-    12: """Error(12): Selected login method '%s' does not exist.""",
-    13: """Error(13): Could not register '%s' account.""",
-    14: """Error(14): Could not login using '%s', because this user is unknown.""",
-    15: """Error(15): Could not login using your '%s' account, because you have introduced a wrong password.""",
-    16: """Error(16): External authentication troubles using '%s' (maybe temporary network problems).""",
-    17: """Error(17): You have not yet confirmed the email address for the '%s' authentication method.""",
-    18: """Error(18): The administrator has not yet activated your account for the '%s' authentication method.""",
-    19: """Error(19): The site is having troubles in sending you an email for confirming your email address.
The error has been logged and will be taken care of as soon as possible.""",
-    20: """Error(20): No roles are authorized to perform action %s with the given parameters."""
+    1: 'You are not authorized to perform this action.',
+    2: 'You are not authorized to perform any action.',
+    3: 'The action %s does not exist.',
+    4: 'Unexpected error occurred.',
+    5: 'Missing mandatory keyword argument(s) for this action.',
+    6: 'Guest accounts are not authorized to perform this action.',
+    7: 'Not enough arguments, user ID and action name required.',
+    8: 'Incorrect keyword argument(s) for this action.',
+    9: """Account '%s' is not yet activated.""",
+    10: """You were not authorized by the authentication method '%s'.""",
+    11: """The selected login method '%s' is not the default method for this account, please try another one.""",
+    12: """Selected login method '%s' does not exist.""",
+    13: """Could not register '%s' account.""",
+    14: """Could not login using '%s', because this user is unknown.""",
+    15: """Could not login using your '%s' account, because you have introduced a wrong password.""",
+    16: """External authentication troubles using '%s' (maybe temporary network problems).""",
+    17: """You have not yet confirmed the email address for the '%s' authentication method.""",
+    18: """The administrator has not yet activated your account for the '%s' authentication method.""",
+    19: """The site is having troubles in sending you an email for confirming your email address. The error has been logged and will be taken care of as soon as possible.""",
+    20: """No roles are authorized to perform action %s with the given parameters."""
    }

diff --git a/modules/webhelp/web/admin/howto/Makefile.am b/modules/webhelp/web/admin/howto/Makefile.am
index 3195974e7..4aab3232c 100644
--- a/modules/webhelp/web/admin/howto/Makefile.am
+++ b/modules/webhelp/web/admin/howto/Makefile.am
@@ -1,27 +1,28 @@
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

webdoclibdir = $(libdir)/webdoc/invenio/admin

webdoclib_DATA = howto.webdoc \
                 howto-run.webdoc \
                 howto-marc.webdoc \
-                howto-migrate.webdoc
+                howto-migrate.webdoc \
+                howto-fulltext.webdoc

EXTRA_DIST = $(webdoclib_DATA)

CLEANFILES = *~ *.tmp

diff --git a/modules/webhelp/web/admin/howto/howto-fulltext.webdoc b/modules/webhelp/web/admin/howto/howto-fulltext.webdoc
new file mode 100644
index 000000000..c0158f775
--- /dev/null
+++ b/modules/webhelp/web/admin/howto/howto-fulltext.webdoc
@@ -0,0 +1,329 @@
+## This file is part of CDS Invenio.
+## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
+##
+## CDS Invenio is free software; you can redistribute it and/or
+## modify it under the terms of the GNU General Public License as
+## published by the Free Software Foundation; either version 2 of the
+## License, or (at your option) any later version.
+##
+## CDS Invenio is distributed in the hope that it will be useful, but
+## WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+## General Public License for more details.
+##
+## You should have received a copy of the GNU General Public License
+## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
+## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
+
+
+
+

    How to manage your fulltext files through BibDocFile

    +
    +Usage: /opt/cds-invenio/bin/bibdocfile [options]
    +
    +Options:
    +  --version             show program's version number and exit
    +  -h, --help            show this help message and exit
    +  -D, --debug
    +  -H, --human-readable  print sizes in human readable format (e.g., 1KB 234MB
    +                        2GB)
    +
    +  Query options:
    +    -r RECIDS, --recids=RECIDS
    +                        matches records by recids, e.g.: --recids=1-3,5-7
    +    -d DOCIDS, --docids=DOCIDS
    +                        matches documents by docids, e.g.: --docids=1-3,5-7
    +    -a, --all           Select all the records
    +    --with-deleted-recs=yes/no/only
    +                        'Yes' to also match deleted records, 'no' to exclude
    +                        them, 'only' to match only deleted ones
    +    --with-deleted-docs=yes/no/only
    +                        'Yes' to also match deleted documents, 'no' to exclude
    +                        them, 'only' to match only deleted ones (e.g. for
    +                        undeletion)
    +    --with-empty-recs=yes/no/only
    +                        'Yes' to also match records without attached
    +                        documents, 'no' to exclude them, 'only' to consider
    +                        only such records (e.g. for statistics)
    +    --with-empty-docs=yes/no/only
    +                        'Yes' to also match documents without attached files,
    +                        'no' to exclude them, 'only' to consider only such
    +                        documents (e.g. for sanity checking)
    +    --with-record-modification-date=date1,date2
    +                        matches records modified between date1 and date2; dates can be
    +                        expressed relatively, e.g.:"-5m,2030-2-23 04:40" #
    +                        matches records modified since 5 minutes ago until the
    +                        2030...
    +    --with-record-creation-date=date1,date2
    +                        matches records created between date1 and date2; dates
    +                        can be expressed relatively
    +    --with-document-modification-date=date1,date2
    +                        matches documents modified between date1 and date2;
    +                        dates can be expressed relatively
    +    --with-document-creation-date=date1,date2
    +                        matches documents created between date1 and date2;
    +                        dates can be expressed relatively
    +    --url=URL           matches the document referred by the URL, e.g. "http:/
    +                        /pcsk.cern.ch/record/1/files/foobar.pdf?version=2"
    +    --path=PATH         matches the document referred by the internal
    +                        filesystem path, e.g. /opt/cds-
    +                        dev/var/data/files/g0/1/foobar.pdf\;1
    +    --with-docname=DOCNAME
    +                        matches documents with the given docname (accepts
    +                        wildcards)
    +    --with-doctype=DOCTYPE
    +                        matches documents with the given doctype
    +    -p PATTERN, --pattern=PATTERN
    +                        matches records by pattern
    +    -c COLLECTION, --collection=COLLECTION
    +                        matches records by collection
    +    --force             force an action even when it's not necessary, e.g.
    +                        textify on an already textified bibdoc.
    +
    +  Actions for getting information:
    +    --get-info          print all the information about the matched
    +                        records/documents
    +    --get-disk-usage    print disk usage statistics of the matched documents
    +    --get-history       print the matched documents' history
    +
    +  Actions for setting information:
    +    --set-doctype=doctype
    +                        specify the new doctype
    +    --set-description=description
    +                        specify a description
    +    --set-comment=comment
    +                        specify a comment
    +    --set-restriction=restriction
    +                        specify a restriction tag
    +    --set-docname=docname
    +                        specify a new docname for renaming
    +    --unset-comment     remove any comment
    +    --unset-descriptions
    +                        remove any description
    +    --unset-restrictions
    +                        remove any restriction
    +    --hide              hides matched documents and revisions
    +    --unhide            unhides matched documents and revisions
    +
    +  Action for revising content:
    +    --append=PATH/URL   specify the URL/path of the file that will be appended to
    +                        the bibdoc
    +    --revise=PATH/URL   specify the URL/path of the file that will revise the
    +                        bibdoc
    +    --revert            reverts a document to the specified version
    +    --delete            soft-delete the matched documents (applies to all
    +                        revisions and formats)
    +    --hard-delete       hard-delete the matched documents (applies to matched
    +                        revisions and formats)
    +    --undelete          undelete previously soft-deleted documents (applies to
    +                        all revisions and formats)
    +    --purge             purge (i.e. hard-delete previous versions) the matched
    +                        documents
    +    --expunge           expunge (i.e. hard-delete any version and formats) the
    +                        matched documents
    +    --with-versions=VERSION
    +                        specifies the version(s) to be used with hard-delete,
    +                        hide, revert, e.g.: 1-2,3 or all
    +    --with-format=FORMAT
    +                        to specify a format when
    +                        appending/revising/deleting/reverting a document, e.g.
    +                        "pdf"
    +    --with-hide-previous
    +                        when revising, hides previous versions
    +
    +  Actions for housekeeping:
    +    --check-md5         check md5 checksum validity of files
    +    --check-format      check if any format-related inconsistencies exist
    +    --check-duplicate-docnames
    +                        check for duplicate docnames associated with the same
    +                        record
    +    --update-md5        update md5 checksum of files
    +    --fix-all           fix inconsistencies in filesystem vs database vs MARC
    +    --fix-marc          synchronize MARC after filesystem/database
    +    --fix-format        fix format-related inconsistencies
    +    --fix-duplicate-docnames
    +                        fix duplicate docnames associated with the same record
    +
    +  Experimental options (do not expect to find them in the next release):
    +    --set-icon=URL/PATH
    +                        attach the specified icon to the matched documents
    +    --unset-icon        remove any icon on the matched documents
    +    --textify           extract text from matched documents and store it for
    +                        later indexing
    +    --with-ocr=yes/no/always
    +                        when used with --textify, whether to perform OCR (yes
    +                        will perform it only if necessary, based on a
    +                        heuristic)
    +
    +
    +Examples:
    +    $ bibdocfile --append foo.tar.gz --recid=1
    +    $ bibdocfile --revise http://foo.com?search=123 --with-docname='sam'
    +            --format=pdf --recid=3 --set-docname='pippo' # revise for record 3
    +                    # the document sam, renaming it to pippo.
    +    $ bibdocfile --delete *sam --all # delete all documents ending
    +                                     # with "sam"
    +    $ bibdocfile --undelete -c "Test Collection" # undelete documents for
    +                                                 # the collection
    +    $ bibdocfile --get-info --recids=1-4,6-8 # obtain information
    +    $ bibdocfile -r 1 --with-docname=foo --set-docname=bar # Rename a document
    +
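    +    ## A sketch tying file-level restrictions to the WebAccess rules that
    +    ## appear earlier in this patch (the demo role
    +    ## 'restrictedpicturesviewer' is authorized for the 'viewrestrdoc'
    +    ## action with status 'restricted_picture'); record 44 is hypothetical:
    +    $ bibdocfile -r 44 --set-restriction='restricted_picture'
    +    $ bibdocfile -r 44 --get-info --get-disk-usage # inspect the result
    +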
    + +

    How to convert your fulltext files among different formats

    +
    +Usage: python /opt/cds-invenio/lib/python/invenio/websubmit_file_converter.py [options]
    +
    +Options:
    +  -h, --help            show this help message and exit
    +  -c FILE, --convert=FILE
    +                        convert the specified FILE
    +  -d, --debug           Enable debug information
    +  --special-pdf2hocr2pdf=FILE
    +                        convert the given scanned PDF into a PDF with OCRed
    +                        text
    +  -f FORMAT, --format=FORMAT
    +                        the desired output format
    +  -o OUTPUT_NAME, --output=OUTPUT_NAME
    +                        the desired output FILE (if not specified a new file
    +                        will be generated with the desired output format)
    +  --without-pdfa        don't force creation of PDF/A PDFs
    +  --without-pdfopt      don't force optimization of PDF files
    +  --without-ocr         don't force OCR
    +  --can-convert=FORMAT  display all the formats that can be generated
    +                        from the given format
    +  --is-ocr-needed=FILE  check if OCR is needed for the FILE specified
    +  -t TITLE, --title=TITLE
    +                        specify the title (used when creating PDFs)
    +  -l LN, --language=LN  specify the language (used when performing OCR, e.g.
    +                        en, it, fr...)
    +
    +  Examples:
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_converter.py \
    +              --convert=foo.docx -f pdf
    +    ## The previous command will generate a PDF/A out of a Microsoft Office
    +    ## Word document
    +
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_converter.py \
    +              --special-pdf2hocr2pdf=scanned-foo.pdf --output=ocred-foo.pdf
    +    ## The previous command will generate a new PDF with a text stream
    +    ## extracted via an OCR engine.
    +
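    +    ## A further sketch: check whether a scanned file needs OCR before
    +    ## extracting its text for indexing. This assumes plain text is among
    +    ## the formats reachable from PDF on this installation and that the
    +    ## target format is named "txt"; verify first with --can-convert:
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_converter.py \
    +              --can-convert=pdf
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_converter.py \
    +              --is-ocr-needed=foo.pdf
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_converter.py \
    +              --convert=foo.pdf -f txt
    +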
    +
    +

    This module is used internally by Invenio to convert from one format to another.

    +

    It can also be invoked manually by a system administrator to obtain the same level of conversion quality offered by CDS Invenio.

    + +

    How to create icons

    +
    +Usage:
    +  python /opt/cds-invenio/lib/python/invenio/websubmit_icon_creator.py \
    +                           [options] input-file.jpg
    +
    +  websubmit_icon_creator.py is used to create an icon for an image.
    +
    +  Options:
    +   -h, --help                      Print this help.
    +   -V, --version                   Print version information.
    +   -v, --verbose=LEVEL             Verbose level (0=min, 1=default, 9=max).
    +                                    [NOT IMPLEMENTED]
    +   -s, --icon-scale
    +                                   Scaling information for the icon that is to
    +                                   be created. Must be an integer. Defaults to
    +                                   180.
    +   -m, --multipage-icon
    +                                   A flag to indicate that the icon should
    +                                   consist of multiple pages. Will only be
    +                                   respected if the requested icon type is GIF
    +                                   and the input file is a PS or PDF consisting
    +                                   of several pages.
    +   -d, --multipage-icon-delay=VAL
    +                                   If the icon consists of several pages and is
    +                                   an animated GIF, a delay between frames can
    +                                   be specified. Must be an integer. Defaults
    +                                   to 100.
    +   -f, --icon-file-format=FORMAT
    +                                   The file format of the icon to be created.
    +                                   Must be one of:
    +                                       [pdf, gif, jpg, jpeg, ps, png, bmp]
    +                                   Defaults to gif.
    +   -o, --icon-name=XYZ
    +                                   The optional name to be given to the created
    +                                   icon file. If this is omitted, the icon file
    +                                   will be given the same name as the input
    +                                   file, but will be prefixed by "icon-";
    +
    +  Examples:
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_icon_creator.py \
    +              --icon-scale=200 \
    +              --icon-name=test-icon \
    +              --icon-file-format=jpg \
    +              test-image.jpg
    +
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_icon_creator.py \
    +              --icon-scale=200 \
    +              --icon-name=test-icon2 \
    +              --icon-file-format=gif \
    +              --multipage-icon \
    +              --multipage-icon-delay=50 \
    +              test-image2.pdf
    +
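    +    ## A minimal sketch relying on the documented defaults (gif format,
    +    ## icon scale 180); the resulting icon file is named after the input
    +    ## file, prefixed by "icon-":
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_icon_creator.py \
    +              test-image.jpg
    +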
    + +

    How to stamp PDFs

    +
    +Usage:
    +  python /opt/cds-invenio/lib/python/invenio/websubmit_file_stamper.py \
    +                           [options] input-file.pdf
    +
    +  websubmit_file_stamper.py is used to add a "stamp" to a PDF file.
    +  A LaTeX template is used to create the stamp and this stamp is then
    +  concatenated with the original PDF file.
    +  The stamp can take the form of either a separate "cover page" that is
    +  appended to the document, or a "mark" that is applied somewhere either
    +  on the document's first page or on all of its pages.
    +
    +  Options:
    +   -h, --help                Print this help.
    +   -V, --version             Print version information.
    +   -v, --verbose=LEVEL       Verbose level (0=min, 1=default, 9=max).
    +                              [NOT IMPLEMENTED]
    +   -t, --latex-template=PATH
    +                             Path to the LaTeX template file that should be used
    +                             for the creation of the PDF stamp. (Note, if it's
    +                             just a basename, it will be sought first in the
    +                             current working directory, and then in the invenio
    +                             file-stamper templates directory; If there is a
    +                             qualifying path to the template name, it will be
    +                             sought only in that location);
    +   -c, --latex-template-var='VARNAME=VALUE'
    +                             A variable that should be replaced in the LaTeX
    +                             template file with its corresponding value. Of the
    +                             following format:
    +                                 VARNAME=VALUE
    +                             This option is repeatable - one for each template
    +                             variable;
    +   -s, --stamp=STAMP-TYPE
    +                             The type of stamp to be applied to the subject
    +                             file. Must be one of 3 values:
    +                              + "first" - stamp only the first page;
    +                              + "all"   - stamp all pages;
    +                              + "coverpage" - add a cover page to the
    +                                document;
    +                             The default value is "first";
    +   -o, --output-file=XYZ
    +                             The optional name to be given to the finished
    +                             (stamped) file. If this is omitted, the stamped
    +                             file will be given the same name as the input
    +                             file, but will be prefixed by "stamped-";
    +
    +  Example:
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_stamper.py \
    +              --latex-template=demo-stamp-left.tex \
    +              --latex-template-var='REPORTNUMBER=TEST-THESIS-2008-019' \
    +              --latex-template-var='DATE=27/02/2008' \
    +              --stamp='first' \
    +              --output-file=testfile_stamped.pdf \
    +              testfile.pdf
    +
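    +    ## A cover-page variant of the same command. The LaTeX template name
    +    ## below is hypothetical: it must exist in the current working
    +    ## directory or in the invenio file-stamper templates directory:
    +    python /opt/cds-invenio/lib/python/invenio/websubmit_file_stamper.py \
    +              --latex-template=demo-stamp-coverpage.tex \
    +              --latex-template-var='REPORTNUMBER=TEST-THESIS-2008-019' \
    +              --stamp='coverpage' \
    +              --output-file=testfile_covered.pdf \
    +              testfile.pdf
    +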
    diff --git a/modules/webhelp/web/admin/howto/howto.webdoc b/modules/webhelp/web/admin/howto/howto.webdoc index 1cdad6066..f0965978c 100644 --- a/modules/webhelp/web/admin/howto/howto.webdoc +++ b/modules/webhelp/web/admin/howto/howto.webdoc @@ -1,48 +1,52 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

    The HOWTO guides will give you both short and not-so-short recipes and thoughts on some of the most frequently encountered administrative tasks.

    HOWTO MARC
    Describes how to choose the MARC representation of your metadata and how it will be stored in CDS Invenio.
    HOWTO Migrate
    Describes how to migrate a bunch of your old data from any format you might have into CDS Invenio.
    HOWTO Run
    Describes how to run your CDS Invenio installation and how to take care of its normal operation day by day.
    HOWTO Manage Fulltext Files
    Describes how to manipulate fulltext files within your CDS Invenio installation.

    Haven't found what you were looking for? Suggest a HOWTO.

    Welcome to the CDS Invenio Developers' corner. Before diving into the source, make sure you don't miss our user-level and admin-level documentation as well. And now, back to the source, and happy hacking!

    General information, coding practices

    Common Concepts
    Summarizing common terms you will encounter here and there.
    Coding Style
    A policy we try to follow, for good or bad.
    Release Numbering
    Presenting the version numbering scheme adopted for CDS Invenio stable and development releases.
    Directory Organization
    How the source and target directories are organized, where the sources get installed to, what is the visible URL policy, etc.
    Modules Overview
    Presenting a summary of various CDS Invenio modules and their relationships.
    Test Suite
    Describes our unit and regression test suites.

    For more developer-related information, be sure to visit Invenio wiki.

    Module-specific information

    BibClassify Internals
    Describes information useful to understand how BibClassify works, the taxonomy extensions we use, how the keyword extraction algorithm works.
    BibConvert Internals
    Describes information useful to understand how BibConvert works, and how the BibConvert functions can be reused.
    BibFormat Internals
    Describes information useful to understand how BibFormat works.
    BibRank Internals
    Describes information useful to understand how the various ranking methods available in BibRank work, and how they can be tweaked to give different output.
    BibEdit Internals
    Describes information useful to manipulate single records.
    MiscUtil Internals
    Describes information useful to understand what can be found inside the miscellaneous utilities module, like database access, error management, date handling library, etc.
    WebJournal Internals
    Describes the WebJournal database and required MARC tags for article records.
    WebSearch Internals
    Describes information useful to understand the search process internals, like the different search stages, the high- and low-level API, etc.
    WebAccess Internals
    Describes information useful to understand the access control process internals, its API, etc.
    WebStyle Internals
    Describes how to customize WebDoc files, etc.
    -
    WebSubmit BibDocFile
    -
    Describes the fulltext document management library.
    +
    WebSubmit Internals
    +
    Describes the internal tools available in WebSubmit.
    Bibliographic Task Howto
    Describes how to create a new Bibliographic Task.
    diff --git a/modules/websearch/lib/search_engine.py b/modules/websearch/lib/search_engine.py index 06eef3880..dc8810f41 100644 --- a/modules/websearch/lib/search_engine.py +++ b/modules/websearch/lib/search_engine.py @@ -1,5092 +1,5092 @@ # -*- coding: utf-8 -*- ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. # pylint: disable-msg=C0301 """CDS Invenio Search Engine in mod_python.""" __lastupdated__ = """$Date$""" __revision__ = "$Id$" ## import general modules: import cgi import cStringIO import copy import string import os import re import time import urllib import urlparse import zlib import sys if sys.hexversion < 0x2040000: # pylint: disable-msg=W0622 from sets import Set as set # pylint: enable-msg=W0622 ## import CDS Invenio stuff: from invenio.config import \ CFG_CERN_SITE, \ CFG_INSPIRE_SITE, \ CFG_OAI_ID_FIELD, \ CFG_WEBCOMMENT_ALLOW_REVIEWS, \ CFG_WEBSEARCH_CALL_BIBFORMAT, \ CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX, \ CFG_WEBSEARCH_FIELDS_CONVERT, \ CFG_WEBSEARCH_NB_RECORDS_TO_SORT, \ CFG_WEBSEARCH_SEARCH_CACHE_SIZE, \ CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS, \ CFG_WEBSEARCH_USE_ALEPH_SYSNOS, \ CFG_WEBSEARCH_DEF_RECORDS_IN_GROUPS, \ CFG_WEBSEARCH_FULLTEXT_SNIPPETS, \ CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE, \ CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG, \ CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS, \ CFG_SITE_LANG, \ CFG_SITE_NAME, \ CFG_LOGDIR, \ CFG_BIBFORMAT_HIDDEN_TAGS, \ CFG_SITE_URL, \ CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS from invenio.search_engine_config import InvenioWebSearchUnknownCollectionError from invenio.bibrecord import create_record, record_get_field_instances from invenio.bibrank_record_sorter import get_bibrank_methods, rank_records, is_method_valid from invenio.bibrank_downloads_similarity import register_page_view_event, calculate_reading_similarity_list from invenio.bibindex_engine_stemmer import stem from invenio.bibindex_engine_tokenizer import wash_author_name, author_name_requires_phrase_search from invenio.bibformat import format_record, format_records, get_output_format_content_type, create_excel from invenio.bibformat_config import CFG_BIBFORMAT_USE_OLD_BIBFORMAT from invenio.bibrank_downloads_grapher import create_download_history_graph_and_box from invenio.data_cacher import DataCacher from invenio.websearch_external_collections import print_external_results_overview, perform_external_collection_search from invenio.access_control_admin import acc_get_action_id from invenio.access_control_config import VIEWRESTRCOLL, \ CFG_ACC_GRANT_AUTHOR_RIGHTS_TO_EMAILS_IN_TAGS from invenio.websearchadminlib import get_detailed_page_tabs from invenio.intbitset import intbitset as HitSet from invenio.dbquery import DatabaseError from invenio.access_control_engine import acc_authorize_action from invenio.errorlib import 
register_exception from invenio.textutils import encode_for_xml import invenio.template webstyle_templates = invenio.template.load('webstyle') webcomment_templates = invenio.template.load('webcomment') from invenio.bibrank_citation_searcher import get_cited_by_count, calculate_cited_by_list, \ calculate_co_cited_with_list, get_records_with_num_cites, get_self_cited_by from invenio.bibrank_citation_grapher import create_citation_history_graph_and_box from invenio.dbquery import run_sql, run_sql_cached, get_table_update_time, Error from invenio.webuser import getUid, collect_user_info from invenio.webpage import pageheaderonly, pagefooteronly, create_error_box from invenio.messages import gettext_set_language from invenio.search_engine_query_parser import SearchQueryParenthesisedParser, \ InvenioWebSearchQueryParserException, SpiresToInvenioSyntaxConverter from invenio import webinterface_handler_wsgi_utils as apache try: import invenio.template websearch_templates = invenio.template.load('websearch') except: pass from invenio.websearch_external_collections import calculate_hosted_collections_results, do_calculate_hosted_collections_results from invenio.websearch_external_collections_config import CFG_HOSTED_COLLECTION_TIMEOUT_ANTE_SEARCH from invenio.websearch_external_collections_config import CFG_HOSTED_COLLECTION_TIMEOUT_POST_SEARCH from invenio.websearch_external_collections_config import CFG_EXTERNAL_COLLECTION_MAXRESULTS ## global vars: cfg_nb_browse_seen_records = 100 # limit of the number of records to check when browsing certain collection cfg_nicely_ordered_collection_list = 0 # do we propose collection list nicely ordered or alphabetical? ## precompile some often-used regexp for speed reasons: re_word = re.compile('[\s]') re_quotes = re.compile('[\'\"]') re_doublequote = re.compile('\"') re_equal = re.compile('\=') re_logical_and = re.compile('\sand\s', re.I) re_logical_or = re.compile('\sor\s', re.I) re_logical_not = re.compile('\snot\s', re.I) re_operators = re.compile(r'\s([\+\-\|])\s') re_pattern_wildcards_after_spaces = re.compile(r'(\s)[\*\%]+') re_pattern_single_quotes = re.compile("'(.*?)'") re_pattern_double_quotes = re.compile("\"(.*?)\"") re_pattern_regexp_quotes = re.compile("\/(.*?)\/") re_pattern_short_words = re.compile(r'([\s\"]\w{1,3})[\*\%]+') re_pattern_space = re.compile("__SPACE__") re_pattern_today = re.compile("\$TODAY\$") re_pattern_parens = re.compile(r'\([^\)]+\s+[^\)]+\)') re_unicode_lowercase_a = re.compile(unicode(r"(?u)[áàäâãå]", "utf-8")) re_unicode_lowercase_ae = re.compile(unicode(r"(?u)[æ]", "utf-8")) re_unicode_lowercase_e = re.compile(unicode(r"(?u)[éèëê]", "utf-8")) re_unicode_lowercase_i = re.compile(unicode(r"(?u)[íìïî]", "utf-8")) re_unicode_lowercase_o = re.compile(unicode(r"(?u)[óòöôõø]", "utf-8")) re_unicode_lowercase_u = re.compile(unicode(r"(?u)[úùüû]", "utf-8")) re_unicode_lowercase_y = re.compile(unicode(r"(?u)[ýÿ]", "utf-8")) re_unicode_lowercase_c = re.compile(unicode(r"(?u)[çć]", "utf-8")) re_unicode_lowercase_n = re.compile(unicode(r"(?u)[ñ]", "utf-8")) re_unicode_uppercase_a = re.compile(unicode(r"(?u)[ÁÀÄÂÃÅ]", "utf-8")) re_unicode_uppercase_ae = re.compile(unicode(r"(?u)[Æ]", "utf-8")) re_unicode_uppercase_e = re.compile(unicode(r"(?u)[ÉÈËÊ]", "utf-8")) re_unicode_uppercase_i = re.compile(unicode(r"(?u)[ÍÌÏÎ]", "utf-8")) re_unicode_uppercase_o = re.compile(unicode(r"(?u)[ÓÒÖÔÕØ]", "utf-8")) re_unicode_uppercase_u = re.compile(unicode(r"(?u)[ÚÙÜÛ]", "utf-8")) re_unicode_uppercase_y = re.compile(unicode(r"(?u)[Ý]", "utf-8")) 
re_unicode_uppercase_c = re.compile(unicode(r"(?u)[ÇĆ]", "utf-8")) re_unicode_uppercase_n = re.compile(unicode(r"(?u)[Ñ]", "utf-8")) re_latex_lowercase_a = re.compile("\\\\[\"H'`~^vu=k]\{?a\}?") re_latex_lowercase_ae = re.compile("\\\\ae\\{\\}?") re_latex_lowercase_e = re.compile("\\\\[\"H'`~^vu=k]\\{?e\\}?") re_latex_lowercase_i = re.compile("\\\\[\"H'`~^vu=k]\\{?i\\}?") re_latex_lowercase_o = re.compile("\\\\[\"H'`~^vu=k]\\{?o\\}?") re_latex_lowercase_u = re.compile("\\\\[\"H'`~^vu=k]\\{?u\\}?") re_latex_lowercase_y = re.compile("\\\\[\"']\\{?y\\}?") re_latex_lowercase_c = re.compile("\\\\['uc]\\{?c\\}?") re_latex_lowercase_n = re.compile("\\\\[c'~^vu]\\{?n\\}?") re_latex_uppercase_a = re.compile("\\\\[\"H'`~^vu=k]\\{?A\\}?") re_latex_uppercase_ae = re.compile("\\\\AE\\{?\\}?") re_latex_uppercase_e = re.compile("\\\\[\"H'`~^vu=k]\\{?E\\}?") re_latex_uppercase_i = re.compile("\\\\[\"H'`~^vu=k]\\{?I\\}?") re_latex_uppercase_o = re.compile("\\\\[\"H'`~^vu=k]\\{?O\\}?") re_latex_uppercase_u = re.compile("\\\\[\"H'`~^vu=k]\\{?U\\}?") re_latex_uppercase_y = re.compile("\\\\[\"']\\{?Y\\}?") re_latex_uppercase_c = re.compile("\\\\['uc]\\{?C\\}?") re_latex_uppercase_n = re.compile("\\\\[c'~^vu]\\{?N\\}?") class RestrictedCollectionDataCacher(DataCacher): def __init__(self): def cache_filler(): ret = [] try: viewcollid = acc_get_action_id(VIEWRESTRCOLL) res = run_sql("""SELECT DISTINCT ar.value FROM accROLE_accACTION_accARGUMENT raa JOIN accARGUMENT ar ON raa.id_accARGUMENT = ar.id WHERE ar.keyword = 'collection' AND raa.id_accACTION = %s""", (viewcollid,)) except Exception: # database problems, return empty cache return [] for coll in res: ret.append(coll[0]) return ret def timestamp_verifier(): return max(get_table_update_time('accROLE_accACTION_accARGUMENT'), get_table_update_time('accARGUMENT')) DataCacher.__init__(self, cache_filler, timestamp_verifier) def collection_restricted_p(collection): restricted_collection_cache.recreate_cache_if_needed() return collection in restricted_collection_cache.cache try: restricted_collection_cache.is_ok_p except Exception: restricted_collection_cache = RestrictedCollectionDataCacher() def get_permitted_restricted_collections(user_info): """Return a list of collection that are restricted but for which the user is authorized.""" restricted_collection_cache.recreate_cache_if_needed() ret = [] for collection in restricted_collection_cache.cache: if acc_authorize_action(user_info, 'viewrestrcoll', collection=collection)[0] == 0: ret.append(collection) return ret def is_user_owner_of_record(user_info, recid): """ Check if the user is owner of the record, i.e. he is the submitter and/or belongs to a owner-like group authorized to 'see' the record. @param user_info: the user_info dictionary that describe the user. @type user_info: user_info dictionary @param recid: the record identifier. @type recid: positive integer @return: True if the user is 'owner' of the record; False otherwise @rtype: bool """ authorized_emails_or_group = [] for tag in CFG_ACC_GRANT_AUTHOR_RIGHTS_TO_EMAILS_IN_TAGS: authorized_emails_or_group.extend(get_fieldvalues(recid, tag)) for email_or_group in authorized_emails_or_group: if email_or_group in user_info['group']: return True email = email_or_group.strip().lower() if user_info['email'].strip().lower() == email: return True return False def check_user_can_view_record(user_info, recid): """ Check if the user is authorized to view the given recid. 
The function grants access in two cases: either user has author rights on this record, or he has view rights to the primary collection this record belongs to. @param user_info: the user_info dictionary that describe the user. @type user_info: user_info dictionary @param recid: the record identifier. @type recid: positive integer @return: (0, ''), when authorization is granted, (>0, 'message') when authorization is not granted @rtype: (int, string) """ record_primary_collection = guess_primary_collection_of_a_record(recid) if collection_restricted_p(record_primary_collection): (auth_code, auth_msg) = acc_authorize_action(user_info, VIEWRESTRCOLL, collection=record_primary_collection) if auth_code == 0 or is_user_owner_of_record(user_info, recid): return (0, '') else: return (auth_code, auth_msg) else: return (0, '') class IndexStemmingDataCacher(DataCacher): """ Provides cache for stemming information for word/phrase indexes. This class is not to be used directly; use function get_index_stemming_language() instead. """ def __init__(self): def cache_filler(): try: res = run_sql("""SELECT id, stemming_language FROM idxINDEX""") except DatabaseError: # database problems, return empty cache return {} return dict(res) def timestamp_verifier(): return get_table_update_time('idxINDEX') DataCacher.__init__(self, cache_filler, timestamp_verifier) try: index_stemming_cache.is_ok_p except Exception: index_stemming_cache = IndexStemmingDataCacher() def get_index_stemming_language(index_id): """Return stemming langugage for given index.""" index_stemming_cache.recreate_cache_if_needed() return index_stemming_cache.cache[index_id] class CollectionRecListDataCacher(DataCacher): """ Provides cache for collection reclist hitsets. This class is not to be used directly; use function get_collection_reclist() instead. """ def __init__(self): def cache_filler(): ret = {} try: res = run_sql("SELECT name,reclist FROM collection") except Exception: # database problems, return empty cache return {} for name, reclist in res: ret[name] = None # this will be filled later during runtime by calling get_collection_reclist(coll) return ret def timestamp_verifier(): return get_table_update_time('collection') DataCacher.__init__(self, cache_filler, timestamp_verifier) try: if not collection_reclist_cache.is_ok_p: raise Exception except Exception: collection_reclist_cache = CollectionRecListDataCacher() def get_collection_reclist(coll): """Return hitset of recIDs that belong to the collection 'coll'.""" collection_reclist_cache.recreate_cache_if_needed() if not collection_reclist_cache.cache[coll]: # not yet it the cache, so calculate it and fill the cache: set = HitSet() query = "SELECT nbrecs,reclist FROM collection WHERE name=%s" res = run_sql(query, (coll, ), 1) if res: try: set = HitSet(res[0][1]) except: pass collection_reclist_cache.cache[coll] = set # finally, return reclist: return collection_reclist_cache.cache[coll] class SearchResultsCache(DataCacher): """ Provides temporary lazy cache for Search Results. Useful when users click on `next page'. """ def __init__(self): def cache_filler(): return {} def timestamp_verifier(): return '1970-01-01 00:00:00' # lazy cache is always okay; # its filling is governed by # CFG_WEBSEARCH_SEARCH_CACHE_SIZE DataCacher.__init__(self, cache_filler, timestamp_verifier) try: if not search_results_cache.is_ok_p: raise Exception except Exception: search_results_cache = SearchResultsCache() class CollectionI18nNameDataCacher(DataCacher): """ Provides cache for I18N collection names. 
This class is not to be used directly; use function get_coll_i18nname() instead. """ def __init__(self): def cache_filler(): ret = {} try: res = run_sql("SELECT c.name,cn.ln,cn.value FROM collectionname AS cn, collection AS c WHERE cn.id_collection=c.id AND cn.type='ln'") # ln=long name except Exception: # database problems return {} for c, ln, i18nname in res: if i18nname: if not ret.has_key(c): ret[c] = {} ret[c][ln] = i18nname return ret def timestamp_verifier(): return get_table_update_time('collectionname') DataCacher.__init__(self, cache_filler, timestamp_verifier) try: if not collection_i18nname_cache.is_ok_p: raise Exception except Exception: collection_i18nname_cache = CollectionI18nNameDataCacher() def get_coll_i18nname(c, ln=CFG_SITE_LANG, verify_cache_timestamp=True): """ Return nicely formatted collection name (of the name type `ln' (=long name)) for collection C in language LN. This function uses collection_i18nname_cache, but it verifies whether the cache is up-to-date first by default. This verification step is performed by checking the DB table update time. So, if you call this function 1000 times, it can get very slow because it will do 1000 table update time verifications, even though collection names change not that often. Hence the parameter VERIFY_CACHE_TIMESTAMP which, when set to False, will assume the cache is already up-to-date. This is useful namely in the generation of collection lists for the search results page. """ if verify_cache_timestamp: collection_i18nname_cache.recreate_cache_if_needed() out = c try: out = collection_i18nname_cache.cache[c][ln] except KeyError: pass # translation in LN does not exist return out class FieldI18nNameDataCacher(DataCacher): """ Provides cache for I18N field names. This class is not to be used directly; use function get_field_i18nname() instead. """ def __init__(self): def cache_filler(): ret = {} try: res = run_sql("SELECT f.name,fn.ln,fn.value FROM fieldname AS fn, field AS f WHERE fn.id_field=f.id AND fn.type='ln'") # ln=long name except Exception: # database problems, return empty cache return {} for f, ln, i18nname in res: if i18nname: if not ret.has_key(f): ret[f] = {} ret[f][ln] = i18nname return ret def timestamp_verifier(): return get_table_update_time('fieldname') DataCacher.__init__(self, cache_filler, timestamp_verifier) try: if not field_i18nname_cache.is_ok_p: raise Exception except Exception: field_i18nname_cache = FieldI18nNameDataCacher() def get_field_i18nname(f, ln=CFG_SITE_LANG, verify_cache_timestamp=True): """ Return nicely formatted field name (of type 'ln', 'long name') for field F in language LN. If VERIFY_CACHE_TIMESTAMP is set to True, then verify DB timestamp and field I18N name cache timestamp and refresh cache from the DB if needed. Otherwise don't bother checking DB timestamp and return the cached value. (This is useful when get_field_i18nname is called inside a loop.) """ if verify_cache_timestamp: field_i18nname_cache.recreate_cache_if_needed() out = f try: out = field_i18nname_cache.cache[f][ln] except KeyError: pass # translation in LN does not exist return out def get_alphabetically_ordered_collection_list(level=0, ln=CFG_SITE_LANG): """Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box().""" out = [] res = run_sql_cached("SELECT id,name FROM collection ORDER BY name ASC", affected_tables=['collection',]) for c_id, c_name in res: # make a nice printable name (e.g. 
truncate c_printable for # long collection names in given language): c_printable_fullname = get_coll_i18nname(c_name, ln, False) c_printable = wash_index_term(c_printable_fullname, 30, False) if c_printable != c_printable_fullname: c_printable = c_printable + "..." if level: c_printable = " " + level * '-' + " " + c_printable out.append([c_name, c_printable]) return out def get_nicely_ordered_collection_list(collid=1, level=0, ln=CFG_SITE_LANG): """Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box().""" colls_nicely_ordered = [] res = run_sql("""SELECT c.name,cc.id_son FROM collection_collection AS cc, collection AS c WHERE c.id=cc.id_son AND cc.id_dad=%s ORDER BY score DESC""", (collid, )) for c, cid in res: # make a nice printable name (e.g. truncate c_printable for # long collection names in given language): c_printable_fullname = get_coll_i18nname(c, ln, False) c_printable = wash_index_term(c_printable_fullname, 30, False) if c_printable != c_printable_fullname: c_printable = c_printable + "..." if level: c_printable = " " + level * '-' + " " + c_printable colls_nicely_ordered.append([c, c_printable]) colls_nicely_ordered = colls_nicely_ordered + get_nicely_ordered_collection_list(cid, level+1, ln=ln) return colls_nicely_ordered def get_index_id_from_field(field): """ Return index id with name corresponding to FIELD, or the first index id where the logical field code named FIELD is indexed. Return zero in case there is no index defined for this field. Example: field='author', output=4. """ out = 0 if field == '': field = 'global' # empty string field means 'global' index (field 'anyfield') # first look in the index table: res = run_sql("""SELECT id FROM idxINDEX WHERE name=%s""", (field,)) if res: out = res[0][0] return out # not found in the index table, now look in the field table: res = run_sql("""SELECT w.id FROM idxINDEX AS w, idxINDEX_field AS wf, field AS f WHERE f.code=%s AND wf.id_field=f.id AND w.id=wf.id_idxINDEX LIMIT 1""", (field,)) if res: out = res[0][0] return out def get_words_from_pattern(pattern): "Returns list of whitespace-separated words from pattern." words = {} for word in string.split(pattern): if not words.has_key(word): words[word] = 1 return words.keys() def create_basic_search_units(req, p, f, m=None, of='hb'): """Splits search pattern and search field into a list of independently searchable units. - A search unit consists of '(operator, pattern, field, type, hitset)' tuples where 'operator' is set union (|), set intersection (+) or set exclusion (-); 'pattern' is either a word (e.g. muon*) or a phrase (e.g. 'nuclear physics'); 'field' is either a code like 'title' or MARC tag like '100__a'; 'type' is the search type ('w' for word file search, 'a' for access file search). - Optionally, the function accepts the match type argument 'm'. If it is set (e.g. from advanced search interface), then it performs this kind of matching. If it is not set, then a guess is made. 'm' can have values: 'a'='all of the words', 'o'='any of the words', 'p'='phrase/substring', 'r'='regular expression', 'e'='exact value'. - Warnings are printed on req (when not None) in case of HTML output formats.""" opfts = [] # will hold (o,p,f,t,h) units # FIXME: quick hack for the journal index if f == 'journal': opfts.append(['+', p, f, 'w']) return opfts ## check arguments: is desired matching type set? if m: ## A - matching type is known; good! 
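        ## (Added illustration; the unit lists below are derived from the
        ## code that follows, with the optional hitset element omitted:)
        ##   m='e', p='ellis'      -> [['+', 'ellis', f, 'a']]
        ##   m='p', p='ellis'      -> [['+', '%ellis%', f, 'a']]
        ##   m='a', p='muon decay' -> [['+', 'muon', f, 'w'], ['+', 'decay', f, 'w']]
        ##   m='o', p='muon decay' -> [['+', 'muon', f, 'w'], ['|', 'decay', f, 'w']]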
if m == 'e': # A1 - exact value: opfts.append(['+', p, f, 'a']) # '+' since we have only one unit elif m == 'p': # A2 - phrase/substring: opfts.append(['+', "%" + p + "%", f, 'a']) # '+' since we have only one unit elif m == 'r': # A3 - regular expression: opfts.append(['+', p, f, 'r']) # '+' since we have only one unit elif m == 'a' or m == 'w': # A4 - all of the words: p = strip_accents(p) # strip accents for 'w' mode, FIXME: delete when not needed for word in get_words_from_pattern(p): opfts.append(['+', word, f, 'w']) # '+' in all units elif m == 'o': # A5 - any of the words: p = strip_accents(p) # strip accents for 'w' mode, FIXME: delete when not needed for word in get_words_from_pattern(p): if len(opfts)==0: opfts.append(['+', word, f, 'w']) # '+' in the first unit else: opfts.append(['|', word, f, 'w']) # '|' in further units else: if of.startswith("h"): print_warning(req, "Matching type '%s' is not implemented yet." % cgi.escape(m), "Warning") opfts.append(['+', "%" + p + "%", f, 'w']) else: ## B - matching type is not known: let us try to determine it by some heuristics if f and p[0] == '"' and p[-1] == '"': ## B0 - does 'p' start and end by double quote, and is 'f' defined? => doing ACC search opfts.append(['+', p[1:-1], f, 'a']) elif (f == 'author' or f == 'exactauthor') and author_name_requires_phrase_search(p): ## B1 - do we search in author, and does 'p' contain space/comma/dot/etc? ## => doing washed ACC search opfts.append(['+', p, f, 'a']) elif f and p[0] == "'" and p[-1] == "'": ## B0bis - does 'p' start and end by single quote, and is 'f' defined? => doing ACC search opfts.append(['+', '%' + p[1:-1] + '%', f, 'a']) elif f and p[0] == "/" and p[-1] == "/": ## B0ter - does 'p' start and end by a slash, and is 'f' defined? => doing regexp search opfts.append(['+', p[1:-1], f, 'r']) elif f and string.find(p, ',') >= 0: ## B1 - does 'p' contain comma, and is 'f' defined? => doing ACC search opfts.append(['+', p, f, 'a']) elif f and str(f[0:2]).isdigit(): ## B2 - does 'f' exist and starts by two digits? => doing ACC search opfts.append(['+', p, f, 'a']) else: ## B3 - doing WRD search, but maybe ACC too # search units are separated by spaces unless the space is within single or double quotes # so, let us replace temporarily any space within quotes by '__SPACE__' p = re_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", p) p = re_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", p) p = re_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", p) # wash argument: p = re_equal.sub(":", p) p = re_logical_and.sub(" ", p) p = re_logical_or.sub(" |", p) p = re_logical_not.sub(" -", p) p = re_operators.sub(r' \1', p) for pi in string.split(p): # iterate through separated units (or items, as "pi" stands for "p item") pi = re_pattern_space.sub(" ", pi) # replace back '__SPACE__' by ' ' # firstly, determine set operator if pi[0] == '+' or pi[0] == '-' or pi[0] == '|': oi = pi[0] pi = pi[1:] else: # okay, there is no operator, so let us decide what to do by default oi = '+' # by default we are doing set intersection... 
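                # (Added illustration:) e.g. the pattern 'muon +decay -neutrino'
                # yields oi='+' for 'muon' (the default), '+' for 'decay' and
                # '-' for 'neutrino'.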
                # secondly, determine search pattern and field:
                if string.find(pi, ":") > 0:
                    fi, pi = string.split(pi, ":", 1)
                    # test whether fi is a real index code or a MARC-tag defined code:
                    if fi in get_fieldcodes() or '00' <= fi[:2] <= '99':
                        pass
                    else:
                        # it is not, so join it back:
                        fi, pi = f, fi + ":" + pi
                else:
                    fi, pi = f, pi
                # look also for old ALEPH field names:
                if fi and CFG_WEBSEARCH_FIELDS_CONVERT.has_key(string.lower(fi)):
                    fi = CFG_WEBSEARCH_FIELDS_CONVERT[string.lower(fi)]
                # wash 'pi' argument:
                if re_quotes.match(pi):
                    # B3a - quotes are found => do ACC search (phrase search)
                    if pi[0] == '"' and pi[-1] == '"':
                        pi = string.replace(pi, '"', '') # remove quote signs
                        opfts.append([oi, pi, fi, 'a'])
                    elif pi[0] == "'" and pi[-1] == "'":
                        pi = string.replace(pi, "'", "") # remove quote signs
                        opfts.append([oi, "%" + pi + "%", fi, 'a'])
                    else: # unbalanced quotes, so fall back to WRD query:
                        opfts.append([oi, pi, fi, 'w'])
                elif fi and str(fi[0:2]).isdigit():
                    # B3b - fi exists and starts by two digits => do ACC search
                    opfts.append([oi, pi, fi, 'a'])
                elif fi and not get_index_id_from_field(fi) and get_field_name(fi):
                    # B3c - logical field fi exists but there is no WRD index for fi => try ACC search
                    opfts.append([oi, pi, fi, 'a'])
                elif pi.startswith('/') and pi.endswith('/'):
                    # B3d - pi has slashes around => do regexp search
                    opfts.append([oi, pi[1:-1], fi, 'r'])
                else:
                    # B3e - general case => do WRD search
                    pi = strip_accents(pi) # strip accents for 'w' mode, FIXME: delete when not needed
                    for pii in get_words_from_pattern(pi):
                        opfts.append([oi, pii, fi, 'w'])

    ## sanity check:
    for i in range(0, len(opfts)):
        try:
            pi = opfts[i][1]
            if pi == '*':
                if of.startswith("h"):
                    print_warning(req, "Ignoring standalone wildcard word.", "Warning")
                del opfts[i]
            if pi == '' or pi == ' ':
                fi = opfts[i][2]
                if fi:
                    if of.startswith("h"):
                        print_warning(req, "Ignoring empty %s search term." % fi, "Warning")
                del opfts[i]
        except:
            pass

    ## return search units:
    return opfts

def page_start(req, of, cc, aas, ln, uid, title_message=None,
               description='', keywords='', recID=-1, tab='', p=''):
    "Start page according to given output format."
    _ = gettext_set_language(ln)
    if not req or isinstance(req, cStringIO.OutputType):
        return # we were called from CLI
    if not title_message:
        title_message = _("Search Results")
    content_type = get_output_format_content_type(of)
    if of.startswith('x'):
        if of == 'xr':
            # we are doing RSS output
            req.content_type = "application/rss+xml"
            req.send_http_header()
            req.write("""<?xml version="1.0" encoding="UTF-8"?>\n""")
        else:
            # we are doing XML output:
            req.content_type = "text/xml"
            req.send_http_header()
            req.write("""<?xml version="1.0" encoding="UTF-8"?>\n""")
    elif of.startswith('t') or str(of[0:3]).isdigit():
        # we are doing plain text output:
        req.content_type = "text/plain"
        req.send_http_header()
    elif of == "id":
        pass # nothing to do, we shall only return list of recIDs
    elif content_type == 'text/html':
        # we are doing HTML output:
        req.content_type = "text/html"
        req.send_http_header()
        if not description:
            description = "%s %s."
% (cc, _("Search Results")) if not keywords: keywords = "%s, WebSearch, %s" % (get_coll_i18nname(CFG_SITE_NAME, ln, False), get_coll_i18nname(cc, ln, False)) ## generate RSS URL: argd = {} if req.args: argd = cgi.parse_qs(req.args) rssurl = websearch_templates.build_rss_url(argd) ## add jsmath if displaying single records (FIXME: find ## eventual better place to this code) if of.lower() in CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS: metaheaderadd = """ """ else: metaheaderadd = '' ## generate navtrail: navtrail = create_navtrail_links(cc, aas, ln) if navtrail != '': navtrail += ' > ' if (tab != '' or ((of != '' or of.lower() != 'hd') and of != 'hb')) and \ recID != -1: # If we are not in information tab in HD format, customize # the nav. trail to have a link back to main record. (Due # to the way perform_request_search() works, hb # (lowercase) is equal to hd) navtrail += ' %s' % \ (CFG_SITE_URL, recID, title_message) if (of != '' or of.lower() != 'hd') and of != 'hb': # Export format_name = of query = "SELECT name FROM format WHERE code=%s" res = run_sql(query, (of,)) if res: format_name = res[0][0] navtrail += ' > ' + format_name else: # Discussion, citations, etc. tabs tab_label = get_detailed_page_tabs(cc, ln=ln)[tab]['label'] navtrail += ' > ' + _(tab_label) else: navtrail += title_message if p: # we are serving search/browse results pages, so insert pattern: navtrail += ": " + cgi.escape(p) title_message = cgi.escape(p) + " - " + title_message ## finally, print page header: req.write(pageheaderonly(req=req, title=title_message, navtrail=navtrail, description=description, keywords=keywords, metaheaderadd=metaheaderadd, uid=uid, language=ln, navmenuid='search', navtrail_append_title_p=0, rssurl=rssurl)) req.write(websearch_templates.tmpl_search_pagestart(ln=ln)) #else: # req.send_http_header() def page_end(req, of="hb", ln=CFG_SITE_LANG): "End page according to given output format: e.g. close XML tags, add HTML footer, etc." if of == "id": return [] # empty recID list if not req: return # we were called from CLI if of.startswith('h'): req.write(websearch_templates.tmpl_search_pageend(ln = ln)) # pagebody end req.write(pagefooteronly(lastupdated=__lastupdated__, language=ln, req=req)) return def create_page_title_search_pattern_info(p, p1, p2, p3): """Create the search pattern bit for the page web page HTML header. Basically combine p and (p1,p2,p3) together so that the page header may be filled whether we are in the Simple Search or Advanced Search interface contexts.""" out = "" if p: out = p else: out = p1 if p2: out += ' ' + p2 if p3: out += ' ' + p3 return out def create_inputdate_box(name="d1", selected_year=0, selected_month=0, selected_day=0, ln=CFG_SITE_LANG): "Produces 'From Date', 'Until Date' kind of selection box. Suitable for search options." 
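    # (Added note:) for name='d1' this emits three <select> boxes named
    # 'd1d', 'd1m' and 'd1y' (day, month, year), i.e. the d1d/d1m/d1y
    # arguments that wash_dates() below expects.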
_ = gettext_set_language(ln) box = "" # day box += """<select name="%sd">""" % name box += """<option value="">%s""" % _("any day") for day in range(1, 32): box += """<option value="%02d"%s>%02d""" % (day, is_selected(day, selected_day), day) box += """</select>""" # month box += """<select name="%sm">""" % name box += """<option value="">%s""" % _("any month") for mm, month in [(1, _("January")), (2, _("February")), (3, _("March")), (4, _("April")), \ (5, _("May")), (6, _("June")), (7, _("July")), (8, _("August")), \ (9, _("September")), (10, _("October")), (11, _("November")), (12, _("December"))]: box += """<option value="%02d"%s>%s""" % (mm, is_selected(mm, selected_month), month) box += """</select>""" # year box += """<select name="%sy">""" % name box += """<option value="">%s""" % _("any year") this_year = int(time.strftime("%Y", time.localtime())) for year in range(this_year-20, this_year+1): box += """<option value="%d"%s>%d""" % (year, is_selected(year, selected_year), year) box += """</select>""" return box def create_search_box(cc, colls, p, f, rg, sf, so, sp, rm, of, ot, aas, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action=""): """Create search box for 'search again in the results page' functionality.""" # load the right message language _ = gettext_set_language(ln) # some computations cc_intl = get_coll_i18nname(cc, ln, False) cc_colID = get_colID(cc) colls_nicely_ordered = [] if cfg_nicely_ordered_collection_list: colls_nicely_ordered = get_nicely_ordered_collection_list(ln=ln) else: colls_nicely_ordered = get_alphabetically_ordered_collection_list(ln=ln) colls_nice = [] for (cx, cx_printable) in colls_nicely_ordered: if not cx.startswith("Unnamed collection"): colls_nice.append({ 'value' : cx, 'text' : cx_printable }) coll_selects = [] if colls and colls[0] != CFG_SITE_NAME: # some collections are defined, so print these first, and only then print 'add another collection' heading: for c in colls: if c: temp = [] temp.append({ 'value' : CFG_SITE_NAME, 'text' : '*** %s ***' % _("any public collection") }) # this field is used to remove the current collection from the ones to be searched. temp.append({ 'value' : '', 'text' : '*** %s ***' % _("remove this collection") }) for val in colls_nice: # print collection: if not cx.startswith("Unnamed collection"): temp.append({ 'value' : val['value'], 'text' : val['text'], 'selected' : (c == re.sub("^[\s\-]*","", val['value'])) }) coll_selects.append(temp) coll_selects.append([{ 'value' : '', 'text' : '*** %s ***' % _("add another collection") }] + colls_nice) else: # we searched in CFG_SITE_NAME, so print 'any public collection' heading coll_selects.append([{ 'value' : CFG_SITE_NAME, 'text' : '*** %s ***' % _("any public collection") }] + colls_nice) ## ranking methods ranks = [{ 'value' : '', 'text' : "- %s %s -" % (_("OR").lower (), _("rank by")), }] for (code, name) in get_bibrank_methods(cc_colID, ln): # propose found rank methods: ranks.append({ 'value' : code, 'text' : name, }) formats = [] query = """SELECT code,name FROM format WHERE visibility='1' ORDER BY name ASC""" res = run_sql(query) if res: # propose found formats: for code, name in res: formats.append({ 'value' : code, 'text' : name }) else: formats.append({'value' : 'hb', 'text' : _("HTML brief") }) # show collections in the search box? 
(not if there is only one # collection defined, and not if we are in light search) show_colls = True show_title = True if len(collection_reclist_cache.cache.keys()) == 1 or \ aas == -1: show_colls = False show_title = False if cc == CFG_SITE_NAME: show_title = False return websearch_templates.tmpl_search_box( ln = ln, aas = aas, cc_intl = cc_intl, cc = cc, ot = ot, sp = sp, action = action, fieldslist = get_searchwithin_fields(ln=ln, colID=cc_colID), f1 = f1, f2 = f2, f3 = f3, m1 = m1, m2 = m2, m3 = m3, p1 = p1, p2 = p2, p3 = p3, op1 = op1, op2 = op2, rm = rm, p = p, f = f, coll_selects = coll_selects, d1y = d1y, d2y = d2y, d1m = d1m, d2m = d2m, d1d = d1d, d2d = d2d, dt = dt, sort_fields = get_sortby_fields(ln=ln, colID=cc_colID), sf = sf, so = so, ranks = ranks, sc = sc, rg = rg, formats = formats, of = of, pl = pl, jrec = jrec, ec = ec, show_colls = show_colls, show_title = show_title, ) def create_navtrail_links(cc=CFG_SITE_NAME, aas=0, ln=CFG_SITE_LANG, self_p=1, tab=''): """Creates navigation trail links, i.e. links to collection ancestors (except Home collection). If aas==1, then links to Advanced Search interfaces; otherwise Simple Search. """ dads = [] for dad in get_coll_ancestors(cc): if dad != CFG_SITE_NAME: # exclude Home collection dads.append ((dad, get_coll_i18nname(dad, ln, False))) if self_p and cc != CFG_SITE_NAME: dads.append((cc, get_coll_i18nname(cc, ln, False))) return websearch_templates.tmpl_navtrail_links( aas=aas, ln=ln, dads=dads) def get_searchwithin_fields(ln='en', colID=None): """Retrieves the fields name used in the 'search within' selection box for the collection ID colID.""" res = None if colID: res = run_sql_cached("""SELECT f.code,f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE cff.type='sew' AND cff.id_collection=%s AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""", (colID,), affected_tables=['field', 'collection_field_fieldvalue']) if not res: res = run_sql_cached("SELECT code,name FROM field ORDER BY name ASC", affected_tables=['field',]) fields = [{ 'value' : '', 'text' : get_field_i18nname("any field", ln, False) }] for field_code, field_name in res: if field_code and field_code != "anyfield": fields.append({ 'value' : field_code, 'text' : get_field_i18nname(field_name, ln, False) }) return fields def get_sortby_fields(ln='en', colID=None): """Retrieves the fields name used in the 'sort by' selection box for the collection ID colID.""" _ = gettext_set_language(ln) res = None if colID: res = run_sql_cached("""SELECT DISTINCT(f.code),f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE cff.type='soo' AND cff.id_collection=%s AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""", (colID,), affected_tables=['field', 'collection_field_fieldvalue']) if not res: # no sort fields defined for this colID, try to take Home collection: res = run_sql_cached("""SELECT DISTINCT(f.code),f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE cff.type='soo' AND cff.id_collection=%s AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""", (1,), affected_tables=['field', 'collection_field_fieldvalue']) if not res: # no sort fields defined for the Home collection, take all sort fields defined wherever they are: res = run_sql_cached("""SELECT DISTINCT(f.code),f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE cff.type='soo' AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""", affected_tables=['field', 'collection_field_fieldvalue']) fields = [{ 'value' : '', 'text' : _("latest first") }] 
for field_code, field_name in res: if field_code and field_code != "anyfield": fields.append({ 'value' : field_code, 'text' : get_field_i18nname(field_name, ln, False) }) return fields def create_andornot_box(name='op', value='', ln='en'): "Returns HTML code for the AND/OR/NOT selection box." _ = gettext_set_language(ln) out = """ <select name="%s"> <option value="a"%s>%s <option value="o"%s>%s <option value="n"%s>%s </select> """ % (name, is_selected('a', value), _("AND"), is_selected('o', value), _("OR"), is_selected('n', value), _("AND NOT")) return out def create_matchtype_box(name='m', value='', ln='en'): "Returns HTML code for the 'match type' selection box." _ = gettext_set_language(ln) out = """ <select name="%s"> <option value="a"%s>%s <option value="o"%s>%s <option value="e"%s>%s <option value="p"%s>%s <option value="r"%s>%s </select> """ % (name, is_selected('a', value), _("All of the words:"), is_selected('o', value), _("Any of the words:"), is_selected('e', value), _("Exact phrase:"), is_selected('p', value), _("Partial phrase:"), is_selected('r', value), _("Regular expression:")) return out def is_selected(var, fld): "Checks if the two are equal, and if yes, returns ' selected'. Useful for select boxes." if type(var) is int and type(fld) is int: if var == fld: return " selected" elif str(var) == str(fld): return " selected" elif fld and len(fld)==3 and fld[0] == "w" and var == fld[1:]: return " selected" return "" def wash_colls(cc, c, split_colls=0, verbose=0): """Wash collection list by checking whether user has deselected anything under 'Narrow search'. Checks also if cc is a list or not. Return list of cc, colls_to_display, colls_to_search since the list of collections to display is different from that to search in. This is because users might have chosen 'split by collection' functionality. The behaviour of "collections to display" depends solely whether user has deselected a particular collection: e.g. if it started from 'Articles and Preprints' page, and deselected 'Preprints', then collection to display is 'Articles'. If he did not deselect anything, then collection to display is 'Articles & Preprints'. The behaviour of "collections to search in" depends on the 'split_colls' parameter: * if is equal to 1, then we can wash the colls list down and search solely in the collection the user started from; * if is equal to 0, then we are splitting to the first level of collections, i.e. collections as they appear on the page we started to search from; The function raises exception InvenioWebSearchUnknownCollectionError if cc or one of c collections is not known. 
""" colls_out = [] colls_out_for_display = [] # list to hold the hosted collections to be searched and displayed hosted_colls_out = [] debug = "" if verbose: debug += "<br />" debug += "<br />1) --- initial parameters ---" debug += "<br />cc : %s" % cc debug += "<br />c : %s" % c debug += "<br />" # check what type is 'cc': if type(cc) is list: for ci in cc: if collection_reclist_cache.cache.has_key(ci): # yes this collection is real, so use it: cc = ci break else: # check once if cc is real: if not collection_reclist_cache.cache.has_key(cc): if cc: raise InvenioWebSearchUnknownCollectionError(cc) else: cc = CFG_SITE_NAME # cc is not set, so replace it with Home collection # check type of 'c' argument: if type(c) is list: colls = c else: colls = [c] if verbose: debug += "<br />2) --- after check for the integrity of cc and the being or not c a list ---" debug += "<br />cc : %s" % cc debug += "<br />c : %s" % c debug += "<br />" # remove all 'unreal' collections: colls_real = [] for coll in colls: if collection_reclist_cache.cache.has_key(coll): colls_real.append(coll) else: if coll: raise InvenioWebSearchUnknownCollectionError(coll) colls = colls_real if verbose: debug += "<br />3) --- keeping only the real colls of c ---" debug += "<br />colls : %s" % colls debug += "<br />" # check if some real collections remain: if len(colls)==0: colls = [cc] if verbose: debug += "<br />4) --- in case no colls were left we use cc directly ---" debug += "<br />colls : %s" % colls debug += "<br />" # then let us check the list of non-restricted "real" sons of 'cc' and compare it to 'coll': res = run_sql("""SELECT c.name FROM collection AS c, collection_collection AS cc, collection AS ccc WHERE c.id=cc.id_son AND cc.id_dad=ccc.id AND ccc.name=%s AND cc.type='r'""", (cc,)) # list that holds all the non restricted sons of cc that are also not hosted collections l_cc_nonrestricted_sons_and_nonhosted_colls = [] res_hosted = run_sql("""SELECT c.name FROM collection AS c, collection_collection AS cc, collection AS ccc WHERE c.id=cc.id_son AND cc.id_dad=ccc.id AND ccc.name=%s AND cc.type='r' AND (c.dbquery NOT LIKE 'hostedcollection:%%' OR c.dbquery IS NULL)""", (cc,)) for row_hosted in res_hosted: l_cc_nonrestricted_sons_and_nonhosted_colls.append(row_hosted[0]) l_cc_nonrestricted_sons_and_nonhosted_colls.sort() l_cc_nonrestricted_sons = [] l_c = colls for row in res: if not collection_restricted_p(row[0]): l_cc_nonrestricted_sons.append(row[0]) l_c.sort() l_cc_nonrestricted_sons.sort() if l_cc_nonrestricted_sons == l_c: colls_out_for_display = [cc] # yep, washing permitted, it is sufficient to display 'cc' # the following elif is a hack that preserves the above funcionality when we start searching from # the frontpage with some hosted collections deselected (either by default or manually) elif set(l_cc_nonrestricted_sons_and_nonhosted_colls).issubset(set(l_c)): colls_out_for_display = colls split_colls = 0 else: colls_out_for_display = colls # nope, we need to display all 'colls' successively # remove duplicates: #colls_out_for_display_nondups=filter(lambda x, colls_out_for_display=colls_out_for_display: colls_out_for_display[x-1] not in colls_out_for_display[x:], range(1, len(colls_out_for_display)+1)) #colls_out_for_display = map(lambda x, colls_out_for_display=colls_out_for_display:colls_out_for_display[x-1], colls_out_for_display_nondups) colls_out_for_display = list(set(colls_out_for_display)) if verbose: debug += "<br />5) --- decide whether colls_out_for_diplay should be colls or is it sufficient for it 
to be cc; remove duplicates ---" debug += "<br />colls_out_for_display : %s" % colls_out_for_display debug += "<br />" # the following piece of code takes care of removing collections whose ancestors are going to be searched anyway # list to hold the collections to be removed colls_to_be_removed = [] # first calculate the collections that can safely be removed for coll in colls_out_for_display: for ancestor in get_coll_ancestors(coll): #if ancestor in colls_out_for_display: colls_to_be_removed.append(coll) if ancestor in colls_out_for_display and not is_hosted_collection(coll): colls_to_be_removed.append(coll) # secondly remove the collections for coll in colls_to_be_removed: colls_out_for_display.remove(coll) if verbose: debug += "<br />6) --- remove collections that have ancestors about to be search, unless they are hosted ---" debug += "<br />colls_out_for_display : %s" % colls_out_for_display debug += "<br />" # calculate the hosted collections to be searched. if colls_out_for_display == [cc]: if is_hosted_collection(cc): hosted_colls_out.append(cc) else: for coll in get_coll_sons(cc): if is_hosted_collection(coll): hosted_colls_out.append(coll) else: for coll in colls_out_for_display: if is_hosted_collection(coll): hosted_colls_out.append(coll) if verbose: debug += "<br />7) --- calculate the hosted_colls_out ---" debug += "<br />hosted_colls_out : %s" % hosted_colls_out debug += "<br />" # second, let us decide on collection splitting: if split_colls == 0: # type A - no sons are wanted colls_out = colls_out_for_display else: # type B - sons (first-level descendants) are wanted for coll in colls_out_for_display: coll_sons = get_coll_sons(coll) if coll_sons == []: colls_out.append(coll) else: for coll_son in coll_sons: if not is_hosted_collection(coll_son): colls_out.append(coll_son) #else: # colls_out = colls_out + coll_sons # remove duplicates: #colls_out_nondups=filter(lambda x, colls_out=colls_out: colls_out[x-1] not in colls_out[x:], range(1, len(colls_out)+1)) #colls_out = map(lambda x, colls_out=colls_out:colls_out[x-1], colls_out_nondups) colls_out = list(set(colls_out)) if verbose: debug += "<br />8) --- calculate the colls_out; remove duplicates ---" debug += "<br />colls_out : %s" % colls_out debug += "<br />" # remove the hosted collections from the collections to be searched if hosted_colls_out: for coll in hosted_colls_out: try: colls_out.remove(coll) except ValueError: # in case coll was not found in colls_out pass if verbose: debug += "<br />9) --- remove the hosted_colls from the colls_out ---" debug += "<br />colls_out : %s" % colls_out return (cc, colls_out_for_display, colls_out, hosted_colls_out, debug) def strip_accents(x): """Strip accents in the input phrase X (assumed in UTF-8) by replacing accented characters with their unaccented cousins (e.g. é by e). 
    Return such a stripped X."""
    x = re_latex_lowercase_a.sub("a", x)
    x = re_latex_lowercase_ae.sub("ae", x)
    x = re_latex_lowercase_e.sub("e", x)
    x = re_latex_lowercase_i.sub("i", x)
    x = re_latex_lowercase_o.sub("o", x)
    x = re_latex_lowercase_u.sub("u", x)
    x = re_latex_lowercase_y.sub("y", x)
    x = re_latex_lowercase_c.sub("c", x)
    x = re_latex_lowercase_n.sub("n", x)
    x = re_latex_uppercase_a.sub("A", x)
    x = re_latex_uppercase_ae.sub("AE", x)
    x = re_latex_uppercase_e.sub("E", x)
    x = re_latex_uppercase_i.sub("I", x)
    x = re_latex_uppercase_o.sub("O", x)
    x = re_latex_uppercase_u.sub("U", x)
    x = re_latex_uppercase_y.sub("Y", x)
    x = re_latex_uppercase_c.sub("C", x)
    x = re_latex_uppercase_n.sub("N", x)
    # convert input into Unicode string:
    try:
        y = unicode(x, "utf-8")
    except:
        return x # something went wrong, probably the input wasn't UTF-8
    # asciify Latin-1 lowercase characters:
    y = re_unicode_lowercase_a.sub("a", y)
    y = re_unicode_lowercase_ae.sub("ae", y)
    y = re_unicode_lowercase_e.sub("e", y)
    y = re_unicode_lowercase_i.sub("i", y)
    y = re_unicode_lowercase_o.sub("o", y)
    y = re_unicode_lowercase_u.sub("u", y)
    y = re_unicode_lowercase_y.sub("y", y)
    y = re_unicode_lowercase_c.sub("c", y)
    y = re_unicode_lowercase_n.sub("n", y)
    # asciify Latin-1 uppercase characters:
    y = re_unicode_uppercase_a.sub("A", y)
    y = re_unicode_uppercase_ae.sub("AE", y)
    y = re_unicode_uppercase_e.sub("E", y)
    y = re_unicode_uppercase_i.sub("I", y)
    y = re_unicode_uppercase_o.sub("O", y)
    y = re_unicode_uppercase_u.sub("U", y)
    y = re_unicode_uppercase_y.sub("Y", y)
    y = re_unicode_uppercase_c.sub("C", y)
    y = re_unicode_uppercase_n.sub("N", y)
    # return UTF-8 representation of the Unicode string:
    return y.encode("utf-8")

def wash_index_term(term, max_char_length=50, lower_term=True):
    """
    Return washed form of the index term TERM that would be suitable
    for storing into idxWORD* tables.  I.e., lower the TERM if
    LOWER_TERM is True, and truncate it safely to MAX_CHAR_LENGTH
    UTF-8 characters (meaning, in principle, 4*MAX_CHAR_LENGTH bytes).

    The function works by an internal conversion of TERM, when needed,
    from its input Python UTF-8 binary string format into Python
    Unicode format, and then truncating it safely to the given number
    of UTF-8 characters, without possible mis-truncation in the middle
    of a multi-byte UTF-8 character that could otherwise happen if we
    would have been working with UTF-8 binary representation directly.

    Note that MAX_CHAR_LENGTH corresponds to the length of the term
    column in idxINDEX* tables.
    """
    if lower_term:
        washed_term = unicode(term, 'utf-8').lower()
    else:
        washed_term = unicode(term, 'utf-8')
    if len(washed_term) <= max_char_length:
        # no need to truncate the term, because it will fit
        # nicely even if it uses four-byte UTF-8 characters
        return washed_term.encode('utf-8')
    else:
        # truncate the term in a safe position:
        return washed_term[:max_char_length].encode('utf-8')

def lower_index_term(term):
    """
    Return safely lowered index term TERM.  This is done by converting
    to UTF-8 first, because the standard Python lower() function is not
    UTF-8 safe.  To be called by both the search engine and the indexer
    when appropriate (e.g. before stemming).

    In case of problems with UTF-8 compliance, this function raises
    UnicodeDecodeError, so the client code may want to catch it.
    """
    return unicode(term, 'utf-8').lower().encode('utf-8')

def wash_output_format(format):
    """Wash output format FORMAT.  Currently only prevents input like
    'of=9' for backwards-compatible format that prints certain fields
    only.  (for this task, 'of=tm' is preferred)"""
    if str(format[0:3]).isdigit() and len(format) != 6:
        # asked to print MARC tags, but not enough digits,
        # so let's switch back to HTML brief default
        return 'hb'
    else:
        return format
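# Illustrative behaviour of the washing helpers above (added commentary,
# doctest-style; return values derived by reading the code, not from a
# test run):
#
#     >>> strip_accents('Ellis, J. é')                 # 'é' -> 'e'
#     'Ellis, J. e'
#     >>> wash_index_term('ELLIS', max_char_length=3)  # lowered and truncated
#     'ell'
#     >>> wash_output_format('9')   # too few digits for MARC-tag output
#     'hb'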
def wash_pattern(p):
    """Wash pattern passed by URL.  Check for sanity of the wildcard by
    removing wildcards if they are appended to extremely short words
    (1-3 letters).  TODO: instead of this approximative treatment, it
    would be much better to introduce a time limit, e.g. to kill a
    query if it does not finish in 10 seconds."""
    # strip accents:
    # p = strip_accents(p) # FIXME: when available, strip accents all the time
    # add leading/trailing whitespace for the two following wildcard-sanity checking regexps:
    p = " " + p + " "
    # replace spaces within quotes by __SPACE__ temporarily:
    p = re_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", p)
    p = re_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", p)
    p = re_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", p)
    # get rid of unquoted wildcards after spaces:
    p = re_pattern_wildcards_after_spaces.sub("\\1", p)
    # get rid of extremely short words (1-3 letters with wildcards):
    p = re_pattern_short_words.sub("\\1", p)
    # replace back __SPACE__ by spaces:
    p = re_pattern_space.sub(" ", p)
    # replace special terms:
    p = re_pattern_today.sub(time.strftime("%Y-%m-%d", time.localtime()), p)
    # remove unnecessary whitespace:
    p = string.strip(p)
    return p

def wash_field(f):
    """Wash field passed by URL."""
    # get rid of unnecessary whitespace:
    f = string.strip(f)
    # wash old-style CDS Invenio/ALEPH 'f' field argument, e.g. replaces 'wau' and 'au' by 'author'
    if CFG_WEBSEARCH_FIELDS_CONVERT.has_key(string.lower(f)):
        f = CFG_WEBSEARCH_FIELDS_CONVERT[string.lower(f)]
    return f
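# Illustrative behaviour (added commentary, doctest-style; the 'au' ->
# 'author' mapping assumes the site's CFG_WEBSEARCH_FIELDS_CONVERT
# contains it, as the comment in wash_field() suggests):
#
#     >>> wash_pattern('muo* "nuclear physics"')  # wildcard on a short word is dropped
#     'muo "nuclear physics"'
#     >>> wash_field(' au ')
#     'author'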
""" datetext1, datetext2 = "", "" # sanity checking: if d1 == "" and d1y == 0 and d1m == 0 and d1d == 0 and d2 == "" and d2y == 0 and d2m == 0 and d2d == 0: return ("", "") # nothing selected, so return empty values # wash first (starting) date: if d1: # full datetime string takes precedence: datetext1 = d1 else: # okay, first date passed as (year,month,day): if d1y: datetext1 += "%04d" % d1y else: datetext1 += "0000" if d1m: datetext1 += "-%02d" % d1m else: datetext1 += "-01" if d1d: datetext1 += "-%02d" % d1d else: datetext1 += "-01" datetext1 += " 00:00:00" # wash second (ending) date: if d2: # full datetime string takes precedence: datetext2 = d2 else: # okay, second date passed as (year,month,day): if d2y: datetext2 += "%04d" % d2y else: datetext2 += "9999" if d2m: datetext2 += "-%02d" % d2m else: datetext2 += "-12" if d2d: datetext2 += "-%02d" % d2d else: datetext2 += "-31" # NOTE: perhaps we should add max(datenumber) in # given month, but for our quering it's not # needed, 31 will always do datetext2 += " 00:00:00" # okay, return constructed YYYY-MM-DD HH:MM:SS datetexts: return (datetext1, datetext2) def is_hosted_collection(coll): """Check if the given collection is a hosted one; i.e. its dbquery starts with hostedcollection: Returns True if it is, False if it's not or if the result is empty or if the query failed""" res = run_sql("SELECT dbquery FROM collection WHERE name=%s", (coll, )) try: return res[0][0].startswith("hostedcollection:") except: return False def get_colID(c): "Return collection ID for collection name C. Return None if no match found." colID = None res = run_sql("SELECT id FROM collection WHERE name=%s", (c,), 1) if res: colID = res[0][0] return colID def get_coll_ancestors(coll): "Returns a list of ancestors for collection 'coll'." coll_ancestors = [] coll_ancestor = coll while 1: res = run_sql("""SELECT c.name FROM collection AS c LEFT JOIN collection_collection AS cc ON c.id=cc.id_dad LEFT JOIN collection AS ccc ON ccc.id=cc.id_son WHERE ccc.name=%s ORDER BY cc.id_dad ASC LIMIT 1""", (coll_ancestor,)) if res: coll_name = res[0][0] coll_ancestors.append(coll_name) coll_ancestor = coll_name else: break # ancestors found, return reversed list: coll_ancestors.reverse() return coll_ancestors def get_coll_sons(coll, type='r', public_only=1): """Return a list of sons (first-level descendants) of type 'type' for collection 'coll'. If public_only, then return only non-restricted son collections. """ coll_sons = [] query = "SELECT c.name FROM collection AS c "\ "LEFT JOIN collection_collection AS cc ON c.id=cc.id_son "\ "LEFT JOIN collection AS ccc ON ccc.id=cc.id_dad "\ "WHERE cc.type=%s AND ccc.name=%s" query += " ORDER BY cc.score DESC" res = run_sql(query, (type, coll)) for name in res: if not public_only or not collection_restricted_p(name[0]): coll_sons.append(name[0]) return coll_sons def get_coll_real_descendants(coll, type='_', get_hosted_colls=True): """Return a list of all descendants of collection 'coll' that are defined by a 'dbquery'. IOW, we need to decompose compound collections like "A & B" into "A" and "B" provided that "A & B" has no associated database query defined. 
""" coll_sons = [] res = run_sql("""SELECT c.name,c.dbquery FROM collection AS c LEFT JOIN collection_collection AS cc ON c.id=cc.id_son LEFT JOIN collection AS ccc ON ccc.id=cc.id_dad WHERE ccc.name=%s AND cc.type LIKE %s ORDER BY cc.score DESC""", (coll, type,)) for name, dbquery in res: if dbquery: # this is 'real' collection, so return it: if get_hosted_colls: coll_sons.append(name) else: if not dbquery.startswith("hostedcollection:"): coll_sons.append(name) else: # this is 'composed' collection, so recurse: coll_sons.extend(get_coll_real_descendants(name)) return coll_sons def browse_pattern(req, colls, p, f, rg, ln=CFG_SITE_LANG): """Browse either biliographic phrases or words indexes, and display it.""" # load the right message language _ = gettext_set_language(ln) ## is p enclosed in quotes? (coming from exact search) if p.startswith('"') and p.endswith('"'): p = p[1:-1] p_orig = p ## okay, "real browse" follows: ## FIXME: the maths in the get_nearest_terms_in_bibxxx is just a test if not f and string.find(p, ":") > 0: # does 'p' contain ':'? f, p = string.split(p, ":", 1) ## do we search in words indexes? if not f: return browse_in_bibwords(req, p, f) index_id = get_index_id_from_field(f) if index_id != 0: coll = HitSet() for coll_name in colls: coll |= get_collection_reclist(coll_name) browsed_phrases_in_colls = get_nearest_terms_in_idxphrase_with_collection(p, index_id, rg/2, rg/2, coll) else: browsed_phrases = get_nearest_terms_in_bibxxx(p, f, (rg+1)/2+1, (rg-1)/2+1) while not browsed_phrases: # try again and again with shorter and shorter pattern: try: p = p[:-1] browsed_phrases = get_nearest_terms_in_bibxxx(p, f, (rg+1)/2+1, (rg-1)/2+1) except: # probably there are no hits at all: req.write(_("No values found.")) return ## try to check hits in these particular collection selection: browsed_phrases_in_colls = [] if 0: for phrase in browsed_phrases: phrase_hitset = HitSet() phrase_hitsets = search_pattern("", phrase, f, 'e') for coll in colls: phrase_hitset.union_update(phrase_hitsets[coll]) if len(phrase_hitset) > 0: # okay, this phrase has some hits in colls, so add it: browsed_phrases_in_colls.append([phrase, len(phrase_hitset)]) ## were there hits in collections? if browsed_phrases_in_colls == []: if browsed_phrases != []: #print_warning(req, """<p>No match close to <em>%s</em> found in given collections. #Please try different term.<p>Displaying matches in any collection...""" % p_orig) ## try to get nbhits for these phrases in any collection: for phrase in browsed_phrases: browsed_phrases_in_colls.append([phrase, get_nbhits_in_bibxxx(phrase, f)]) ## display results now: out = websearch_templates.tmpl_browse_pattern( f=f, fn=get_field_i18nname(get_field_name(f) or f, ln, False), ln=ln, browsed_phrases_in_colls=browsed_phrases_in_colls, colls=colls, rg=rg, ) req.write(out) return def browse_in_bibwords(req, p, f, ln=CFG_SITE_LANG): """Browse inside words indexes.""" if not p: return _ = gettext_set_language(ln) urlargd = {} urlargd.update(req.argd) urlargd['action'] = 'search' nearest_box = create_nearest_terms_box(urlargd, p, f, 'w', ln=ln, intro_text_p=0) req.write(websearch_templates.tmpl_search_in_bibwords( p = p, f = f, ln = ln, nearest_box = nearest_box )) return def search_pattern(req=None, p=None, f=None, m=None, ap=0, of="id", verbose=0, ln=CFG_SITE_LANG, display_nearest_terms_box=True): """Search for complex pattern 'p' within field 'f' according to matching type 'm'. Return hitset of recIDs. 
def search_pattern(req=None, p=None, f=None, m=None, ap=0, of="id", verbose=0, ln=CFG_SITE_LANG, display_nearest_terms_box=True):
    """Search for complex pattern 'p' within field 'f' according to
       matching type 'm'.  Return hitset of recIDs.

       The function uses a multi-stage searching algorithm in case of no
       exact match found.  See the Search Internals document for
       detailed description.

       The 'ap' argument governs whether alternative patterns are to be
       used in case there is no direct hit for (p,f,m).  For example,
       whether to replace non-alphanumeric characters by spaces if it
       would give some hits.  See the Search Internals document for
       detailed description.  (ap=0 forbids the alternative pattern
       usage, ap=1 permits it.)

       The 'of' argument governs whether or not to print some
       information to the user in case of no match found.  (Usually it
       prints the information in case of HTML formats, otherwise it's
       silent).

       The 'verbose' argument controls the level of debugging
       information to be printed (0=least, 9=most).

       All the parameters are assumed to have been previously washed.

       This function is suitable as a mid-level API.
    """
    _ = gettext_set_language(ln)
    hitset_empty = HitSet()
    # sanity check:
    if not p:
        hitset_full = HitSet(trailing_bits=1)
        hitset_full.discard(0)
        # no pattern, so return all universe
        return hitset_full
    # search stage 1: break up arguments into basic search units:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    basic_search_units = create_basic_search_units(req, p, f, m, of)
    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        print_warning(req, "Search stage 1: basic search units are: %s" % cgi.escape(repr(basic_search_units)))
        print_warning(req, "Search stage 1: execution took %.2f seconds." % (t2 - t1))
    # search stage 2: do search for each search unit and verify hit presence:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    basic_search_units_hitsets = []
    #prepare hiddenfield-related..
    myhiddens = CFG_BIBFORMAT_HIDDEN_TAGS
    can_see_hidden = False
    if req:
        user_info = collect_user_info(req)
        can_see_hidden = (acc_authorize_action(user_info, 'runbibedit')[0] == 0)
    if can_see_hidden:
        myhiddens = []
    for idx_unit in xrange(len(basic_search_units)):
        bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit]
        basic_search_unit_hitset = search_unit(bsu_p, bsu_f, bsu_m)
        #check that the user is allowed to search with this tag
        #if he/she tries it
        if bsu_f and len(bsu_f) > 1 and bsu_f[0].isdigit() and bsu_f[1].isdigit():
            for htag in myhiddens:
                ltag = len(htag)
                samelenfield = bsu_f[0:ltag]
                if samelenfield == htag:
                    #user searches by a hidden tag
                    #we won't show you anything..
                    basic_search_unit_hitset = HitSet()
                    if verbose >= 9 and of.startswith("h"):
                        print_warning(req, "Pattern %s hitlist omitted since it queries in a hidden tag %s" % (repr(bsu_p), repr(myhiddens)))
                    display_nearest_terms_box=False #..and stop spying, too.
        if verbose >= 9 and of.startswith("h"):
            print_warning(req, "Search stage 1: pattern %s gave hitlist %s" % (cgi.escape(bsu_p), basic_search_unit_hitset))
        if len(basic_search_unit_hitset) > 0 or \
           ap==0 or \
           bsu_o=="|" or \
           ((idx_unit+1)<len(basic_search_units) and basic_search_units[idx_unit+1][0]=="|"):
            # stage 2-1: this basic search unit is retained, since
            # either the hitset is non-empty, or the approximate
            # pattern treatment is switched off, or the search unit
            # was joined by an OR operator to preceding/following
            # units so we do not require that it exists
            basic_search_units_hitsets.append(basic_search_unit_hitset)
        else:
            # stage 2-2: no hits found for this search unit, try to replace non-alphanumeric chars inside pattern:
            if re.search(r'[^a-zA-Z0-9\s\:]', bsu_p):
                if bsu_p.startswith('"') and bsu_p.endswith('"'): # is it ACC query?
bsu_pn = re.sub(r'[^a-zA-Z0-9\s\:]+', "*", bsu_p) else: # it is WRD query bsu_pn = re.sub(r'[^a-zA-Z0-9\s\:]+', " ", bsu_p) if verbose and of.startswith('h') and req: print_warning(req, "Trying (%s,%s,%s)" % (cgi.escape(bsu_pn), cgi.escape(bsu_f), cgi.escape(bsu_m))) basic_search_unit_hitset = search_pattern(req=None, p=bsu_pn, f=bsu_f, m=bsu_m, of="id", ln=ln) if len(basic_search_unit_hitset) > 0: # we retain the new unit instead if of.startswith('h'): print_warning(req, _("No exact match found for %(x_query1)s, using %(x_query2)s instead...") % \ {'x_query1': "<em>" + cgi.escape(bsu_p) + "</em>", 'x_query2': "<em>" + cgi.escape(bsu_pn) + "</em>"}) basic_search_units[idx_unit][1] = bsu_pn basic_search_units_hitsets.append(basic_search_unit_hitset) else: # stage 2-3: no hits found either, propose nearest indexed terms: if of.startswith('h') and display_nearest_terms_box: if req: if bsu_f == "recid": - print_warning(req, "Requested record does not seem to exist.") + print_warning(req, _("Requested record does not seem to exist.")) else: print_warning(req, create_nearest_terms_box(req.argd, bsu_p, bsu_f, bsu_m, ln=ln)) return hitset_empty else: # stage 2-3: no hits found either, propose nearest indexed terms: if of.startswith('h') and display_nearest_terms_box: if req: if bsu_f == "recid": - print_warning(req, "Requested record does not seem to exist.") + print_warning(req, _("Requested record does not seem to exist.")) else: print_warning(req, create_nearest_terms_box(req.argd, bsu_p, bsu_f, bsu_m, ln=ln)) return hitset_empty if verbose and of.startswith("h"): t2 = os.times()[4] for idx_unit in range(0, len(basic_search_units)): print_warning(req, "Search stage 2: basic search unit %s gave %d hits." % (basic_search_units[idx_unit][1:], len(basic_search_units_hitsets[idx_unit]))) print_warning(req, "Search stage 2: execution took %.2f seconds." % (t2 - t1)) # search stage 3: apply boolean query for each search unit: if verbose and of.startswith("h"): t1 = os.times()[4] # let the initial set be the complete universe: hitset_in_any_collection = HitSet(trailing_bits=1) hitset_in_any_collection.discard(0) for idx_unit in xrange(len(basic_search_units)): this_unit_operation = basic_search_units[idx_unit][0] this_unit_hitset = basic_search_units_hitsets[idx_unit] if this_unit_operation == '+': hitset_in_any_collection.intersection_update(this_unit_hitset) elif this_unit_operation == '-': hitset_in_any_collection.difference_update(this_unit_hitset) elif this_unit_operation == '|': hitset_in_any_collection.union_update(this_unit_hitset) else: if of.startswith("h"): print_warning(req, "Invalid set operation %s." % cgi.escape(this_unit_operation), "Error") if len(hitset_in_any_collection) == 0: # no hits found, propose alternative boolean query: if of.startswith('h') and display_nearest_terms_box: nearestterms = [] for idx_unit in range(0, len(basic_search_units)): bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit] if bsu_p.startswith("%") and bsu_p.endswith("%"): bsu_p = "'" + bsu_p[1:-1] + "'" bsu_nbhits = len(basic_search_units_hitsets[idx_unit]) # create a similar query, but with the basic search unit only argd = {} argd.update(req.argd) argd['p'] = bsu_p argd['f'] = bsu_f nearestterms.append((bsu_p, bsu_nbhits, argd)) text = websearch_templates.tmpl_search_no_boolean_hits( ln=ln, nearestterms=nearestterms) print_warning(req, text) if verbose and of.startswith("h"): t2 = os.times()[4] print_warning(req, "Search stage 3: boolean query gave %d hits." 
                      % len(hitset_in_any_collection))
        print_warning(req, "Search stage 3: execution took %.2f seconds." % (t2 - t1))
    return hitset_in_any_collection

def search_pattern_parenthesised(req=None, p=None, f=None, m=None, ap=0, of="id", verbose=0, ln=CFG_SITE_LANG, display_nearest_terms_box=True):
    """Search for complex pattern 'p' containing parentheses within field 'f'
       according to matching type 'm'.  Return hitset of recIDs.

       For more details on the parameters see 'search_pattern'.
    """
    _ = gettext_set_language(ln)

    # if the pattern uses SPIRES search syntax, convert it to Invenio syntax:
    spires_syntax_converter = SpiresToInvenioSyntaxConverter()
    p = spires_syntax_converter.convert_query(p)

    # sanity check: do not call parenthesised parser for search terms
    # like U(1):
    if not re_pattern_parens.search(p):
        return search_pattern(req, p, f, m, ap, of, verbose, ln, display_nearest_terms_box=display_nearest_terms_box)

    # try searching with parentheses:
    try:
        parser = SearchQueryParenthesisedParser()

        # get a hitset with all recids
        result_hitset = HitSet(trailing_bits=1)

        # parse the query; the result is a list of
        # [op1, expr1, op2, expr2, ..., opN, exprN]:
        parsing_result = parser.parse_query(p)
        if verbose and of.startswith("h"):
            print_warning(req, "Search stage 1: search_pattern_parenthesised() returned %s." % repr(parsing_result))

        # go through every pattern, calculate a hitset for it, and
        # combine the pattern's hitset with the result using the
        # corresponding operator:
        for index in xrange(0, len(parsing_result)-1, 2):
            current_operator = parsing_result[index]
            current_pattern = parsing_result[index+1]
            # obtain a hitset for the current pattern
            current_hitset = search_pattern(req, current_pattern, f, m, ap, of, verbose, ln, display_nearest_terms_box=display_nearest_terms_box)
            # combine the current hitset with resulting hitset using the current operator
            if current_operator == '+':
                result_hitset = result_hitset & current_hitset
            elif current_operator == '-':
                result_hitset = result_hitset - current_hitset
            elif current_operator == '|':
                result_hitset = result_hitset | current_hitset
            else:
                assert False, "Unknown operator in search_pattern_parenthesised()"

        return result_hitset

    # if searching with parentheses fails, perform the search ignoring parentheses:
    except InvenioWebSearchQueryParserException:
        print_warning(req, _("Nested or mismatched parentheses detected. Ignoring all parentheses in the query..."))

        # remove the parentheses in the query; the current implementation
        # removes all the parentheses, but it could be improved to remove
        # only those that are not inside quotes:
        p = p.replace('(', ' ')
        p = p.replace(')', ' ')

        return search_pattern(req, p, f, m, ap, of, verbose, ln, display_nearest_terms_box=display_nearest_terms_box)

def search_unit(p, f=None, m=None):
    """Search for basic search unit defined by pattern 'p' and field 'f'
       and matching type 'm'.  Return hitset of recIDs.

       All the parameters are assumed to have been previously washed.
       'p' is assumed to be already a ``basic search unit'' so that it
       is searched as such and is not broken up in any way.  Only
       wildcard and span queries are being detected inside 'p'.

       This function is suitable as a low-level API.
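       Hypothetical usage (illustrative only; actual hits depend on the
       local word and phrase indexes):

          search_unit('ellis', 'author', 'a')  # phrase search in author index
          search_unit('E.*', 'title', 'r')     # regexp search in title index
          search_unit('higgs', 'title')        # word search (the default)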
""" ## create empty output results set: set = HitSet() if not p: # sanity checking return set if f == 'datecreated': set = search_unit_in_bibrec(p, p, 'c') elif f == 'datemodified': set = search_unit_in_bibrec(p, p, 'm') elif m == 'a' or m == 'r': # we are doing either phrase search or regexp search index_id = get_index_id_from_field(f) if index_id != 0: set = search_unit_in_idxphrases(p, f, m) else: set = search_unit_in_bibxxx(p, f, m) elif p.startswith("cited:"): # we are doing search by the citation count set = search_unit_by_times_cited(p[6:]) else: # we are doing bibwords search by default set = search_unit_in_bibwords(p, f) return set def search_unit_in_bibwords(word, f, decompress=zlib.decompress): """Searches for 'word' inside bibwordsX table for field 'f' and returns hitset of recIDs.""" set = HitSet() # will hold output result set set_used = 0 # not-yet-used flag, to be able to circumvent set operations # deduce into which bibwordsX table we will search: stemming_language = get_index_stemming_language(get_index_id_from_field("anyfield")) bibwordsX = "idxWORD%02dF" % get_index_id_from_field("anyfield") if f: index_id = get_index_id_from_field(f) if index_id: bibwordsX = "idxWORD%02dF" % index_id stemming_language = get_index_stemming_language(index_id) else: return HitSet() # word index f does not exist # wash 'word' argument and run query: word = string.replace(word, '*', '%') # we now use '*' as the truncation character words = string.split(word, "->", 1) # check for span query if len(words) == 2: word0 = re_word.sub('', words[0]) word1 = re_word.sub('', words[1]) if stemming_language: word0 = lower_index_term(word0) word1 = lower_index_term(word1) word0 = stem(word0, stemming_language) word1 = stem(word1, stemming_language) res = run_sql("SELECT term,hitlist FROM %s WHERE term BETWEEN %%s AND %%s" % bibwordsX, (wash_index_term(word0), wash_index_term(word1))) else: if f == 'journal': pass # FIXME: quick hack for the journal index else: word = re_word.sub('', word) if stemming_language: word = lower_index_term(word) word = stem(word, stemming_language) if string.find(word, '%') >= 0: # do we have wildcard in the word? if f == 'journal': # FIXME: quick hack for the journal index # FIXME: we can run a sanity check here for all indexes res = () else: res = run_sql("SELECT term,hitlist FROM %s WHERE term LIKE %%s" % bibwordsX, (wash_index_term(word),)) else: res = run_sql("SELECT term,hitlist FROM %s WHERE term=%%s" % bibwordsX, (wash_index_term(word),)) # fill the result set: for word, hitlist in res: hitset_bibwrd = HitSet(hitlist) # add the results: if set_used: set.union_update(hitset_bibwrd) else: set = hitset_bibwrd set_used = 1 # okay, return result set: return set def search_unit_in_idxphrases(p, f, type): """Searches for phrase 'p' inside idxPHRASE*F table for field 'f' and returns hitset of recIDs found. The search type is defined by 'type' (e.g. 
equals to 'r' for a regexp search).""" set = HitSet() # will hold output result set set_used = 0 # not-yet-used flag, to be able to circumvent set operations # special washing for fuzzy author index: if f == 'author' or f == 'exactauthor': p = wash_author_name(p) # deduce in which idxPHRASE table we will search: idxphraseX = "idxPHRASE%02dF" % get_index_id_from_field("anyfield") if f: index_id = get_index_id_from_field(f) if index_id: idxphraseX = "idxPHRASE%02dF" % index_id else: return HitSet() # phrase index f does not exist # detect query type (exact phrase, partial phrase, regexp): if type == 'r': query_addons = "REGEXP %s" query_params = (p,) else: p = string.replace(p, '*', '%') # we now use '*' as the truncation character ps = string.split(p, "->", 1) # check for span query: if len(ps) == 2: query_addons = "BETWEEN %s AND %s" query_params = (ps[0], ps[1]) else: if string.find(p, '%') > -1: query_addons = "LIKE %s" query_params = (ps[0],) else: query_addons = "= %s" query_params = (ps[0],) # perform search: res = run_sql("SELECT term,hitlist FROM %s WHERE term %s" % (idxphraseX, query_addons), query_params) # fill the result set: for word, hitlist in res: hitset_bibphrase = HitSet(hitlist) # add the results: if set_used: set.union_update(hitset_bibphrase) else: set = hitset_bibphrase set_used = 1 # okay, return result set: return set def search_unit_in_bibxxx(p, f, type): """Searches for pattern 'p' inside bibxxx tables for field 'f' and returns hitset of recIDs found. The search type is defined by 'type' (e.g. equals to 'r' for a regexp search).""" # FIXME: quick hack for the journal index if f == 'journal': return search_unit_in_bibwords(p, f) p_orig = p # saving for eventual future 'no match' reporting query_addons = "" # will hold additional SQL code for the query query_params = () # will hold parameters for the query (their number may vary depending on TYPE argument) # wash arguments: f = string.replace(f, '*', '%') # replace truncation char '*' in field definition if type == 'r': query_addons = "REGEXP %s" query_params = (p,) else: p = string.replace(p, '*', '%') # we now use '*' as the truncation character ps = string.split(p, "->", 1) # check for span query: if len(ps) == 2: query_addons = "BETWEEN %s AND %s" query_params = (ps[0], ps[1]) else: if string.find(p, '%') > -1: query_addons = "LIKE %s" query_params = (ps[0],) else: query_addons = "= %s" query_params = (ps[0],) # construct 'tl' which defines the tag list (MARC tags) to search in: tl = [] if str(f[0]).isdigit() and str(f[1]).isdigit(): tl.append(f) # 'f' seems to be okay as it starts by two digits else: # convert old ALEPH tag names, if appropriate: (TODO: get rid of this before entering this function) if CFG_WEBSEARCH_FIELDS_CONVERT.has_key(string.lower(f)): f = CFG_WEBSEARCH_FIELDS_CONVERT[string.lower(f)] # deduce desired MARC tags on the basis of chosen 'f' tl = get_field_tags(f) if not tl: # f index does not exist, nevermind pass # okay, start search: l = [] # will hold list of recID that matched for t in tl: # deduce into which bibxxx table we will search: digit1, digit2 = int(t[0]), int(t[1]) bx = "bib%d%dx" % (digit1, digit2) bibx = "bibrec_bib%d%dx" % (digit1, digit2) # construct and run query: if t == "001": res = run_sql("SELECT id FROM bibrec WHERE id %s" % query_addons, query_params) else: query = "SELECT bibx.id_bibrec FROM %s AS bx LEFT JOIN %s AS bibx ON bx.id=bibx.id_bibxxx WHERE bx.value %s" % \ (bx, bibx, query_addons) if len(t) != 6 or t[-1:]=='%': # wildcard query, or only the beginning of 
field 't'
                # is defined, so add wildcard character:
                query += " AND bx.tag LIKE %s"
                res = run_sql(query, query_params + (t + '%',))
            else:
                # exact query for 't':
                query += " AND bx.tag=%s"
                res = run_sql(query, query_params + (t,))
        # fill the result set:
        for id_bibrec in res:
            if id_bibrec[0]:
                l.append(id_bibrec[0])
    # check number of hits found:
    nb_hits = len(l)
    # okay, return result set:
    set = HitSet(l)
    return set

def search_unit_in_bibrec(datetext1, datetext2, type='c'):
    """
    Return hitset of recIDs found that were either created or modified
    (according to 'type' arg being 'c' or 'm') from datetext1 until
    datetext2, inclusive.  Does not pay attention to pattern,
    collection, anything.  Useful to intersect later on with the 'real'
    query.
    """
    set = HitSet()
    if type.startswith("m"):
        type = "modification_date"
    else:
        type = "creation_date" # by default we are searching for creation dates
    if datetext1 == datetext2:
        res = run_sql("SELECT id FROM bibrec WHERE %s LIKE %%s" % (type,),
                      (datetext1 + '%',))
    else:
        res = run_sql("SELECT id FROM bibrec WHERE %s>=%%s AND %s<=%%s" % (type, type),
                      (datetext1, datetext2))
    for row in res:
        set += row[0]
    return set

def search_unit_by_times_cited(p):
    """
    Return hitset of recIDs found that are cited P times.
    Usually P looks like '10->23'.
    """
    numstr = '"'+p+'"'
    # this is sort of stupid, but since we may need to get the records
    # that do _not_ have cites, we have to know the ids of all records
    # too; this is needed only if p is 0, '0', '0->N' or 'N->0':
    allrecs = []
    if p == 0 or p == "0" or \
       p.startswith("0->") or p.endswith("->0"):
        allrecs = HitSet(run_sql_cached("SELECT id FROM bibrec",
                                        affected_tables=['bibrec']))
    return get_records_with_num_cites(numstr, allrecs)

def intersect_results_with_collrecs(req, hitset_in_any_collection, colls, ap=0, of="hb", verbose=0, ln=CFG_SITE_LANG, display_nearest_terms_box=True):
    """Return dict of hitsets given by intersection of hitset with the collection universes."""
    _ = gettext_set_language(ln)

    # search stage 4: intersect with the collection universe:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    results = {}
    results_nbhits = 0
    for coll in colls:
        results[coll] = hitset_in_any_collection & get_collection_reclist(coll)
        results_nbhits += len(results[coll])

    if results_nbhits == 0:
        # no hits found, try to search in Home:
        results_in_Home = hitset_in_any_collection & get_collection_reclist(CFG_SITE_NAME)
        if len(results_in_Home) > 0:
            # some hits found in Home, so propose this search:
            if of.startswith("h") and display_nearest_terms_box:
                url = websearch_templates.build_search_url(req.argd, cc=CFG_SITE_NAME, c=[])
                print_warning(req, _("No match found in collection %(x_collection)s. Other public collections gave %(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.") %\
                              {'x_collection': '<em>' + string.join([get_coll_i18nname(coll, ln, False) for coll in colls], ', ') + '</em>',
                               'x_url_open': '<a class="nearestterms" href="%s">' % (url),
                               'x_nb_hits': len(results_in_Home),
                               'x_url_close': '</a>'})
            results = {}
        else:
            # no hits found in Home, recommend different search terms:
            if of.startswith("h") and display_nearest_terms_box:
                print_warning(req, _("No public collection matched your query. "
                                     "If you were looking for a non-public document, please choose "
                                     "the desired restricted collection first."))
            results = {}

    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        print_warning(req, "Search stage 4: intersecting with collection universe gave %d hits." % results_nbhits)
        print_warning(req, "Search stage 4: execution took %.2f seconds." % (t2 - t1))

    return results

def intersect_results_with_hitset(req, results, hitset, ap=0, aptext="", of="hb"):
    """Return intersection of search 'results' (a dict of hitsets with
       collection as key) with the 'hitset', i.e. apply 'hitset'
       intersection to each collection within search 'results'.

       If the final 'results' set is empty and 'ap' (approximate
       pattern) is true, then print the 'aptext' warning and return the
       original 'results' set unchanged.  If 'ap' is false, then return
       the empty results set.
    """
    if ap:
        results_ap = copy.deepcopy(results)
    else:
        results_ap = {} # will return empty dict in case of no hits found
    nb_total = 0
    for coll in results.keys():
        results[coll].intersection_update(hitset)
        nb_total += len(results[coll])
    if nb_total == 0:
        if of.startswith("h"):
            print_warning(req, aptext)
        results = results_ap
    return results

def create_similarly_named_authors_link_box(author_name, ln=CFG_SITE_LANG):
    """Return a box similar to the ``Not satisfied...'' one by proposing
       author searches for similar names.  Namely, take AUTHOR_NAME and
       the first initial of the first name (after the comma) and look
       into the author index whether authors with e.g. middle names
       exist.

       Useful mainly for CERN Library that sometimes contains name
       forms like Ellis-N, Ellis-Nick, Ellis-Nicolas all denoting the
       same person.  The box isn't proposed if no similarly named
       authors are found to exist.
    """
    # return nothing if not configured:
    if CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX == 0:
        return ""
    # return empty box if there is no initial:
    if re.match(r'[^ ,]+, [^ ]', author_name) is None:
        return ""
    # firstly find name comma initial:
    author_name_to_search = re.sub(r'^([^ ,]+, +[^ ,]).*$', '\\1', author_name)
    # secondly search for similar name forms:
    similar_author_names = {}
    for name in author_name_to_search, strip_accents(author_name_to_search):
        for tag in get_field_tags("author"):
            # deduce into which bibxxx table we will search:
            digit1, digit2 = int(tag[0]), int(tag[1])
            bx = "bib%d%dx" % (digit1, digit2)
            bibx = "bibrec_bib%d%dx" % (digit1, digit2)
            if len(tag) != 6 or tag[-1:]=='%':
                # only the beginning of field 't' is defined, so add wildcard character:
                res = run_sql("""SELECT bx.value FROM %s AS bx
                                  WHERE bx.value LIKE %%s AND bx.tag LIKE %%s""" % bx,
                              (name + "%", tag + "%"))
            else:
                res = run_sql("""SELECT bx.value FROM %s AS bx
                                  WHERE bx.value LIKE %%s AND bx.tag=%%s""" % bx,
                              (name + "%", tag))
            for row in res:
                similar_author_names[row[0]] = 1
    # remove the original name and sort the list:
    try:
        del similar_author_names[author_name]
    except KeyError:
        pass
    # thirdly print the box:
    out = ""
    if similar_author_names:
        out_authors = similar_author_names.keys()
        out_authors.sort()
        tmp_authors = []
        for out_author in out_authors:
            nbhits = get_nbhits_in_bibxxx(out_author, "author")
            if nbhits:
                tmp_authors.append((out_author, nbhits))
        out += websearch_templates.tmpl_similar_author_names(
                 authors=tmp_authors, ln=ln)
    return out

def create_nearest_terms_box(urlargd, p, f, t='w', n=5, ln=CFG_SITE_LANG, intro_text_p=True):
    """Return text box containing list of 'n' nearest terms above/below 'p'
       for the field 'f' for matching type 't' (words/phrases) in
       language 'ln'.
       Propose new searches according to `urlargd' with the new words.
       If `intro_text_p' is true, then display the introductory message,
       otherwise print only the nearest terms in the box content.
    """
    # load the right message language
    _ = gettext_set_language(ln)
    out = ""
    nearest_terms = []
    if not p: # sanity check
        p = "."
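    # Illustrative call (added remark, not in the original source): the
    # box is typically shown when a query gives no hits, e.g.
    #   print_warning(req, create_nearest_terms_box(req.argd, 'elis',
    #                                               'author', 'w', ln=ln))
    # proposes the nearest indexed author terms around the misspelled
    # 'elis', linked to ready-made searches.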
index_id = get_index_id_from_field(f) # look for nearest terms: if t == 'w': nearest_terms = get_nearest_terms_in_bibwords(p, f, n, n) if not nearest_terms: return _("No word index is available for %s.") % \ ('<em>' + cgi.escape(get_field_i18nname(get_field_name(f) or f, ln, False)) + '</em>') else: nearest_terms = [] if index_id: nearest_terms = get_nearest_terms_in_idxphrase(p, index_id, n, n) if f == 'datecreated' or f == 'datemodified': nearest_terms = get_nearest_terms_in_bibrec(p, f, n, n) if not nearest_terms: nearest_terms = get_nearest_terms_in_bibxxx(p, f, n, n) if not nearest_terms: return _("No phrase index is available for %s.") % \ ('<em>' + cgi.escape(get_field_i18nname(get_field_name(f) or f, ln, False)) + '</em>') terminfo = [] for term in nearest_terms: if t == 'w': hits = get_nbhits_in_bibwords(term, f) else: if index_id: hits = get_nbhits_in_idxphrases(term, f) elif f == 'datecreated' or f == 'datemodified': hits = get_nbhits_in_bibrec(term, f) else: hits = get_nbhits_in_bibxxx(term, f) argd = {} argd.update(urlargd) # check which fields contained the requested parameter, and replace it. for (px, fx) in ('p', 'f'), ('p1', 'f1'), ('p2', 'f2'), ('p3', 'f3'): if px in argd: argd_px = argd[px] if t == 'w': # p was stripped of accents, to do the same: argd_px = strip_accents(argd_px) if f == argd[fx] or f == "anyfield" or f == "": if string.find(argd_px, p) > -1: argd[px] = string.replace(argd_px, p, term) break else: if string.find(argd_px, f+':'+p) > -1: argd[px] = string.replace(argd_px, f+':'+p, f+':'+term) break elif string.find(argd_px, f+':"'+p+'"') > -1: argd[px] = string.replace(argd_px, f+':"'+p+'"', f+':"'+term+'"') break terminfo.append((term, hits, argd)) intro = "" if intro_text_p: # add full leading introductory text if f: intro = _("Search term %(x_term)s inside index %(x_index)s did not match any record. Nearest terms in any collection are:") % \ {'x_term': "<em>" + cgi.escape(p.startswith("%") and p.endswith("%") and p[1:-1] or p) + "</em>", 'x_index': "<em>" + cgi.escape(get_field_i18nname(get_field_name(f) or f, ln, False)) + "</em>"} else: intro = _("Search term %s did not match any record. 
Nearest terms in any collection are:") % \ ("<em>" + cgi.escape(p.startswith("%") and p.endswith("%") and p[1:-1] or p) + "</em>") return websearch_templates.tmpl_nearest_term_box(p=p, ln=ln, f=f, terminfo=terminfo, intro=intro) def get_nearest_terms_in_bibwords(p, f, n_below, n_above): """Return list of +n -n nearest terms to word `p' in index for field `f'.""" nearest_words = [] # will hold the (sorted) list of nearest words to return # deduce into which bibwordsX table we will search: bibwordsX = "idxWORD%02dF" % get_index_id_from_field("anyfield") if f: index_id = get_index_id_from_field(f) if index_id: bibwordsX = "idxWORD%02dF" % index_id else: return nearest_words # firstly try to get `n' closest words above `p': res = run_sql("SELECT term FROM %s WHERE term<%%s ORDER BY term DESC LIMIT %%s" % bibwordsX, (p, n_above)) for row in res: nearest_words.append(row[0]) nearest_words.reverse() # secondly insert given word `p': nearest_words.append(p) # finally try to get `n' closest words below `p': res = run_sql("SELECT term FROM %s WHERE term>%%s ORDER BY term ASC LIMIT %%s" % bibwordsX, (p, n_below)) for row in res: nearest_words.append(row[0]) return nearest_words def get_nearest_terms_in_idxphrase(p, index_id, n_below, n_above): """Browse (-n_above, +n_below) closest bibliographic phrases for the given pattern p in the given field idxPHRASE table, regardless of collection. Return list of [phrase1, phrase2, ... , phrase_n].""" if CFG_INSPIRE_SITE and index_id == 3: # FIXME: workaround due to new fuzzy index return [p,] idxphraseX = "idxPHRASE%02dF" % index_id res_above = run_sql("SELECT term FROM %s WHERE term<%%s ORDER BY term DESC LIMIT %%s" % idxphraseX, (p, n_above)) res_above = map(lambda x: x[0], res_above) res_above.reverse() res_below = run_sql("SELECT term FROM %s WHERE term>=%%s ORDER BY term ASC LIMIT %%s" % idxphraseX, (p, n_below)) res_below = map(lambda x: x[0], res_below) return res_above + res_below def get_nearest_terms_in_idxphrase_with_collection(p, index_id, n_below, n_above, collection): """Browse (-n_above, +n_below) closest bibliographic phrases for the given pattern p in the given field idxPHRASE table, considering the collection (HitSet). Return list of [(phrase1, hitset), (phrase2, hitset), ... , (phrase_n, hitset)].""" idxphraseX = "idxPHRASE%02dF" % index_id res_above = run_sql("SELECT term,hitlist FROM %s WHERE term<%%s ORDER BY term DESC LIMIT %%s" % idxphraseX, (p, n_above * 3)) res_above = [(term, HitSet(hitlist) & collection) for term, hitlist in res_above] res_above = [(term, len(hitlist)) for term, hitlist in res_above if hitlist] res_below = run_sql("SELECT term,hitlist FROM %s WHERE term>=%%s ORDER BY term ASC LIMIT %%s" % idxphraseX, (p, n_below * 3)) res_below = [(term, HitSet(hitlist) & collection) for term, hitlist in res_below] res_below = [(term, len(hitlist)) for term, hitlist in res_below if hitlist] res_above.reverse() return res_above[-n_above:] + res_below[:n_below] def get_nearest_terms_in_bibxxx(p, f, n_below, n_above): """Browse (-n_above, +n_below) closest bibliographic phrases for the given pattern p in the given field f, regardless of collection. Return list of [phrase1, phrase2, ... , phrase_n].""" ## determine browse field: if not f and string.find(p, ":") > 0: # does 'p' contain ':'? f, p = string.split(p, ":", 1) # FIXME: quick hack for the journal index if f == 'journal': return get_nearest_terms_in_bibwords(p, f, n_below, n_above) ## We are going to take max(n_below, n_above) as the number of ## values to ferch from bibXXx. 
This is needed to work around ## MySQL UTF-8 sorting troubles in 4.0.x. Proper solution is to ## use MySQL 4.1.x or our own idxPHRASE in the future. index_id = get_index_id_from_field(f) if index_id: return get_nearest_terms_in_idxphrase(p, index_id, n_below, n_above) n_fetch = 2*max(n_below, n_above) ## construct 'tl' which defines the tag list (MARC tags) to search in: tl = [] if str(f[0]).isdigit() and str(f[1]).isdigit(): tl.append(f) # 'f' seems to be okay as it starts by two digits else: # deduce desired MARC tags on the basis of chosen 'f' tl = get_field_tags(f) ## start browsing to fetch list of hits: browsed_phrases = {} # will hold {phrase1: 1, phrase2: 1, ..., phraseN: 1} dict of browsed phrases (to make them unique) # always add self to the results set: browsed_phrases[p.startswith("%") and p.endswith("%") and p[1:-1] or p] = 1 for t in tl: # deduce into which bibxxx table we will search: digit1, digit2 = int(t[0]), int(t[1]) bx = "bib%d%dx" % (digit1, digit2) bibx = "bibrec_bib%d%dx" % (digit1, digit2) # firstly try to get `n' closest phrases above `p': if len(t) != 6 or t[-1:]=='%': # only the beginning of field 't' is defined, so add wildcard character: res = run_sql("""SELECT bx.value FROM %s AS bx WHERE bx.value<%%s AND bx.tag LIKE %%s ORDER BY bx.value DESC LIMIT %%s""" % bx, (p, t + "%", n_fetch)) else: res = run_sql("""SELECT bx.value FROM %s AS bx WHERE bx.value<%%s AND bx.tag=%%s ORDER BY bx.value DESC LIMIT %%s""" % bx, (p, t, n_fetch)) for row in res: browsed_phrases[row[0]] = 1 # secondly try to get `n' closest phrases equal to or below `p': if len(t) != 6 or t[-1:]=='%': # only the beginning of field 't' is defined, so add wildcard character: res = run_sql("""SELECT bx.value FROM %s AS bx WHERE bx.value>=%%s AND bx.tag LIKE %%s ORDER BY bx.value ASC LIMIT %%s""" % bx, (p, t + "%", n_fetch)) else: res = run_sql("""SELECT bx.value FROM %s AS bx WHERE bx.value>=%%s AND bx.tag=%%s ORDER BY bx.value ASC LIMIT %%s""" % bx, (p, t, n_fetch)) for row in res: browsed_phrases[row[0]] = 1 # select first n words only: (this is needed as we were searching # in many different tables and so aren't sure we have more than n # words right; this of course won't be needed when we shall have # one ACC table only for given field): phrases_out = browsed_phrases.keys() phrases_out.sort(lambda x, y: cmp(string.lower(strip_accents(x)), string.lower(strip_accents(y)))) # find position of self: try: idx_p = phrases_out.index(p) except: idx_p = len(phrases_out)/2 # return n_above and n_below: return phrases_out[max(0, idx_p-n_above):idx_p+n_below] def get_nearest_terms_in_bibrec(p, f, n_below, n_above): """Return list of nearest terms and counts from bibrec table. p is usually a date, and f either datecreated or datemodified. Note: below/above count is very approximative, not really respected. """ col = 'creation_date' if f == 'datemodified': col = 'modification_date' res_above = run_sql("""SELECT DATE_FORMAT(%s,'%%%%Y-%%%%m-%%%%d %%%%H:%%%%i:%%%%s') FROM bibrec WHERE %s < %%s ORDER BY %s ASC LIMIT %%s""" % (col, col, col), (p, n_above)) res_below = run_sql("""SELECT DATE_FORMAT(%s,'%%%%Y-%%%%m-%%%%d %%%%H:%%%%i:%%%%s') FROM bibrec WHERE %s > %%s ORDER BY %s ASC LIMIT %%s""" % (col, col, col), (p, n_below)) out = set([]) for row in res_above: out.add(row[0]) for row in res_below: out.add(row[0]) return list(out) def get_nbhits_in_bibrec(term, f): """Return number of hits in bibrec table. 
       term is usually a date, and f is either 'datecreated' or
       'datemodified'."""
    col = 'creation_date'
    if f == 'datemodified':
        col = 'modification_date'
    res = run_sql("SELECT COUNT(*) FROM bibrec WHERE %s LIKE %%s" % (col,),
                  (term + '%',))
    return res[0][0]

def get_nbhits_in_bibwords(word, f):
    """Return number of hits for word 'word' inside words index for field 'f'."""
    out = 0
    # deduce into which bibwordsX table we will search:
    bibwordsX = "idxWORD%02dF" % get_index_id_from_field("anyfield")
    if f:
        index_id = get_index_id_from_field(f)
        if index_id:
            bibwordsX = "idxWORD%02dF" % index_id
        else:
            return 0
    if word:
        res = run_sql("SELECT hitlist FROM %s WHERE term=%%s" % bibwordsX,
                      (word,))
        for hitlist in res:
            out += len(HitSet(hitlist[0]))
    return out

def get_nbhits_in_idxphrases(word, f):
    """Return number of hits for word 'word' inside phrase index for field 'f'."""
    out = 0
    # deduce into which idxPHRASEX table we will search:
    idxphraseX = "idxPHRASE%02dF" % get_index_id_from_field("anyfield")
    if f:
        index_id = get_index_id_from_field(f)
        if index_id:
            idxphraseX = "idxPHRASE%02dF" % index_id
        else:
            return 0
    if word:
        res = run_sql("SELECT hitlist FROM %s WHERE term=%%s" % idxphraseX,
                      (word,))
        for hitlist in res:
            out += len(HitSet(hitlist[0]))
    return out

def get_nbhits_in_bibxxx(p, f):
    """Return number of hits for pattern 'p' inside bibxxx tables for field 'f'."""
    ## determine browse field:
    if not f and string.find(p, ":") > 0: # does 'p' contain ':'?
        f, p = string.split(p, ":", 1)

    # FIXME: quick hack for the journal index
    if f == 'journal':
        return get_nbhits_in_bibwords(p, f)

    ## construct 'tl' which defines the tag list (MARC tags) to search in:
    tl = []
    if str(f[0]).isdigit() and str(f[1]).isdigit():
        tl.append(f) # 'f' seems to be okay as it starts by two digits
    else:
        # deduce desired MARC tags on the basis of chosen 'f'
        tl = get_field_tags(f)
    # start searching:
    recIDs = {} # will hold dict of {recID1: 1, recID2: 1, ...} (unique recIDs, therefore)
    for t in tl:
        # deduce into which bibxxx table we will search:
        digit1, digit2 = int(t[0]), int(t[1])
        bx = "bib%d%dx" % (digit1, digit2)
        bibx = "bibrec_bib%d%dx" % (digit1, digit2)
        if len(t) != 6 or t[-1:]=='%':
            # only the beginning of field 't' is defined, so add wildcard character:
            res = run_sql("""SELECT bibx.id_bibrec FROM %s AS bibx, %s AS bx
                              WHERE bx.value=%%s AND bx.tag LIKE %%s
                                AND bibx.id_bibxxx=bx.id""" % (bibx, bx),
                          (p, t + "%"))
        else:
            res = run_sql("""SELECT bibx.id_bibrec FROM %s AS bibx, %s AS bx
                              WHERE bx.value=%%s AND bx.tag=%%s
                                AND bibx.id_bibxxx=bx.id""" % (bibx, bx),
                          (p, t))
        for row in res:
            recIDs[row[0]] = 1
    return len(recIDs)

def get_mysql_recid_from_aleph_sysno(sysno):
    """Returns DB's recID for ALEPH sysno passed in the argument (e.g. "002379334CER").
       Returns None in case of failure."""
    out = None
    res = run_sql("""SELECT bb.id_bibrec FROM bibrec_bib97x AS bb, bib97x AS b
                      WHERE b.value=%s AND b.tag='970__a' AND bb.id_bibxxx=b.id""",
                  (sysno,))
    if res:
        out = res[0][0]
    return out

def guess_primary_collection_of_a_record(recID):
    """Return primary collection name a record recid belongs to, by
       testing the 980 identifier.
       May lead to bad guesses when a collection is defined dynamically
       via dbquery.
In that case, return 'CFG_SITE_NAME'.""" out = CFG_SITE_NAME dbcollids = get_fieldvalues(recID, "980__a") if dbcollids: dbquery = "collection:" + dbcollids[0] res = run_sql("SELECT name FROM collection WHERE dbquery=%s", (dbquery,)) if res: out = res[0][0] if CFG_CERN_SITE: # dirty hack for ATLAS collections at CERN: if out in ('ATLAS Communications', 'ATLAS Internal Notes'): for alternative_collection in ('ATLAS Communications Physics', 'ATLAS Communications General', 'ATLAS Internal Notes Physics', 'ATLAS Internal Notes General',): if recID in get_collection_reclist(alternative_collection): out = alternative_collection break return out _re_collection_url = re.compile('/collection/(.+)') def guess_collection_of_a_record(recID, referer=None): """Return collection name a record recid belongs to, by first testing the referer URL if provided and otherwise returning the primary collection.""" if referer: dummy, hostname, path, dummy, query, dummy = urlparse.urlparse(referer) g = _re_collection_url.match(path) if g: name = urllib.unquote_plus(g.group(1)) if recID in get_collection_reclist(name): return name elif path.startswith('/search'): query = cgi.parse_qs(query) for name in query.get('cc', []) + query.get('c', []): if recID in get_collection_reclist(name): return name return guess_primary_collection_of_a_record(recID) def get_all_collections_of_a_record(recID): """Return all the collection names a record belongs to. Note this function is O(n_collections).""" ret = [] for name in collection_reclist_cache.cache.keys(): if recID in get_collection_reclist(name): ret.append(name) return ret def get_tag_name(tag_value, prolog="", epilog=""): """Return tag name from the known tag value, by looking up the 'tag' table. Return empty string in case of failure. Example: input='100__%', output=first author'.""" out = "" res = run_sql_cached("SELECT name FROM tag WHERE value=%s", (tag_value,), affected_tables=['tag',]) if res: out = prolog + res[0][0] + epilog return out def get_fieldcodes(): """Returns a list of field codes that may have been passed as 'search options' in URL. Example: output=['subject','division'].""" out = [] res = run_sql_cached("SELECT DISTINCT(code) FROM field", affected_tables=['field',]) for row in res: out.append(row[0]) return out def get_field_name(code): """Return the corresponding field_name given the field code. e.g. reportnumber -> report number.""" res = run_sql_cached("SELECT name FROM field WHERE code=%s", (code, ), affected_tables=['field',]) if res: return res[0][0] else: return "" def get_field_tags(field): """Returns a list of MARC tags for the field code 'field'. Returns empty list in case of error. Example: field='author', output=['100__%','700__%'].""" out = [] query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag ORDER BY ft.score DESC""" res = run_sql(query, (field, )) for val in res: out.append(val[0]) return out def get_fieldvalues(recIDs, tag, repetitive_values=True): """ Return list of field values for field TAG for the given record ID or list of record IDs. (RECIDS can be both an integer or a list of integers.) If REPETITIVE_VALUES is set to True, then return all values even if they are doubled. If set to False, then return unique values only. 
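    Hypothetical examples (illustrative only; actual values depend on
    the records stored):

       get_fieldvalues(10, "245__a")       -> ['Some title']
       get_fieldvalues([10, 11], "700__a") -> additional author names of both records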
""" out = [] if isinstance(recIDs, (int, long)): recIDs =[recIDs,] if not isinstance(recIDs, (list, tuple)): return [] if len(recIDs) == 0: return [] if tag == "001___": # we have asked for tag 001 (=recID) that is not stored in bibXXx tables out = [str(recID) for recID in recIDs] else: # we are going to look inside bibXXx tables digits = tag[0:2] try: intdigits = int(digits) if intdigits < 0 or intdigits > 99: raise ValueError except ValueError: # invalid tag value asked for return [] bx = "bib%sx" % digits bibx = "bibrec_bib%sx" % digits queryparam = [] for recID in recIDs: queryparam.append(recID) if not repetitive_values: queryselect = "DISTINCT(bx.value)" else: queryselect = "bx.value" query = "SELECT %s FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec IN (%s) " \ " AND bx.id=bibx.id_bibxxx AND bx.tag LIKE %%s " \ " ORDER BY bibx.field_number, bx.tag ASC" % \ (queryselect, bx, bibx, ("%s,"*len(queryparam))[:-1]) res = run_sql(query, tuple(queryparam) + (tag,)) for row in res: out.append(row[0]) return out def get_fieldvalues_alephseq_like(recID, tags_in, can_see_hidden=False): """Return buffer of ALEPH sequential-like textual format with fields found in the list TAGS_IN for record RECID. If can_see_hidden is True, just print everything. Otherwise hide fields from CFG_BIBFORMAT_HIDDEN_TAGS. """ out = "" if type(tags_in) is not list: tags_in = [tags_in,] if len(tags_in) == 1 and len(tags_in[0]) == 6: ## case A: one concrete subfield asked, so print its value if found ## (use with care: can mislead if field has multiple occurrences) out += string.join(get_fieldvalues(recID, tags_in[0]),"\n") else: ## case B: print our "text MARC" format; works safely all the time # find out which tags to output: dict_of_tags_out = {} if not tags_in: for i in range(0, 10): for j in range(0, 10): dict_of_tags_out["%d%d%%" % (i, j)] = 1 else: for tag in tags_in: if len(tag) == 0: for i in range(0, 10): for j in range(0, 10): dict_of_tags_out["%d%d%%" % (i, j)] = 1 elif len(tag) == 1: for j in range(0, 10): dict_of_tags_out["%s%d%%" % (tag, j)] = 1 elif len(tag) < 5: dict_of_tags_out["%s%%" % tag] = 1 elif tag >= 6: dict_of_tags_out[tag[0:5]] = 1 tags_out = dict_of_tags_out.keys() tags_out.sort() # search all bibXXx tables as needed: for tag in tags_out: digits = tag[0:2] try: intdigits = int(digits) if intdigits < 0 or intdigits > 99: raise ValueError except ValueError: # invalid tag value asked for continue if tag.startswith("001") or tag.startswith("00%"): if out: out += "\n" out += "%09d %s %d" % (recID, "001__", recID) bx = "bib%sx" % digits bibx = "bibrec_bib%sx" % digits query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\ "WHERE bb.id_bibrec=%%s AND b.id=bb.id_bibxxx AND b.tag LIKE %%s"\ "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx) res = run_sql(query, (recID, str(tag)+'%')) # go through fields: field_number_old = -999 field_old = "" for row in res: field, value, field_number = row[0], row[1], row[2] ind1, ind2 = field[3], field[4] printme = True #check the stuff in hiddenfields if not can_see_hidden: for htag in CFG_BIBFORMAT_HIDDEN_TAGS: ltag = len(htag) samelenfield = field[0:ltag] if samelenfield == htag: printme = False if ind1 == "_": ind1 = "" if ind2 == "_": ind2 = "" # print field tag if printme: if field_number != field_number_old or field[:-1] != field_old[:-1]: if out: out += "\n" out += "%09d %s " % (recID, field[:5]) field_number_old = field_number field_old = field # print subfield value if field[0:2] == "00" and field[-1:] == "_": out += value else: out += 
"$$%s%s" % (field[-1:], value) return out def record_exists(recID): """Return 1 if record RECID exists. Return 0 if it doesn't exist. Return -1 if it exists but is marked as deleted. """ out = 0 res = run_sql("SELECT id FROM bibrec WHERE id=%s", (recID,), 1) if res: recID = int(recID) # record exists; now check whether it isn't marked as deleted: dbcollids = get_fieldvalues(recID, "980__%") if ("DELETED" in dbcollids) or (CFG_CERN_SITE and "DUMMY" in dbcollids): out = -1 # exists, but marked as deleted else: out = 1 # exists fine return out def record_empty(recID): """ Is this record empty, e.g. has only 001, waiting for integration? @param recID: the record identifier. @type recID: int @return: 1 if the record is empty, 0 otherwise. @rtype: int """ record = get_record(recID) if record is None or len(record) < 2: return 1 else: return 0 def record_public_p(recID): """Return 1 if the record is public, i.e. if it can be found in the Home collection. Return 0 otherwise. """ return recID in get_collection_reclist(CFG_SITE_NAME) def get_creation_date(recID, fmt="%Y-%m-%d"): "Returns the creation date of the record 'recID'." out = "" res = run_sql("SELECT DATE_FORMAT(creation_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1) if res: out = res[0][0] return out def get_modification_date(recID, fmt="%Y-%m-%d"): "Returns the date of last modification for the record 'recID'." out = "" res = run_sql("SELECT DATE_FORMAT(modification_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1) if res: out = res[0][0] return out def print_warning(req, msg, type='', prologue='<br />', epilogue='<br />'): "Prints warning message and flushes output." if req and msg: req.write(websearch_templates.tmpl_print_warning( msg = msg, type = type, prologue = prologue, epilogue = epilogue, )) return def print_search_info(p, f, sf, so, sp, rm, of, ot, collection=CFG_SITE_NAME, nb_found=-1, jrec=1, rg=10, aas=0, ln=CFG_SITE_LANG, p1="", p2="", p3="", f1="", f2="", f3="", m1="", m2="", m3="", op1="", op2="", sc=1, pl_in_url="", d1y=0, d1m=0, d1d=0, d2y=0, d2m=0, d2d=0, dt="", cpu_time=-1, middle_only=0): """Prints stripe with the information on 'collection' and 'nb_found' results and CPU time. Also, prints navigation links (beg/next/prev/end) inside the results set. If middle_only is set to 1, it will only print the middle box information (beg/netx/prev/end/etc) links. 
This is suitable for displaying navigation links at the bottom of the search results page.""" out = "" # sanity check: if jrec < 1: jrec = 1 if jrec > nb_found: jrec = max(nb_found-rg+1, 1) return websearch_templates.tmpl_print_search_info( ln = ln, collection = collection, aas = aas, collection_name = get_coll_i18nname(collection, ln, False), collection_id = get_colID(collection), middle_only = middle_only, rg = rg, nb_found = nb_found, sf = sf, so = so, rm = rm, of = of, ot = ot, p = p, f = f, p1 = p1, p2 = p2, p3 = p3, f1 = f1, f2 = f2, f3 = f3, m1 = m1, m2 = m2, m3 = m3, op1 = op1, op2 = op2, pl_in_url = pl_in_url, d1y = d1y, d1m = d1m, d1d = d1d, d2y = d2y, d2m = d2m, d2d = d2d, dt = dt, jrec = jrec, sc = sc, sp = sp, all_fieldcodes = get_fieldcodes(), cpu_time = cpu_time, ) def print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, collection=CFG_SITE_NAME, nb_found=-1, jrec=1, rg=10, aas=0, ln=CFG_SITE_LANG, p1="", p2="", p3="", f1="", f2="", f3="", m1="", m2="", m3="", op1="", op2="", sc=1, pl_in_url="", d1y=0, d1m=0, d1d=0, d2y=0, d2m=0, d2d=0, dt="", cpu_time=-1, middle_only=0): """Prints stripe with the information on 'collection' and 'nb_found' results and CPU time. Also, prints navigation links (beg/next/prev/end) inside the results set. If middle_only is set to 1, it will only print the middle box information (beg/netx/prev/end/etc) links. This is suitable for displaying navigation links at the bottom of the search results page.""" out = "" # sanity check: if jrec < 1: jrec = 1 if jrec > nb_found: jrec = max(nb_found-rg+1, 1) return websearch_templates.tmpl_print_hosted_search_info( ln = ln, collection = collection, aas = aas, collection_name = get_coll_i18nname(collection, ln, False), collection_id = get_colID(collection), middle_only = middle_only, rg = rg, nb_found = nb_found, sf = sf, so = so, rm = rm, of = of, ot = ot, p = p, f = f, p1 = p1, p2 = p2, p3 = p3, f1 = f1, f2 = f2, f3 = f3, m1 = m1, m2 = m2, m3 = m3, op1 = op1, op2 = op2, pl_in_url = pl_in_url, d1y = d1y, d1m = d1m, d1d = d1d, d2y = d2y, d2m = d2m, d2d = d2d, dt = dt, jrec = jrec, sc = sc, sp = sp, all_fieldcodes = get_fieldcodes(), cpu_time = cpu_time, ) def print_results_overview(req, colls, results_final_nb_total, results_final_nb, cpu_time, ln=CFG_SITE_LANG, ec=[], hosted_colls_potential_results_p=False): """Prints results overview box with links to particular collections below.""" out = "" new_colls = [] for coll in colls: new_colls.append({ 'id': get_colID(coll), 'code': coll, 'name': get_coll_i18nname(coll, ln, False), }) return websearch_templates.tmpl_print_results_overview( ln = ln, results_final_nb_total = results_final_nb_total, results_final_nb = results_final_nb, cpu_time = cpu_time, colls = new_colls, ec = ec, hosted_colls_potential_results_p = hosted_colls_potential_results_p, ) def print_hosted_results(url_and_engine, ln=CFG_SITE_LANG, of=None, req=None, no_records_found=False, search_timed_out=False, limit=CFG_EXTERNAL_COLLECTION_MAXRESULTS): """Prints the full results of a hosted collection""" if of.startswith("h"): if no_records_found: return "<br />No results found." if search_timed_out: return "<br />The search engine did not respond in time." return websearch_templates.tmpl_print_hosted_results( url_and_engine=url_and_engine, ln=ln, of=of, req=req, limit=limit ) def sort_records(req, recIDs, sort_field='', sort_order='d', sort_pattern='', verbose=0, of='hb', ln=CFG_SITE_LANG): """Sort records in 'recIDs' list according sort field 'sort_field' in order 'sort_order'. 
If more than one instance of 'sort_field' is found for a given record, try to choose that that is given by 'sort pattern', for example "sort by report number that starts by CERN-PS". Note that 'sort_field' can be field code like 'author' or MARC tag like '100__a' directly.""" _ = gettext_set_language(ln) ## check arguments: if not sort_field: return recIDs if len(recIDs) > CFG_WEBSEARCH_NB_RECORDS_TO_SORT: if of.startswith('h'): print_warning(req, _("Sorry, sorting is allowed on sets of up to %d records only. Using default sort order.") % CFG_WEBSEARCH_NB_RECORDS_TO_SORT, "Warning") return recIDs sort_fields = string.split(sort_field, ",") recIDs_dict = {} recIDs_out = [] ## first deduce sorting MARC tag out of the 'sort_field' argument: tags = [] for sort_field in sort_fields: if sort_field and str(sort_field[0:2]).isdigit(): # sort_field starts by two digits, so this is probably a MARC tag already tags.append(sort_field) else: # let us check the 'field' table query = """SELECT DISTINCT(t.value) FROM tag AS t, field_tag AS ft, field AS f WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag ORDER BY ft.score DESC""" res = run_sql(query, (sort_field, )) if res: for row in res: tags.append(row[0]) else: if of.startswith('h'): print_warning(req, _("Sorry, %s does not seem to be a valid sort option. Choosing title sort instead.") % cgi.escape(sort_field), "Error") tags.append("245__a") if verbose >= 3: print_warning(req, "Sorting by tags %s." % cgi.escape(repr(tags))) if sort_pattern: print_warning(req, "Sorting preferentially by %s." % cgi.escape(sort_pattern)) ## check if we have sorting tag defined: if tags: # fetch the necessary field values: for recID in recIDs: val = "" # will hold value for recID according to which sort vals = [] # will hold all values found in sorting tag for recID for tag in tags: vals.extend(get_fieldvalues(recID, tag)) if sort_pattern: # try to pick that tag value that corresponds to sort pattern bingo = 0 for v in vals: if v.lower().startswith(sort_pattern.lower()): # bingo! bingo = 1 val = v break if not bingo: # sort_pattern not present, so add other vals after spaces val = sort_pattern + " " + string.join(vals) else: # no sort pattern defined, so join them all together val = string.join(vals) val = strip_accents(val.lower()) # sort values regardless of accents and case if recIDs_dict.has_key(val): recIDs_dict[val].append(recID) else: recIDs_dict[val] = [recID] # sort them: recIDs_dict_keys = recIDs_dict.keys() recIDs_dict_keys.sort() # now that keys are sorted, create output array: for k in recIDs_dict_keys: for s in recIDs_dict[k]: recIDs_out.append(s) # ascending or descending? if sort_order == 'a': recIDs_out.reverse() # okay, we are done return recIDs_out else: # good, no sort needed return recIDs def print_records(req, recIDs, jrec=1, rg=10, format='hb', ot='', ln=CFG_SITE_LANG, relevances=[], relevances_prologue="(", relevances_epilogue="%%)", decompress=zlib.decompress, search_pattern='', print_records_prologue_p=True, print_records_epilogue_p=True, verbose=0, tab=''): """ Prints list of records 'recIDs' formatted according to 'format' in groups of 'rg' starting from 'jrec'. Assumes that the input list 'recIDs' is sorted in reverse order, so it counts records from tail to head. A value of 'rg=-9999' means to print all records: to be used with care. Print also list of RELEVANCES for each record (if defined), in between RELEVANCE_PROLOGUE and RELEVANCE_EPILOGUE. 
Print prologue and/or epilogue specific to 'format' if 'print_records_prologue_p' and/or print_records_epilogue_p' are True. """ # load the right message language _ = gettext_set_language(ln) # sanity checking: if req is None: return # get user_info (for formatting based on user) if isinstance(req, cStringIO.OutputType): user_info = {} else: user_info = collect_user_info(req) if len(recIDs): nb_found = len(recIDs) if rg == -9999: # print all records rg = nb_found else: rg = abs(rg) if jrec < 1: # sanity checks jrec = 1 if jrec > nb_found: jrec = max(nb_found-rg+1, 1) # will print records from irec_max to irec_min excluded: irec_max = nb_found - jrec irec_min = nb_found - jrec - rg if irec_min < 0: irec_min = -1 if irec_max >= nb_found: irec_max = nb_found - 1 #req.write("%s:%d-%d" % (recIDs, irec_min, irec_max)) if format.startswith('x'): # print header if needed if print_records_prologue_p: print_records_prologue(req, format) # print records recIDs_to_print = [recIDs[x] for x in range(irec_max, irec_min, -1)] format_records(recIDs_to_print, format, ln=ln, search_pattern=search_pattern, record_separator="\n", user_info=user_info, req=req) # print footer if needed if print_records_epilogue_p: print_records_epilogue(req, format) elif format.startswith('t') or str(format[0:3]).isdigit(): # we are doing plain text output: for irec in range(irec_max, irec_min, -1): x = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern, user_info=user_info, verbose=verbose) req.write(x) if x: req.write('\n') elif format == 'excel': recIDs_to_print = [recIDs[x] for x in range(irec_max, irec_min, -1)] create_excel(recIDs=recIDs_to_print, req=req, ln=ln, ot=ot) else: # we are doing HTML output: if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"): # portfolio and on-the-fly formats: for irec in range(irec_max, irec_min, -1): req.write(print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern, user_info=user_info, verbose=verbose)) elif format.startswith("hb"): # HTML brief format: display_add_to_basket = True if user_info: if user_info['email'] == 'guest': if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS > 4: display_add_to_basket = False else: if not user_info['precached_usebaskets']: display_add_to_basket = False req.write(websearch_templates.tmpl_record_format_htmlbrief_header( ln = ln)) for irec in range(irec_max, irec_min, -1): row_number = jrec+irec_max-irec recid = recIDs[irec] if relevances and relevances[irec]: relevance = relevances[irec] else: relevance = '' record = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern, user_info=user_info, verbose=verbose) req.write(websearch_templates.tmpl_record_format_htmlbrief_body( ln = ln, recid = recid, row_number = row_number, relevance = relevance, record = record, relevances_prologue = relevances_prologue, relevances_epilogue = relevances_epilogue, display_add_to_basket = display_add_to_basket )) req.write(websearch_templates.tmpl_record_format_htmlbrief_footer( ln = ln, display_add_to_basket = display_add_to_basket)) elif format.startswith("hd"): # HTML detailed format: for irec in range(irec_max, irec_min, -1): unordered_tabs = get_detailed_page_tabs(get_colID(guess_primary_collection_of_a_record(recIDs[irec])), recIDs[irec], ln=ln) ordered_tabs_id = [(tab_id, values['order']) for (tab_id, values) in unordered_tabs.iteritems()] ordered_tabs_id.sort(lambda x,y: cmp(x[1],y[1])) link_ln = '' if ln != CFG_SITE_LANG: link_ln = '?ln=%s' % ln if CFG_WEBSEARCH_USE_ALEPH_SYSNOS: recid_to_display = 
get_fieldvalues(recIDs[irec], CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG)[0] else: recid_to_display = recIDs[irec] citedbynum = 0 #num of citations, to be shown in the cit tab references = -1 #num of references citedbynum = get_cited_by_count(recid_to_display) reftag = "" reftags = get_field_tags("reference") if reftags: reftag = reftags[0] tmprec = get_record(recid_to_display) if reftag and len(reftag) > 4: references = len(record_get_field_instances(tmprec, reftag[0:3], reftag[3], reftag[4])) tabs = [(unordered_tabs[tab_id]['label'], \ '%s/record/%s/%s%s' % (CFG_SITE_URL, recid_to_display, tab_id, link_ln), \ tab_id == tab, unordered_tabs[tab_id]['enabled']) \ for (tab_id, order) in ordered_tabs_id if unordered_tabs[tab_id]['visible'] == True] # load content if tab == 'usage': req.write(webstyle_templates.detailed_record_container_top(recIDs[irec], tabs, ln, citationnum=citedbynum, referencenum=references)) r = calculate_reading_similarity_list(recIDs[irec], "downloads") downloadsimilarity = None downloadhistory = None #if r: # downloadsimilarity = r if CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS: downloadhistory = create_download_history_graph_and_box(recIDs[irec], ln) r = calculate_reading_similarity_list(recIDs[irec], "pageviews") viewsimilarity = None if r: viewsimilarity = r content = websearch_templates.tmpl_detailed_record_statistics(recIDs[irec], ln, downloadsimilarity=downloadsimilarity, downloadhistory=downloadhistory, viewsimilarity=viewsimilarity) req.write(content) req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec], tabs, ln)) elif tab == 'citations': recid = recIDs[irec] req.write(webstyle_templates.detailed_record_container_top(recid, tabs, ln, citationnum=citedbynum, referencenum=references)) req.write(websearch_templates.tmpl_detailed_record_citations_prologue(recid, ln)) # Citing citinglist = calculate_cited_by_list(recid) req.write(websearch_templates.tmpl_detailed_record_citations_citing_list(recid, ln, citinglist=citinglist)) # Self-cited selfcited = get_self_cited_by(recid) req.write(websearch_templates.tmpl_detailed_record_citations_self_cited(recid, ln, selfcited=selfcited, citinglist=citinglist)) # Co-cited s = calculate_co_cited_with_list(recid) cociting = None if s: cociting = s req.write(websearch_templates.tmpl_detailed_record_citations_co_citing(recid, ln, cociting=cociting)) # Citation history, if needed citationhistory = None if citinglist: citationhistory = create_citation_history_graph_and_box(recid, ln) #debug if verbose > 3: print_warning(req, "Citation graph debug: " + \ str(len(citationhistory))) req.write(websearch_templates.tmpl_detailed_record_citations_citation_history(recid, ln, citationhistory)) req.write(websearch_templates.tmpl_detailed_record_citations_epilogue(recid, ln)) req.write(webstyle_templates.detailed_record_container_bottom(recid, tabs, ln)) elif tab == 'references': req.write(webstyle_templates.detailed_record_container_top(recIDs[irec], tabs, ln, citationnum=citedbynum, referencenum=references)) req.write(format_record(recIDs[irec], 'HDREF', ln=ln, user_info=user_info, verbose=verbose)) req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec], tabs, ln)) elif tab == 'keywords': from bibclassify_webinterface import \ record_get_keywords, get_sorting_options, \ generate_keywords, get_keywords_body from invenio.webinterface_handler import wash_urlargd form = req.form argd = wash_urlargd(form, { 'generate': (str, 'no'), 'sort': (str, 'occurrences'), 'type': (str, 'tagcloud'), 'numbering': (str, 'off'), }) recid = 
recIDs[irec] req.write(webstyle_templates.detailed_record_container_top(recid, tabs, ln, citationnum=citedbynum, referencenum=references)) if argd['generate'] == 'yes': # The user asked to generate the keywords. keywords = generate_keywords(req, recid) else: # Get the keywords contained in the MARC. keywords = record_get_keywords(recid, argd) if keywords: req.write(get_sorting_options(argd, keywords)) elif argd['sort'] == 'related' and not keywords: req.write('You may want to run BibIndex.') # Output the keywords or the generate button. get_keywords_body(keywords, req, recid, argd) req.write(webstyle_templates.detailed_record_container_bottom(recid, tabs, ln)) else: # Metadata tab req.write(webstyle_templates.detailed_record_container_top(recIDs[irec], tabs, ln, show_short_rec_p=False, citationnum=citedbynum, referencenum=references)) creationdate = None modificationdate = None if record_exists(recIDs[irec]) == 1: creationdate = get_creation_date(recIDs[irec]) modificationdate = get_modification_date(recIDs[irec]) content = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern, user_info=user_info, verbose=verbose) content = websearch_templates.tmpl_detailed_record_metadata( recID = recIDs[irec], ln = ln, format = format, creationdate = creationdate, modificationdate = modificationdate, content = content) req.write(content) req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec], tabs, ln, creationdate=creationdate, modificationdate=modificationdate, show_short_rec_p=False)) if len(tabs) > 0: # Add the mini box at bottom of the page if CFG_WEBCOMMENT_ALLOW_REVIEWS: from invenio.webcomment import get_mini_reviews reviews = get_mini_reviews(recid = recIDs[irec], ln=ln) else: reviews = '' actions = format_record(recIDs[irec], 'HDACT', ln=ln, user_info=user_info, verbose=verbose) files = format_record(recIDs[irec], 'HDFILE', ln=ln, user_info=user_info, verbose=verbose) req.write(webstyle_templates.detailed_record_mini_panel(recIDs[irec], ln, format, files=files, reviews=reviews, actions=actions)) else: # Other formats for irec in range(irec_max, irec_min, -1): req.write(print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern, user_info=user_info, verbose=verbose)) else: print_warning(req, _("Use different search terms.")) def print_records_prologue(req, format): """ Print the appropriate prologue for list of records in the given format. """ prologue = "" # no prologue needed for HTML or Text formats if format.startswith('xm'): prologue = websearch_templates.tmpl_xml_marc_prologue() elif format.startswith('xn'): prologue = websearch_templates.tmpl_xml_nlm_prologue() elif format.startswith('xw'): prologue = websearch_templates.tmpl_xml_refworks_prologue() elif format.startswith('xr'): prologue = websearch_templates.tmpl_xml_rss_prologue() elif format.startswith('xe'): prologue = websearch_templates.tmpl_xml_endnote_prologue() elif format.startswith('xo'): prologue = websearch_templates.tmpl_xml_mods_prologue() elif format.startswith('x'): prologue = websearch_templates.tmpl_xml_default_prologue() req.write(prologue) def print_records_epilogue(req, format): """ Print the appropriate epilogue for list of records in the given format. 
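    A typical caller brackets the record output with both helpers
    (illustrative sketch; 'xm' selects the MARCXML wrapper):

       print_records_prologue(req, 'xm')
       ...format and write the records themselves...
       print_records_epilogue(req, 'xm')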
""" epilogue = "" # no epilogue needed for HTML or Text formats if format.startswith('xm'): epilogue = websearch_templates.tmpl_xml_marc_epilogue() elif format.startswith('xn'): epilogue = websearch_templates.tmpl_xml_nlm_epilogue() elif format.startswith('xw'): epilogue = websearch_templates.tmpl_xml_refworks_epilogue() elif format.startswith('xr'): epilogue = websearch_templates.tmpl_xml_rss_epilogue() elif format.startswith('xe'): epilogue = websearch_templates.tmpl_xml_endnote_epilogue() elif format.startswith('xo'): epilogue = websearch_templates.tmpl_xml_mods_epilogue() elif format.startswith('x'): epilogue = websearch_templates.tmpl_xml_default_epilogue() req.write(epilogue) def get_record(recid): """Directly the record object corresponding to the recid.""" from marshal import loads, dumps from zlib import compress, decompress if CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE: value = run_sql('SELECT value FROM bibfmt WHERE id_bibrec=%s AND FORMAT=\'recstruct\'', (recid, )) if value: try: return loads(decompress(value[0][0])) except: ### In case of corruption, let's rebuild it! pass return create_record(print_record(recid, 'xm'))[0] def print_record(recID, format='hb', ot='', ln=CFG_SITE_LANG, decompress=zlib.decompress, search_pattern=None, user_info=None, verbose=0): """Prints record 'recID' formatted accoding to 'format'.""" if format == 'recstruct': return get_record(recID) _ = gettext_set_language(ln) #check from user information if the user has the right to see hidden fields/tags in the #records as well can_see_hidden = (acc_authorize_action(user_info, 'runbibedit')[0] == 0) out = "" # sanity check: record_exist_p = record_exists(recID) if record_exist_p == 0: # doesn't exist return out # New Python BibFormat procedure for formatting # Old procedure follows further below # We must still check some special formats, but these # should disappear when BibFormat improves. if not (CFG_BIBFORMAT_USE_OLD_BIBFORMAT \ or format.lower().startswith('t') \ or format.lower().startswith('hm') \ or str(format[0:3]).isdigit() \ or ot): # Unspecified format is hd if format == '': format = 'hd' if record_exist_p == -1 and get_output_format_content_type(format) == 'text/html': # HTML output displays a default value for deleted records. # Other format have to deal with it. 
out += _("The record has been deleted.") else: out += call_bibformat(recID, format, ln, search_pattern=search_pattern, user_info=user_info, verbose=verbose) # at the end of HTML brief mode, print the "Detailed record" functionality: if format.lower().startswith('hb') and \ format.lower() != 'hb_p': out += websearch_templates.tmpl_print_record_brief_links( ln = ln, recID = recID, ) return out # Old PHP BibFormat procedure for formatting # print record opening tags, if needed: if format == "marcxml" or format == "oai_dc": out += " <record>\n" out += " <header>\n" for oai_id in get_fieldvalues(recID, CFG_OAI_ID_FIELD): out += " <identifier>%s</identifier>\n" % oai_id out += " <datestamp>%s</datestamp>\n" % get_modification_date(recID) out += " </header>\n" out += " <metadata>\n" if format.startswith("xm") or format == "marcxml": # look for detailed format existence: query = "SELECT value FROM bibfmt WHERE id_bibrec=%s AND format=%s" res = run_sql(query, (recID, format), 1) if res and record_exist_p == 1: # record 'recID' is formatted in 'format', so print it out += "%s" % decompress(res[0][0]) else: # record 'recID' is not formatted in 'format' -- they are not in "bibfmt" table; so fetch all the data from "bibXXx" tables: if format == "marcxml": out += """ <record xmlns="http://www.loc.gov/MARC21/slim">\n""" out += " <controlfield tag=\"001\">%d</controlfield>\n" % int(recID) elif format.startswith("xm"): out += """ <record>\n""" out += " <controlfield tag=\"001\">%d</controlfield>\n" % int(recID) if record_exist_p == -1: # deleted record, so display only OAI ID and 980: oai_ids = get_fieldvalues(recID, CFG_OAI_ID_FIELD) if oai_ids: out += "<datafield tag=\"%s\" ind1=\"%s\" ind2=\"%s\"><subfield code=\"%s\">%s</subfield></datafield>\n" % \ (CFG_OAI_ID_FIELD[0:3], CFG_OAI_ID_FIELD[3:4], CFG_OAI_ID_FIELD[4:5], CFG_OAI_ID_FIELD[5:6], oai_ids[0]) out += "<datafield tag=\"980\" ind1=\"\" ind2=\"\"><subfield code=\"c\">DELETED</subfield></datafield>\n" else: # controlfields query = "SELECT b.tag,b.value,bb.field_number FROM bib00x AS b, bibrec_bib00x AS bb "\ "WHERE bb.id_bibrec=%s AND b.id=bb.id_bibxxx AND b.tag LIKE '00%%' "\ "ORDER BY bb.field_number, b.tag ASC" res = run_sql(query, (recID, )) for row in res: field, value = row[0], row[1] value = encode_for_xml(value) out += """ <controlfield tag="%s" >%s</controlfield>\n""" % \ (encode_for_xml(field[0:3]), value) # datafields i = 1 # Do not process bib00x and bibrec_bib00x, as # they are controlfields. 
            # So start at bib01x and bibrec_bib01x (and set i = 0 at the end
            # of the first loop).
            for digit1 in range(0, 10):
                for digit2 in range(i, 10):
                    bx = "bib%d%dx" % (digit1, digit2)
                    bibx = "bibrec_bib%d%dx" % (digit1, digit2)
                    query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\
                            "WHERE bb.id_bibrec=%%s AND b.id=bb.id_bibxxx AND b.tag LIKE %%s "\
                            "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx)
                    res = run_sql(query, (recID, str(digit1)+str(digit2)+'%'))
                    field_number_old = -999
                    field_old = ""
                    for row in res:
                        field, value, field_number = row[0], row[1], row[2]
                        ind1, ind2 = field[3], field[4]
                        if ind1 == "_" or ind1 == "":
                            ind1 = " "
                        if ind2 == "_" or ind2 == "":
                            ind2 = " "
                        # print field tag, unless hidden:
                        printme = True
                        if not can_see_hidden:
                            for htag in CFG_BIBFORMAT_HIDDEN_TAGS:
                                ltag = len(htag)
                                samelenfield = field[0:ltag]
                                if samelenfield == htag:
                                    printme = False
                        if printme:
                            if field_number != field_number_old or field[:-1] != field_old[:-1]:
                                if field_number_old != -999:
                                    out += """        </datafield>\n"""
                                out += """        <datafield tag="%s" ind1="%s" ind2="%s">\n""" % \
                                       (encode_for_xml(field[0:3]), encode_for_xml(ind1), encode_for_xml(ind2))
                                field_number_old = field_number
                                field_old = field
                            # print subfield value:
                            value = encode_for_xml(value)
                            out += """            <subfield code="%s">%s</subfield>\n""" % \
                                   (encode_for_xml(field[-1:]), value)
                    # all fields/subfields printed in this run, so close the tag:
                    if field_number_old != -999:
                        out += """        </datafield>\n"""
                i = 0 # from digit1=1 onwards, digit2 should start at 0 (bib10x, bibrec_bib10x, ...)
        # we are at the end of printing the record:
        out += "    </record>\n"
    elif format == "xd" or format == "oai_dc":
        # XML Dublin Core format, possibly OAI -- select only some bibXXx fields:
        out += """    <dc xmlns="http://purl.org/dc/elements/1.1/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://purl.org/dc/elements/1.1/
                             http://www.openarchives.org/OAI/1.1/dc.xsd">\n"""
        if record_exist_p == -1:
            out += ""
        else:
            for f in get_fieldvalues(recID, "041__a"):
                out += "    <language>%s</language>\n" % f
            for f in get_fieldvalues(recID, "100__a"):
                out += "    <creator>%s</creator>\n" % encode_for_xml(f)
            for f in get_fieldvalues(recID, "700__a"):
                out += "    <creator>%s</creator>\n" % encode_for_xml(f)
            for f in get_fieldvalues(recID, "245__a"):
                out += "    <title>%s</title>\n" % encode_for_xml(f)
            for f in get_fieldvalues(recID, "65017a"):
                out += "    <subject>%s</subject>\n" % encode_for_xml(f)
            for f in get_fieldvalues(recID, "8564_u"):
                out += "    <identifier>%s</identifier>\n" % encode_for_xml(f)
            for f in get_fieldvalues(recID, "520__a"):
                out += "    <description>%s</description>\n" % encode_for_xml(f)
            out += "    <date>%s</date>\n" % get_creation_date(recID)
        out += "    </dc>\n"
    elif len(format) == 6 and str(format[0:3]).isdigit():
        # user has asked to print some fields only
        if format == "001":
            out += "<!--%s-begin-->%s<!--%s-end-->\n" % (format, recID, format)
        else:
            vals = get_fieldvalues(recID, format)
            for val in vals:
                out += "<!--%s-begin-->%s<!--%s-end-->\n" % (format, val, format)
    elif format.startswith('t'):
        ## user directly asked for some tags to be displayed only
        if record_exist_p == -1:
            out += get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"], can_see_hidden)
        else:
            out += get_fieldvalues_alephseq_like(recID, ot, can_see_hidden)
    elif format == "hm":
        if record_exist_p == -1:
            out += "\n<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"], can_see_hidden)) + "</pre>"
        else:
            out += "\n<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot, can_see_hidden)) + "</pre>"
    elif format.startswith("h") and ot:
        ## user directly asked for some tags to be displayed only
        if record_exist_p == -1:
            out += "\n<pre>" + get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"], can_see_hidden) + "</pre>"
        else:
            out += "\n<pre>" + get_fieldvalues_alephseq_like(recID, ot, can_see_hidden) + "</pre>"
    elif format == "hd":
        # HTML detailed format
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            # look for detailed format existence:
            query = "SELECT value FROM bibfmt WHERE id_bibrec=%s AND format=%s"
            res = run_sql(query, (recID, format), 1)
            if res:
                # record 'recID' is formatted in 'format', so print it:
                out += "%s" % decompress(res[0][0])
            else:
                # record 'recID' is not formatted in 'format', so try to call
                # BibFormat on the fly, or use the default format:
                out_record_in_format = call_bibformat(recID, format, ln, search_pattern=search_pattern,
                                                      user_info=user_info, verbose=verbose)
                if out_record_in_format:
                    out += out_record_in_format
                else:
                    out += websearch_templates.tmpl_print_record_detailed(
                             ln = ln,
                             recID = recID,
                           )
    elif format.startswith("hb_") or format.startswith("hd_"):
        # underscore means that HTML brief/detailed formats should be called
        # on the fly; suitable for testing formats
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
                                  user_info=user_info, verbose=verbose)
    elif format.startswith("hx"):
        # BibTeX format, called on the fly:
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
                                  user_info=user_info, verbose=verbose)
    elif format.startswith("hs"):
        # for citation/download similarity navigation links:
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            out += '<a href="%s">' % websearch_templates.build_search_url(recid=recID, ln=ln)
            # firstly, title:
            titles = get_fieldvalues(recID, "245__a")
            if titles:
                for title in titles:
                    out += "%s" % title
            else:
                # usual title not found, try conference title:
                titles = get_fieldvalues(recID, "111__a")
                if titles:
                    for title in titles:
                        out += "%s" % title
                else:
                    # just print record ID:
                    out += "%s %d" % (get_field_i18nname("record ID", ln, False), recID)
            out += "</a>"
            # secondly, authors:
            authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a")
            if authors:
                out += " - %s" % authors[0]
                if len(authors) > 1:
                    out += " et al"
            # thirdly publication info:
            publinfos = get_fieldvalues(recID, "773__s")
            if not publinfos:
                publinfos = get_fieldvalues(recID, "909C4s")
            if not publinfos:
                publinfos = get_fieldvalues(recID, "037__a")
            if not publinfos:
                publinfos = get_fieldvalues(recID, "088__a")
            if publinfos:
                out += " - %s" % publinfos[0]
            else:
                # fourthly publication year (if no publication info):
                years = get_fieldvalues(recID, "773__y")
                if not years:
                    years = get_fieldvalues(recID, "909C4y")
                if not years:
                    years = get_fieldvalues(recID, "260__c")
                if years:
                    out += " (%s)" % years[0]
    else:
        # HTML brief format by default
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            query = "SELECT value FROM bibfmt WHERE id_bibrec=%s AND format=%s"
            res = run_sql(query, (recID, format))
            if res:
                # record 'recID' is formatted in 'format', so print it:
                out += "%s" % decompress(res[0][0])
            else:
                # record 'recID' is not formatted in 'format', so try to call
                # BibFormat on the fly, or use the default format:
                if CFG_WEBSEARCH_CALL_BIBFORMAT:
                    out_record_in_format = call_bibformat(recID, format, ln, search_pattern=search_pattern,
                                                          user_info=user_info, verbose=verbose)
                    if out_record_in_format:
                        out += out_record_in_format
                    else:
                        out += websearch_templates.tmpl_print_record_brief(
                                 ln = ln,
                                 recID = recID,
                               )
                else:
                    out += websearch_templates.tmpl_print_record_brief(
                             ln = ln,
                             recID = recID,
                           )
            # at the end of HTML brief mode, print the "Detailed record" functionality:
            if
format == 'hp' or format.startswith("hb_") or format.startswith("hd_"): pass # do nothing for portfolio and on-the-fly formats else: out += websearch_templates.tmpl_print_record_brief_links( ln = ln, recID = recID, ) # print record closing tags, if needed: if format == "marcxml" or format == "oai_dc": out += " \n" out += " \n" return out def call_bibformat(recID, format="HD", ln=CFG_SITE_LANG, search_pattern=None, user_info=None, verbose=0): """ Calls BibFormat and returns formatted record. BibFormat will decide by itself if old or new BibFormat must be used. """ from invenio.bibformat_utils import get_pdf_snippets keywords = [] if search_pattern is not None: units = create_basic_search_units(None, str(search_pattern), None) keywords = [unit[1] for unit in units if unit[0] != '-'] out = format_record(recID, of=format, ln=ln, search_pattern=keywords, user_info=user_info, verbose=verbose) if CFG_WEBSEARCH_FULLTEXT_SNIPPETS and user_info and \ 'fulltext' in user_info['uri']: # check snippets only if URL contains fulltext # FIXME: make it work for CLI too, via new function arg if keywords: snippets = get_pdf_snippets(recID, keywords) if snippets: out += snippets return out def log_query(hostname, query_args, uid=-1): """ Log query into the query and user_query tables. Return id_query or None in case of problems. """ id_query = None if uid >= 0: # log the query only if uid is reasonable res = run_sql("SELECT id FROM query WHERE urlargs=%s", (query_args,), 1) try: id_query = res[0][0] except: id_query = run_sql("INSERT INTO query (type, urlargs) VALUES ('r', %s)", (query_args,)) if id_query: run_sql("INSERT INTO user_query (id_user, id_query, hostname, date) VALUES (%s, %s, %s, %s)", (uid, id_query, hostname, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))) return id_query def log_query_info(action, p, f, colls, nb_records_found_total=-1): """Write some info to the log file for later analysis.""" try: log = open(CFG_LOGDIR + "/search.log", "a") log.write(time.strftime("%Y%m%d%H%M%S#", time.localtime())) log.write(action+"#") log.write(p+"#") log.write(f+"#") for coll in colls[:-1]: log.write("%s," % coll) log.write("%s#" % colls[-1]) log.write("%d" % nb_records_found_total) log.write("\n") log.close() except: pass return ### CALLABLES def perform_request_search(req=None, cc=CFG_SITE_NAME, c=None, p="", f="", rg=CFG_WEBSEARCH_DEF_RECORDS_IN_GROUPS, sf="", so="d", sp="", rm="", of="id", ot="", aas=0, p1="", f1="", m1="", op1="", p2="", f2="", m2="", op2="", p3="", f3="", m3="", sc=0, jrec=0, recid=-1, recidb=-1, sysno="", id=-1, idb=-1, sysnb="", action="", d1="", d1y=0, d1m=0, d1d=0, d2="", d2y=0, d2m=0, d2d=0, dt="", verbose=0, ap=0, ln=CFG_SITE_LANG, ec=None, tab=""): """Perform search or browse request, without checking for authentication. Return list of recIDs found, if of=id. Otherwise create web page. The arguments are as follows: req - mod_python Request class instance. cc - current collection (e.g. "ATLAS"). The collection the user started to search/browse from. c - collection list (e.g. ["Theses", "Books"]). The collections user may have selected/deselected when starting to search from 'cc'. p - pattern to search for (e.g. "ellis and muon or kaon"). f - field to search within (e.g. "author"). rg - records in groups of (e.g. "10"). Defines how many hits per collection in the search results page are displayed. sf - sort field (e.g. "title"). so - sort order ("a"=ascending, "d"=descending). sp - sort pattern (e.g. 
"CERN-") -- in case there are more values in a sort field, this argument tells which one to prefer rm - ranking method (e.g. "jif"). Defines whether results should be ranked by some known ranking method. of - output format (e.g. "hb"). Usually starting "h" means HTML output (and "hb" for HTML brief, "hd" for HTML detailed), "x" means XML output, "t" means plain text output, "id" means no output at all but to return list of recIDs found. (Suitable for high-level API.) ot - output only these MARC tags (e.g. "100,700,909C0b"). Useful if only some fields are to be shown in the output, e.g. for library to control some fields. aas - advanced search ("0" means no, "1" means yes). Whether search was called from within the advanced search interface. p1 - first pattern to search for in the advanced search interface. Much like 'p'. f1 - first field to search within in the advanced search interface. Much like 'f'. m1 - first matching type in the advanced search interface. ("a" all of the words, "o" any of the words, "e" exact phrase, "p" partial phrase, "r" regular expression). op1 - first operator, to join the first and the second unit in the advanced search interface. ("a" add, "o" or, "n" not). p2 - second pattern to search for in the advanced search interface. Much like 'p'. f2 - second field to search within in the advanced search interface. Much like 'f'. m2 - second matching type in the advanced search interface. ("a" all of the words, "o" any of the words, "e" exact phrase, "p" partial phrase, "r" regular expression). op2 - second operator, to join the second and the third unit in the advanced search interface. ("a" add, "o" or, "n" not). p3 - third pattern to search for in the advanced search interface. Much like 'p'. f3 - third field to search within in the advanced search interface. Much like 'f'. m3 - third matching type in the advanced search interface. ("a" all of the words, "o" any of the words, "e" exact phrase, "p" partial phrase, "r" regular expression). sc - split by collection ("0" no, "1" yes). Governs whether we want to present the results in a single huge list, or splitted by collection. jrec - jump to record (e.g. "234"). Used for navigation inside the search results. recid - display record ID (e.g. "20000"). Do not search/browse but go straight away to the Detailed record page for the given recID. recidb - display record ID bis (e.g. "20010"). If greater than 'recid', then display records from recid to recidb. Useful for example for dumping records from the database for reformatting. sysno - display old system SYS number (e.g. ""). If you migrate to CDS Invenio from another system, and store your old SYS call numbers, you can use them instead of recid if you wish so. id - the same as recid, in case recid is not set. For backwards compatibility. idb - the same as recid, in case recidb is not set. For backwards compatibility. sysnb - the same as sysno, in case sysno is not set. For backwards compatibility. action - action to do. "SEARCH" for searching, "Browse" for browsing. Default is to search. d1 - first datetime in full YYYY-mm-dd HH:MM:DD format (e.g. "1998-08-23 12:34:56"). Useful for search limits on creation/modification date (see 'dt' argument below). Note that 'd1' takes precedence over d1y, d1m, d1d if these are defined. d1y - first date's year (e.g. "1998"). Useful for search limits on creation/modification date. d1m - first date's month (e.g. "08"). Useful for search limits on creation/modification date. d1d - first date's day (e.g. "23"). 
Useful for search limits on creation/modification date. d2 - second datetime in full YYYY-mm-dd HH:MM:DD format (e.g. "1998-09-02 12:34:56"). Useful for search limits on creation/modification date (see 'dt' argument below). Note that 'd2' takes precedence over d2y, d2m, d2d if these are defined. d2y - second date's year (e.g. "1998"). Useful for search limits on creation/modification date. d2m - second date's month (e.g. "09"). Useful for search limits on creation/modification date. d2d - second date's day (e.g. "02"). Useful for search limits on creation/modification date. dt - first and second date's type (e.g. "c"). Specifies whether to search in creation dates ("c") or in modification dates ("m"). When dt is not set and d1* and d2* are set, the default is "c". verbose - verbose level (0=min, 9=max). Useful to print some internal information on the searching process in case something goes wrong. ap - alternative patterns (0=no, 1=yes). In case no exact match is found, the search engine can try alternative patterns e.g. to replace non-alphanumeric characters by a boolean query. ap defines if this is wanted. ln - language of the search interface (e.g. "en"). Useful for internationalization. ec - list of external search engines to search as well (e.g. "SPIRES HEP"). """ selected_external_collections_infos = None # wash output format: of = wash_output_format(of) # raise an exception when trying to print out html from the cli if of.startswith("h"): assert req # for every search engine request asking for an HTML output, we # first regenerate cache of collection and field I18N names if # needed; so that later we won't bother checking timestamps for # I18N names at all: if of.startswith("h"): collection_i18nname_cache.recreate_cache_if_needed() field_i18nname_cache.recreate_cache_if_needed() # wash all arguments requiring special care try: (cc, colls_to_display, colls_to_search, hosted_colls, wash_colls_debug) = wash_colls(cc, c, sc, verbose) # which colls to search and to display? 
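    # (Added usage sketch, hedged -- the pattern/field/collection values are
    # illustrative only.  With of="id" this function returns the list of
    # recIDs found instead of writing a page, e.g.:
    #   recids = perform_request_search(p="ellis", f="author", of="id")
    # And the wash_colls() call above resolves e.g.
    #   wash_colls("Articles & Preprints", ["Articles"], 0, 0)
    # into the collections to display and to search, raising
    # InvenioWebSearchUnknownCollectionError -- handled just below -- for an
    # unknown collection name.)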
except InvenioWebSearchUnknownCollectionError, exc: colname = exc.colname if of.startswith("h"): page_start(req, of, cc, aas, ln, getUid(req), websearch_templates.tmpl_collection_not_found_page_title(colname, ln)) req.write(websearch_templates.tmpl_collection_not_found_page_body(colname, ln)) return page_end(req, of, ln) elif of == "id": return [] elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) else: return page_end(req, of, ln) p = wash_pattern(p) f = wash_field(f) p1 = wash_pattern(p1) f1 = wash_field(f1) p2 = wash_pattern(p2) f2 = wash_field(f2) p3 = wash_pattern(p3) f3 = wash_field(f3) datetext1, datetext2 = wash_dates(d1, d1y, d1m, d1d, d2, d2y, d2m, d2d) # wash ranking method: if not is_method_valid(None, rm): rm = "" _ = gettext_set_language(ln) # backwards compatibility: id, idb, sysnb -> recid, recidb, sysno (if applicable) if sysnb != "" and sysno == "": sysno = sysnb if id > 0 and recid == -1: recid = id if idb > 0 and recidb == -1: recidb = idb # TODO deduce passed search limiting criterias (if applicable) pl, pl_in_url = "", "" # no limits by default if action != "browse" and req and not isinstance(req, cStringIO.OutputType) \ and req.args: # we do not want to add options while browsing or while calling via command-line fieldargs = cgi.parse_qs(req.args) for fieldcode in get_fieldcodes(): if fieldargs.has_key(fieldcode): for val in fieldargs[fieldcode]: pl += "+%s:\"%s\" " % (fieldcode, val) pl_in_url += "&%s=%s" % (urllib.quote(fieldcode), urllib.quote(val)) # deduce recid from sysno argument (if applicable): if sysno: # ALEPH SYS number was passed, so deduce DB recID for the record: recid = get_mysql_recid_from_aleph_sysno(sysno) if recid is None: recid = 0 # use recid 0 to indicate that this sysno does not exist # deduce collection we are in (if applicable): if recid > 0: referer = None if req: referer = req.headers_in.get('Referer') cc = guess_collection_of_a_record(recid, referer) # deduce user id (if applicable): try: uid = getUid(req) except: uid = 0 ## 0 - start output if recid >= 0: # recid can be 0 if deduced from sysno and if such sysno does not exist ## 1 - detailed record display title, description, keywords = \ websearch_templates.tmpl_record_page_header_content(req, recid, ln) if req is not None and not req.header_only: page_start(req, of, cc, aas, ln, uid, title, description, keywords, recid, tab) # Default format is hb but we are in detailed -> change 'of' if of == "hb": of = "hd" if record_exists(recid): if recidb <= recid: # sanity check recidb = recid + 1 if of == "id": return [recidx for recidx in range(recid, recidb) if record_exists(recidx)] else: print_records(req, range(recid, recidb), -1, -9999, of, ot, ln, search_pattern=p, verbose=verbose, tab=tab) if req and of.startswith("h"): # register detailed record page view event client_ip_address = str(req.remote_ip) register_page_view_event(recid, uid, client_ip_address) else: # record does not exist if of == "id": return [] elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) elif of.startswith("h"): if req.header_only: raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND else: print_warning(req, _("Requested record does not seem to exist.")) elif action == "browse": ## 2 - browse needed of = 'hb' page_start(req, of, cc, aas, ln, uid, _("Browse"), p=create_page_title_search_pattern_info(p, p1, p2, p3)) req.write(create_search_box(cc, 
colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, aas, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action)) try: if aas == 1 or (p1 or p2 or p3): browse_pattern(req, colls_to_search, p1, f1, rg, ln) browse_pattern(req, colls_to_search, p2, f2, rg, ln) browse_pattern(req, colls_to_search, p3, f3, rg, ln) else: browse_pattern(req, colls_to_search, p, f, rg, ln) except: register_exception(req=req, alert_admin=True) if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) elif rm and p.startswith("recid:"): ## 3-ter - similarity search or citation search needed if req and not req.header_only: page_start(req, of, cc, aas, ln, uid, _("Search Results"), p=create_page_title_search_pattern_info(p, p1, p2, p3)) if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, aas, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action)) if record_exists(p[6:]) != 1: # record does not exist if of.startswith("h"): if req.header_only: raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND else: - print_warning(req, "Requested record does not seem to exist.") + print_warning(req, _("Requested record does not seem to exist.")) if of == "id": return [] elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) else: # record well exists, so find similar ones to it t1 = os.times()[4] results_similar_recIDs, results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, results_similar_comments = \ rank_records(rm, 0, get_collection_reclist(cc), string.split(p), verbose) if results_similar_recIDs: t2 = os.times()[4] cpu_time = t2 - t1 if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, cc, len(results_similar_recIDs), jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) print_warning(req, results_similar_comments) print_records(req, results_similar_recIDs, jrec, rg, of, ot, ln, results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, search_pattern=p, verbose=verbose) elif of=="id": return results_similar_recIDs elif of.startswith("x"): print_records(req, results_similar_recIDs, jrec, rg, of, ot, ln, results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, search_pattern=p, verbose=verbose) else: # rank_records failed and returned some error message to display: if of.startswith("h"): print_warning(req, results_similar_relevances_prologue) print_warning(req, results_similar_relevances_epilogue) print_warning(req, results_similar_comments) if of == "id": return [] elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) elif p.startswith("cocitedwith:"): #WAS EXPERIMENTAL ## 3-terter - cited by search needed page_start(req, of, cc, aas, ln, uid, _("Search Results"), p=create_page_title_search_pattern_info(p, p1, p2, p3)) if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, aas, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action)) recID = 
p[12:] if record_exists(recID) != 1: # record does not exist if of.startswith("h"): - print_warning(req, "Requested record does not seem to exist.") + print_warning(req, _("Requested record does not seem to exist.")) if of == "id": return [] elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) else: # record well exists, so find co-cited ones: t1 = os.times()[4] results_cocited_recIDs = map(lambda x: x[0], calculate_co_cited_with_list(int(recID))) if results_cocited_recIDs: t2 = os.times()[4] cpu_time = t2 - t1 if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, CFG_SITE_NAME, len(results_cocited_recIDs), jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) print_records(req, results_cocited_recIDs, jrec, rg, of, ot, ln, search_pattern=p, verbose=verbose) elif of=="id": return results_cocited_recIDs elif of.startswith("x"): print_records(req, results_cocited_recIDs, jrec, rg, of, ot, ln, search_pattern=p, verbose=verbose) else: # cited rank_records failed and returned some error message to display: if of.startswith("h"): print_warning(req, "nothing found") if of == "id": return [] elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) else: ## 3 - common search needed query_in_cache = False query_representation_in_cache = repr((p,f,colls_to_search)) page_start(req, of, cc, aas, ln, uid, p=create_page_title_search_pattern_info(p, p1, p2, p3)) if of.startswith("h") and verbose and wash_colls_debug: print_warning(req, "wash_colls debugging info : %s" % wash_colls_debug) # search into the hosted collections only if the output format is html or xml if hosted_colls and (of.startswith("h") or of.startswith("x")) and not p.startswith("recid:"): # hosted_colls_results : the hosted collections' searches that did not timeout # hosted_colls_timeouts : the hosted collections' searches that timed out and will be searched later on again (hosted_colls_results, hosted_colls_timeouts) = calculate_hosted_collections_results(req, [p, p1, p2, p3], f, hosted_colls, verbose, ln, CFG_HOSTED_COLLECTION_TIMEOUT_ANTE_SEARCH) # successful searches if hosted_colls_results: hosted_colls_true_results = [] for result in hosted_colls_results: # if the number of results is None or 0 (or False) then just do nothing if result[1] == None or result[1] == False: # these are the searches the returned no or zero results if verbose: print_warning(req, "Hosted collections (perform_search_request): %s returned no results" % result[0][1].name) else: # these are the searches that actually returned results on time hosted_colls_true_results.append(result) if verbose: print_warning(req, "Hosted collections (perform_search_request): %s returned %s results in %s seconds" % (result[0][1].name, result[1], result[2])) else: if verbose: print_warning(req, "Hosted collections (perform_search_request): there were no hosted collections results to be printed at this time") if hosted_colls_timeouts: if verbose: for timeout in hosted_colls_timeouts: print_warning(req, "Hosted collections (perform_search_request): %s timed out and will be searched again later" % timeout[0][1].name) # we need to know for later use if there were any hosted collections to be searched even if they weren't in the end elif hosted_colls and ((not (of.startswith("h") or of.startswith("x"))) or p.startswith("recid:")): (hosted_colls_results, 
hosted_colls_timeouts) = (None, None) else: if verbose: print_warning(req, "Hosted collections (perform_search_request): there were no hosted collections to be searched") ## let's define some useful boolean variables: # True means there are actual or potential hosted collections results to be printed hosted_colls_actual_or_potential_results_p = not (not hosted_colls or not ((hosted_colls_results and hosted_colls_true_results) or hosted_colls_timeouts)) # True means there are hosted collections timeouts to take care of later # (useful for more accurate printing of results later) hosted_colls_potential_results_p = not (not hosted_colls or not hosted_colls_timeouts) # True means we only have hosted collections to deal with only_hosted_colls_actual_or_potential_results_p = not colls_to_search and hosted_colls_actual_or_potential_results_p if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, aas, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action)) t1 = os.times()[4] results_in_any_collection = HitSet() if aas == 1 or (p1 or p2 or p3): ## 3A - advanced search try: results_in_any_collection = search_pattern_parenthesised(req, p1, f1, m1, ap=ap, of=of, verbose=verbose, ln=ln) if len(results_in_any_collection) == 0: if of.startswith("h"): perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) if p2: results_tmp = search_pattern_parenthesised(req, p2, f2, m2, ap=ap, of=of, verbose=verbose, ln=ln) if op1 == "a": # add results_in_any_collection.intersection_update(results_tmp) elif op1 == "o": # or results_in_any_collection.union_update(results_tmp) elif op1 == "n": # not results_in_any_collection.difference_update(results_tmp) else: if of.startswith("h"): print_warning(req, "Invalid set operation %s." % cgi.escape(op1), "Error") if len(results_in_any_collection) == 0: if of.startswith("h"): perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) if p3: results_tmp = search_pattern_parenthesised(req, p3, f3, m3, ap=ap, of=of, verbose=verbose, ln=ln) if op2 == "a": # add results_in_any_collection.intersection_update(results_tmp) elif op2 == "o": # or results_in_any_collection.union_update(results_tmp) elif op2 == "n": # not results_in_any_collection.difference_update(results_tmp) else: if of.startswith("h"): print_warning(req, "Invalid set operation %s." 
% cgi.escape(op2), "Error") except: register_exception(req=req, alert_admin=True) if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) else: ## 3B - simple search if search_results_cache.cache.has_key(query_representation_in_cache): # query is not in the cache already, so reuse it: query_in_cache = True results_in_any_collection = search_results_cache.cache[query_representation_in_cache] if verbose and of.startswith("h"): print_warning(req, "Search stage 0: query found in cache, reusing cached results.") else: try: # added the display_nearest_terms_box parameter to avoid printing out the "Nearest terms in any collection" # recommendations when there are results only in the hosted collections. Also added the if clause to avoid # searching in case we know we only have actual or potential hosted collections results if not only_hosted_colls_actual_or_potential_results_p: results_in_any_collection = search_pattern_parenthesised(req, p, f, ap=ap, of=of, verbose=verbose, ln=ln, display_nearest_terms_box=not hosted_colls_actual_or_potential_results_p) except: register_exception(req=req, alert_admin=True) if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) return page_end(req, of, ln) if len(results_in_any_collection) == 0 and not hosted_colls_actual_or_potential_results_p: if of.startswith("h"): perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) # store this search query results into search results cache if needed: if CFG_WEBSEARCH_SEARCH_CACHE_SIZE and not query_in_cache: if len(search_results_cache.cache) > CFG_WEBSEARCH_SEARCH_CACHE_SIZE: search_results_cache.clear() search_results_cache.cache[query_representation_in_cache] = results_in_any_collection if verbose and of.startswith("h"): print_warning(req, "Search stage 3: storing query results in cache.") # search stage 4: intersection with collection universe: try: # added the display_nearest_terms_box parameter to avoid printing out the "Nearest terms in any collection" # recommendations when there results only in the hosted collections. 
Also added the if clause to avoid # searching in case we know since the last stage that we have no results in any collection if len(results_in_any_collection) != 0: results_final = intersect_results_with_collrecs(req, results_in_any_collection, colls_to_search, ap, of, verbose, ln, display_nearest_terms_box=not hosted_colls_actual_or_potential_results_p) else: results_final = {} except: register_exception(req=req, alert_admin=True) if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) return page_end(req, of, ln) if results_final == {} and not hosted_colls_actual_or_potential_results_p: if of.startswith("h"): perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) if of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) # search stage 5: apply search option limits and restrictions: if datetext1 != "" and results_final != {}: if verbose and of.startswith("h"): print_warning(req, "Search stage 5: applying time etc limits, from %s until %s..." % (datetext1, datetext2)) try: results_final = intersect_results_with_hitset(req, results_final, search_unit_in_bibrec(datetext1, datetext2, dt), ap, aptext= _("No match within your time limits, " "discarding this condition..."), of=of) except: register_exception(req=req, alert_admin=True) if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) return page_end(req, of, ln) if results_final == {} and not hosted_colls_actual_or_potential_results_p: if of.startswith("h"): perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) #if of.startswith("x"): # # Print empty, but valid XML # print_records_prologue(req, of) # print_records_epilogue(req, of) return page_end(req, of, ln) if pl and results_final != {}: pl = wash_pattern(pl) if verbose and of.startswith("h"): print_warning(req, "Search stage 5: applying search pattern limit %s..." 
% cgi.escape(pl)) try: results_final = intersect_results_with_hitset(req, results_final, search_pattern_parenthesised(req, pl, ap=0, ln=ln), ap, aptext=_("No match within your search limits, " "discarding this condition..."), of=of) except: register_exception(req=req, alert_admin=True) if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) return page_end(req, of, ln) if results_final == {} and not hosted_colls_actual_or_potential_results_p: if of.startswith("h"): perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) if of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) return page_end(req, of, ln) t2 = os.times()[4] cpu_time = t2 - t1 ## search stage 6: display results: results_final_nb_total = 0 results_final_nb = {} # will hold number of records found in each collection # (in simple dict to display overview more easily) for coll in results_final.keys(): results_final_nb[coll] = len(results_final[coll]) #results_final_nb_total += results_final_nb[coll] # Now let us calculate results_final_nb_total more precisely, # in order to get the total number of "distinct" hits across # searched collections; this is useful because a record might # have been attributed to more than one primary collection; so # we have to avoid counting it multiple times. The price to # pay for this accuracy of results_final_nb_total is somewhat # increased CPU time. if results_final.keys() == 1: # only one collection; no need to union them results_final_for_all_selected_colls = results_final.values()[0] results_final_nb_total = results_final_nb.values()[0] else: # okay, some work ahead to union hits across collections: results_final_for_all_selected_colls = HitSet() for coll in results_final.keys(): results_final_for_all_selected_colls.union_update(results_final[coll]) results_final_nb_total = len(results_final_for_all_selected_colls) #if hosted_colls and (of.startswith("h") or of.startswith("x")): if hosted_colls_actual_or_potential_results_p: if hosted_colls_results: for result in hosted_colls_true_results: colls_to_search.append(result[0][1].name) results_final_nb[result[0][1].name] = result[1] results_final_nb_total += result[1] cpu_time += result[2] if hosted_colls_timeouts: for timeout in hosted_colls_timeouts: colls_to_search.append(timeout[1].name) # use -963 as a special number to identify the collections that timed out results_final_nb[timeout[1].name] = -963 # we continue past this point only if there is a hosted collection that has timed out and might offer potential results if results_final_nb_total ==0 and not hosted_colls_potential_results_p: if of.startswith("h"): print_warning(req, "No match found, please enter different search terms.") elif of.startswith("x"): # Print empty, but valid XML print_records_prologue(req, of) print_records_epilogue(req, of) else: # yes, some hits found: good! # collection list may have changed due to not-exact-match-found policy so check it out: for coll in results_final.keys(): if coll not in colls_to_search: colls_to_search.append(coll) # print results overview: if of == "id": # we have been asked to return list of recIDs recIDs = list(results_final_for_all_selected_colls) if sf: # do we have to sort? recIDs = sort_records(req, recIDs, sf, so, sp, verbose, of) elif rm: # do we have to rank? 
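            # (Added note: rank_records() returns a tuple whose first element
            # is the ranked list of recIDs; in this of="id" branch the ranked
            # list replaces the plain one only if ranking succeeded, as the
            # check below shows.)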
results_final_for_all_colls_rank_records_output = rank_records(rm, 0, results_final_for_all_selected_colls, string.split(p) + string.split(p1) + string.split(p2) + string.split(p3), verbose) if results_final_for_all_colls_rank_records_output[0]: recIDs = results_final_for_all_colls_rank_records_output[0] return recIDs elif of.startswith("h"): if of not in ['hcs']: # added the hosted_colls_potential_results_p parameter to help print out the overview more accurately req.write(print_results_overview(req, colls_to_search, results_final_nb_total, results_final_nb, cpu_time, ln, ec, hosted_colls_potential_results_p=hosted_colls_potential_results_p)) selected_external_collections_infos = print_external_results_overview(req, cc, [p, p1, p2, p3], f, ec, verbose, ln) # print number of hits found for XML outputs: if of.startswith("x"): req.write("\n" % results_final_nb_total) # print records: if of in ['hcs']: # feed the current search to be summarized: from invenio.search_engine_summarizer import summarize_records summarize_records(results_final_for_all_selected_colls, 'hcs', ln, p, f, req) else: if len(colls_to_search)>1: cpu_time = -1 # we do not want to have search time printed on each collection print_records_prologue(req, of) for coll in colls_to_search: if results_final.has_key(coll) and len(results_final[coll]): if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, coll, results_final_nb[coll], jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) results_final_recIDs = list(results_final[coll]) results_final_relevances = [] results_final_relevances_prologue = "" results_final_relevances_epilogue = "" if sf: # do we have to sort? results_final_recIDs = sort_records(req, results_final_recIDs, sf, so, sp, verbose, of) elif rm: # do we have to rank? 
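                    # (Added note: print_records_prologue()/_epilogue() are
                    # emitted once around this per-collection loop, which is
                    # why the print_records() call below passes
                    # print_records_prologue_p=False and
                    # print_records_epilogue_p=False -- the XML envelope must
                    # not be repeated for every collection.)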
results_final_recIDs_ranked, results_final_relevances, results_final_relevances_prologue, results_final_relevances_epilogue, results_final_comments = \ rank_records(rm, 0, results_final[coll], string.split(p) + string.split(p1) + string.split(p2) + string.split(p3), verbose) if of.startswith("h"): print_warning(req, results_final_comments) if results_final_recIDs_ranked: results_final_recIDs = results_final_recIDs_ranked else: # rank_records failed and returned some error message to display: print_warning(req, results_final_relevances_prologue) print_warning(req, results_final_relevances_epilogue) print_records(req, results_final_recIDs, jrec, rg, of, ot, ln, results_final_relevances, results_final_relevances_prologue, results_final_relevances_epilogue, search_pattern=p, print_records_prologue_p=False, print_records_epilogue_p=False, verbose=verbose) if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, coll, results_final_nb[coll], jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time, 1)) #if hosted_colls and (of.startswith("h") or of.startswith("x")): if hosted_colls_actual_or_potential_results_p: if hosted_colls_results: # TODO: add a verbose message here for result in hosted_colls_true_results: if of.startswith("h"): req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, result[0][1].name, results_final_nb[result[0][1].name], jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) req.write(print_hosted_results(url_and_engine=result[0], ln=ln, of=of, req=req, limit=rg)) if of.startswith("h"): req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, result[0][1].name, results_final_nb[result[0][1].name], jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time, 1)) if hosted_colls_timeouts: # TODO: add a verbose message here # TODO: check if verbose messages still work when dealing with (re)calculations of timeouts (hosted_colls_timeouts_results, hosted_colls_timeouts_timeouts) = do_calculate_hosted_collections_results(req, ln, None, verbose, None, hosted_colls_timeouts, CFG_HOSTED_COLLECTION_TIMEOUT_POST_SEARCH) if hosted_colls_timeouts_results: hosted_colls_timeouts_true_results = [] for result in hosted_colls_timeouts_results: if result[1] == None or result[1] == False: ## these are the searches the returned no or zero results ## also print a nearest terms box, in case this is the only ## collection being searched and it returns no results? 
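                        ## (Added note: -963 is the sentinel hit count used
                        ## throughout this function to mark a hosted
                        ## collection whose external search timed out.)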
if of.startswith("h"): req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, result[0][1].name, -963, jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) req.write(print_hosted_results(url_and_engine=result[0], ln=ln, of=of, req=req, no_records_found=True, limit=rg)) req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, result[0][1].name, -963, jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time, 1)) else: # these are the searches that actually returned results on time if of.startswith("h"): req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, result[0][1].name, result[1], jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) req.write(print_hosted_results(url_and_engine=result[0], ln=ln, of=of, req=req, limit=rg)) if of.startswith("h"): req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, result[0][1].name, result[1], jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time, 1)) if hosted_colls_timeouts_timeouts: for timeout in hosted_colls_timeouts_timeouts: if of.startswith("h"): req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, timeout[1].name, -963, jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time)) req.write(print_hosted_results(url_and_engine=timeout[0], ln=ln, of=of, req=req, search_timed_out=True, limit=rg)) req.write(print_hosted_search_info(p, f, sf, so, sp, rm, of, ot, timeout[1].name, -963, jrec, rg, aas, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time, 1)) print_records_epilogue(req, of) if f == "author" and of.startswith("h"): req.write(create_similarly_named_authors_link_box(p, ln)) # log query: try: id_query = log_query(req.remote_host, req.args, uid) if of.startswith("h") and id_query: if not of in ['hcs']: # display alert/RSS teaser for non-summary formats: user_info = collect_user_info(req) display_email_alert_part = True if user_info: if user_info['email'] == 'guest': if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS > 4: display_email_alert_part = False else: if not user_info['precached_usealerts']: display_email_alert_part = False req.write(websearch_templates.tmpl_alert_rss_teaser_box_for_query(id_query, \ ln=ln, display_email_alert_part=display_email_alert_part)) except: # do not log query if req is None (used by CLI interface) pass log_query_info("ss", p, f, colls_to_search, results_final_nb_total) # External searches if of.startswith("h"): if not of in ['hcs']: perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos) return page_end(req, of, ln) def perform_request_cache(req, action="show"): """Manipulates the search engine cache.""" req.content_type = "text/html" req.send_http_header() req.write("") out = "" out += "
<h1>Search Cache</h1>"
    # clear cache if requested:
    if action == "clear":
        search_results_cache.clear()
    req.write(out)
    # show collection reclist cache:
    out = "<h3>Collection reclist cache</h3>"
    out += "- collection table last updated: %s" % get_table_update_time('collection')
    out += "<br />- reclist cache timestamp: %s" % collection_reclist_cache.timestamp
    out += "<br />- reclist cache contents:"
    out += "<blockquote>"
    for coll in collection_reclist_cache.cache.keys():
        if collection_reclist_cache.cache[coll]:
            out += "%s (%d)<br />" % (coll, len(collection_reclist_cache.cache[coll]))
    out += "</blockquote>"
    req.write(out)
    # show search results cache:
    out = "<h3>Search Cache</h3>"
    out += "- search cache usage: %d queries cached (max. ~%d)" % \
           (len(search_results_cache.cache), CFG_WEBSEARCH_SEARCH_CACHE_SIZE)
    if len(search_results_cache.cache):
        out += "<br />- search cache contents:"
        out += "<blockquote>"
        for query, hitset in search_results_cache.cache.items():
            out += "<br />%s ... %s" % (query, hitset)
        out += """<br />
<a href="%s/search/cache?action=clear">clear search results cache</a>""" % CFG_SITE_URL
        out += "</blockquote>"
    req.write(out)
    # show field i18nname cache:
    out = "<h3>Field I18N names cache</h3>"
    out += "- fieldname table last updated: %s" % get_table_update_time('fieldname')
    out += "<br />- i18nname cache timestamp: %s" % field_i18nname_cache.timestamp
    out += "<br />- i18nname cache contents:"
    out += "<blockquote>"
    for field in field_i18nname_cache.cache.keys():
        for ln in field_i18nname_cache.cache[field].keys():
            out += "%s, %s = %s<br />" % (field, ln, field_i18nname_cache.cache[field][ln])
    out += "</blockquote>"
    req.write(out)
    # show collection i18nname cache:
    out = "<h3>Collection I18N names cache</h3>"
    out += "- collectionname table last updated: %s" % get_table_update_time('collectionname')
    out += "<br />- i18nname cache timestamp: %s" % collection_i18nname_cache.timestamp
    out += "<br />- i18nname cache contents:"
    out += "<blockquote>"
    for coll in collection_i18nname_cache.cache.keys():
        for ln in collection_i18nname_cache.cache[coll].keys():
            out += "%s, %s = %s<br />" % (coll, ln, collection_i18nname_cache.cache[coll][ln])
    out += "</blockquote>"
    req.write(out)
    req.write("</html>")
    return "\n"

def perform_request_log(req, date=""):
    """Display search log information for given date."""
    req.content_type = "text/html"
    req.send_http_header()
    req.write("<html>")
    req.write("<h1>Search Log</h1>")
    if date: # case A: display stats for a day
        yyyymmdd = string.atoi(date)
        req.write("<p><big><strong>Date: %d</strong></big></p>" % yyyymmdd)
        req.write("""<table border="1">""")
        req.write("<tr><td><strong>%s</strong></td><td><strong>%s</strong></td>"
                  "<td><strong>%s</strong></td><td><strong>%s</strong></td>"
                  "<td><strong>%s</strong></td><td><strong>%s</strong></td></tr>" %
                  ("No.", "Time", "Pattern", "Field", "Collection", "Number of Hits"))
        # read file:
        p = os.popen("grep ^%d %s/search.log" % (yyyymmdd, CFG_LOGDIR), 'r')
        lines = p.readlines()
        p.close()
        # process lines:
        i = 0
        for line in lines:
            try:
                datetime, aas, p, f, c, nbhits = string.split(line, "#")
                i += 1
                req.write("<tr><td>#%d</td><td>%s:%s:%s</td><td>%s</td>"
                          "<td>%s</td><td>%s</td><td>%s</td></tr>" %
                          (i, datetime[8:10], datetime[10:12], datetime[12:], p, f, c, nbhits))
            except:
                pass # ignore eventual wrong log lines
        req.write("</table>")
    else: # case B: display summary stats per day
        yyyymm01 = int(time.strftime("%Y%m01", time.localtime()))
        yyyymmdd = int(time.strftime("%Y%m%d", time.localtime()))
        req.write("""<table border="1">""")
        req.write("<tr><td><strong>%s</strong></td><td><strong>%s</strong></td></tr>" %
                  ("Day", "Number of Queries"))
        for day in range(yyyymm01, yyyymmdd + 1):
            p = os.popen("grep -c ^%d %s/search.log" % (day, CFG_LOGDIR), 'r')
            for line in p.readlines():
                req.write("""<tr><td>%s</td><td align="right"><a href="%s/search/log?date=%d">%s</a></td></tr>""" %
                          (day, CFG_SITE_URL, day, line))
            p.close()
        req.write("</table>")
    req.write("</html>")
    return "\n"

def get_most_popular_field_values(recids, tags, exclude_values=None, count_repetitive_values=True):
    """
    Analyze RECIDS and look for TAGS and return most popular values
    and the frequency with which they occur, sorted according to
    descending frequency.

    If a value is found in EXCLUDE_VALUES, then do not count it.

    If COUNT_REPETITIVE_VALUES is True, then we count every occurrence
    of value in the tags.  If False, then we count the value only once
    regardless of the number of times it may appear in a record.  (But,
    if the same value occurs in another record, we count it, of course.)

    Example:
     >>> get_most_popular_field_values(range(11,20), '980__a')
     (('PREPRINT', 10), ('THESIS', 7), ...)
     >>> get_most_popular_field_values(range(11,20), ('100__a', '700__a'))
     (('Ellis, J', 10), ('Ellis, N', 7), ...)
     >>> get_most_popular_field_values(range(11,20), ('100__a', '700__a'), ('Ellis, J',))
     (('Ellis, N', 7), ...)
    """
    def _get_most_popular_field_values_helper_sorter(val1, val2):
        "Compare VAL1 and VAL2 according to, firstly, frequency, then secondly, alphabetically."
        compared_via_frequencies = cmp(valuefreqdict[val2], valuefreqdict[val1])
        if compared_via_frequencies == 0:
            return cmp(val1.lower(), val2.lower())
        else:
            return compared_via_frequencies

    valuefreqdict = {}
    ## sanity check:
    if not exclude_values:
        exclude_values = []
    if isinstance(tags, str):
        tags = (tags,)
    ## find values to count:
    vals_to_count = []
    displaytmp = {}
    if count_repetitive_values:
        # counting technique A: can look up many records at once: (very fast)
        for tag in tags:
            vals_to_count.extend(get_fieldvalues(recids, tag))
    else:
        # counting technique B: must count record-by-record: (slow)
        for recid in recids:
            vals_in_rec = []
            for tag in tags:
                for val in get_fieldvalues(recid, tag, False):
                    vals_in_rec.append(val)
            # do not count repetitive values within this record
            # (even across various tags, so need to unify again):
            dtmp = {}
            for val in vals_in_rec:
                dtmp[val.lower()] = 1
                displaytmp[val.lower()] = val
            vals_in_rec = dtmp.keys()
            vals_to_count.extend(vals_in_rec)
    ## are we to exclude some of found values?
for val in vals_to_count: if val not in exclude_values: if valuefreqdict.has_key(val): valuefreqdict[val] += 1 else: valuefreqdict[val] = 1 ## sort by descending frequency of values: out = () vals = valuefreqdict.keys() vals.sort(_get_most_popular_field_values_helper_sorter) for val in vals: tmpdisplv = '' if displaytmp.has_key(val): tmpdisplv = displaytmp[val] else: tmpdisplv = val out += (tmpdisplv, valuefreqdict[val]), return out def profile(p="", f="", c=CFG_SITE_NAME): """Profile search time.""" import profile import pstats profile.run("perform_request_search(p='%s',f='%s', c='%s')" % (p, f, c), "perform_request_search_profile") p = pstats.Stats("perform_request_search_profile") p.strip_dirs().sort_stats("cumulative").print_stats() return 0 ## test cases: #print wash_colls(CFG_SITE_NAME,"Library Catalogue", 0) #print wash_colls("Periodicals & Progress Reports",["Periodicals","Progress Reports"], 0) #print wash_field("wau") #print print_record(20,"tm","001,245") #print create_opft_search_units(None, "PHE-87-13","reportnumber") #print ":"+wash_pattern("* and % doo * %")+":\n" #print ":"+wash_pattern("*")+":\n" #print ":"+wash_pattern("ellis* ell* e*%")+":\n" #print run_sql("SELECT name,dbquery from collection") #print get_index_id("author") #print get_coll_ancestors("Theses") #print get_coll_sons("Articles & Preprints") #print get_coll_real_descendants("Articles & Preprints") #print get_collection_reclist("Theses") #print log(sys.stdin) #print search_unit_in_bibrec('2002-12-01','2002-12-12') #print get_nearest_terms_in_bibxxx("ellis", "author", 5, 5) #print call_bibformat(68, "HB_FLY") #print get_fieldvalues(10, "980__a") #print get_fieldvalues_alephseq_like(10,"001___") #print get_fieldvalues_alephseq_like(10,"980__a") #print get_fieldvalues_alephseq_like(10,"foo") #print get_fieldvalues_alephseq_like(10,"-1") #print get_fieldvalues_alephseq_like(10,"99") #print get_fieldvalues_alephseq_like(10,["001", "980"]) ## profiling: #profile("of the this") #print perform_request_search(p="ellis") diff --git a/modules/websearch/lib/websearch_regression_tests.py b/modules/websearch/lib/websearch_regression_tests.py index 2b870d8d7..bde8ab399 100644 --- a/modules/websearch/lib/websearch_regression_tests.py +++ b/modules/websearch/lib/websearch_regression_tests.py @@ -1,1583 +1,1582 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
# pylint: disable-msg=C0301 # pylint: disable-msg=E1102 """WebSearch module regression tests.""" __revision__ = "$Id$" import unittest import re import urlparse, cgi import sys if sys.hexversion < 0x2040000: # pylint: disable-msg=W0622 from sets import Set as set # pylint: enable-msg=W0622 from mechanize import Browser, LinkNotFoundError, HTTPError from invenio.config import CFG_SITE_URL, CFG_SITE_NAME, CFG_SITE_LANG from invenio.testutils import make_test_suite, \ run_test_suite, \ make_url, test_web_page_content, \ merge_error_messages from invenio.urlutils import same_urls_p from invenio.search_engine import perform_request_search, \ guess_primary_collection_of_a_record, guess_collection_of_a_record, \ collection_restricted_p, get_permitted_restricted_collections, \ get_fieldvalues def parse_url(url): parts = urlparse.urlparse(url) query = cgi.parse_qs(parts[4], True) return parts[2].split('/')[1:], query class WebSearchWebPagesAvailabilityTest(unittest.TestCase): """Check WebSearch web pages whether they are up or not.""" def test_search_interface_pages_availability(self): """websearch - availability of search interface pages""" baseurl = CFG_SITE_URL + '/' _exports = ['', 'collection/Poetry', 'collection/Poetry?as=1'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_search_results_pages_availability(self): """websearch - availability of search results pages""" baseurl = CFG_SITE_URL + '/search' _exports = ['', '?c=Poetry', '?p=ellis', '/cache', '/log'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_search_detailed_record_pages_availability(self): """websearch - availability of search detailed record pages""" baseurl = CFG_SITE_URL + '/record/' _exports = ['', '1', '1/', '1/files', '1/files/'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_browse_results_pages_availability(self): """websearch - availability of browse results pages""" baseurl = CFG_SITE_URL + '/search' _exports = ['?p=ellis&f=author&action_browse=Browse'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_help_page_availability(self): """websearch - availability of Help Central page""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help', expected_text="Help Central")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/?ln=fr', expected_text="Centre d'aide")) def test_search_tips_page_availability(self): """websearch - availability of Search Tips""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-tips', expected_text="Search Tips")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-tips?ln=fr', expected_text="Conseils de recherche")) def test_search_guide_page_availability(self): """websearch - availability of Search Guide""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-guide', expected_text="Search Guide")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-guide?ln=fr', expected_text="Guide de 
recherche")) class WebSearchTestLegacyURLs(unittest.TestCase): """ Check that the application still responds to legacy URLs for navigating, searching and browsing.""" def test_legacy_collections(self): """ websearch - collections handle legacy urls """ browser = Browser() def check(legacy, new, browser=browser): browser.open(legacy) got = browser.geturl() self.failUnless(same_urls_p(got, new), got) # Use the root URL unless we need more check(make_url('/', c=CFG_SITE_NAME), make_url('/', ln=CFG_SITE_LANG)) # Other collections are redirected in the /collection area check(make_url('/', c='Poetry'), make_url('/collection/Poetry', ln=CFG_SITE_LANG)) # Drop unnecessary arguments, like ln and as (when they are # the default value) args = {'as': 0} check(make_url('/', c='Poetry', **args), make_url('/collection/Poetry', ln=CFG_SITE_LANG)) # Otherwise, keep them args = {'as': 1, 'ln': CFG_SITE_LANG} check(make_url('/', c='Poetry', **args), make_url('/collection/Poetry', **args)) # Support the /index.py addressing too check(make_url('/index.py', c='Poetry'), make_url('/collection/Poetry', ln=CFG_SITE_LANG)) def test_legacy_search(self): """ websearch - search queries handle legacy urls """ browser = Browser() def check(legacy, new, browser=browser): browser.open(legacy) got = browser.geturl() self.failUnless(same_urls_p(got, new), got) # /search.py is redirected on /search # Note that `as' is a reserved word in Python 2.5 check(make_url('/search.py', p='nuclear', ln='en') + 'as=1', make_url('/search', p='nuclear', ln='en') + 'as=1') # direct recid searches are redirected to /record check(make_url('/search.py', recid=1, ln='es'), make_url('/record/1', ln='es')) def test_legacy_search_help_link(self): """websearch - legacy Search Help page link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search/index.en.html', expected_text="Help Central")) def test_legacy_search_tips_link(self): """websearch - legacy Search Tips page link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search/tips.fr.html', expected_text="Conseils de recherche")) def test_legacy_search_guide_link(self): """websearch - legacy Search Guide page link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search/guide.en.html', expected_text="Search Guide")) class WebSearchTestRecord(unittest.TestCase): """ Check the interface of the /record results """ def test_format_links(self): """ websearch - check format links for records """ browser = Browser() # We open the record in all known HTML formats for hformat in ('hd', 'hx', 'hm'): browser.open(make_url('/record/1', of=hformat)) if hformat == 'hd': # hd format should have a link to the following # formats for oformat in ('hx', 'hm', 'xm', 'xd'): target = make_url('/record/1/export/%s?ln=en' % oformat) try: browser.find_link(url=target) except LinkNotFoundError: self.fail('link %r should be in page' % target) else: # non-hd HTML formats should have a link back to # the main detailed record target = make_url('/record/1') try: browser.find_link(url=target) except LinkNotFoundError: self.fail('link %r should be in page' % target) return def test_exported_formats(self): """ websearch - check formats exported through /record/1/export/ URLs""" browser = Browser() self.assertEqual([], test_web_page_content(make_url('/record/1/export/hm'), expected_text='245__ $$aALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/record/1/export/hd'), expected_text='ALEPH experiment')) self.assertEqual([], 
test_web_page_content(make_url('/record/1/export/xm'), expected_text='ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/record/1/export/xd'), expected_text='ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/record/1/export/hs'), expected_text='ALEPH experiment' % \ CFG_SITE_LANG)) self.assertEqual([], test_web_page_content(make_url('/record/1/export/hx'), expected_text='title = "ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/record/1/export/t?ot=245'), expected_text='245__ $$aALEPH experiment')) self.assertNotEqual([], test_web_page_content(make_url('/record/1/export/t?ot=245'), expected_text='001__')) self.assertEqual([], test_web_page_content(make_url('/record/1/export/h?ot=245'), expected_text='245__ $$aALEPH experiment')) self.assertNotEqual([], test_web_page_content(make_url('/record/1/export/h?ot=245'), expected_text='001__')) return class WebSearchTestCollections(unittest.TestCase): def test_traversal_links(self): """ websearch - traverse all the publications of a collection """ browser = Browser() try: for aas in (0, 1): args = {'as': aas} browser.open(make_url('/collection/Preprints', **args)) - for jrec in (11, 21, 11, 27): + for jrec in (11, 21, 11, 28): args = {'jrec': jrec, 'cc': 'Preprints'} if aas: args['as'] = aas url = make_url('/search', **args) try: browser.follow_link(url=url) except LinkNotFoundError: args['ln'] = CFG_SITE_LANG url = make_url('/search', **args) browser.follow_link(url=url) except LinkNotFoundError: self.fail('no link %r in %r' % (url, browser.geturl())) def test_collections_links(self): """ websearch - enter in collections and subcollections """ browser = Browser() def tryfollow(url): cur = browser.geturl() body = browser.response().read() try: browser.follow_link(url=url) except LinkNotFoundError: print body self.fail("in %r: could not find %r" % ( cur, url)) return for aas in (0, 1): if aas: kargs = {'as': 1} else: kargs = {} kargs['ln'] = CFG_SITE_LANG # We navigate from immediate son to immediate son... browser.open(make_url('/', **kargs)) tryfollow(make_url('/collection/Articles%20%26%20Preprints', **kargs)) tryfollow(make_url('/collection/Articles', **kargs)) # But we can also jump to a grandson immediately browser.back() browser.back() tryfollow(make_url('/collection/ALEPH', **kargs)) return def test_records_links(self): """ websearch - check the links toward records in leaf collections """ browser = Browser() browser.open(make_url('/collection/Preprints')) def harvest(): """ Parse all the links in the page, and check that for each link to a detailed record, we also have the corresponding link to the similar records.""" records = set() similar = set() for link in browser.links(): path, q = parse_url(link.url) if not path: continue if path[0] == 'record': records.add(int(path[1])) continue if path[0] == 'search': if not q.get('rm') == ['wrd']: continue recid = q['p'][0].split(':')[1] similar.add(int(recid)) self.failUnlessEqual(records, similar) return records # We must have 10 links to the corresponding /records found = harvest() self.failUnlessEqual(len(found), 10) # When clicking on the "Search" button, we must also have # these 10 links on the records. 
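            # (harvest() pairs every /record/<id> link with its "similar
            # records" companion: a /search link with rm=wrd whose pattern
            # appears to be of the form recid:<id>, judging from the
            # q['p'][0].split(':')[1] parsing above)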
browser.select_form(name="search") browser.submit() found = harvest() self.failUnlessEqual(len(found), 10) return class WebSearchTestBrowse(unittest.TestCase): def test_browse_field(self): """ websearch - check that browsing works """ browser = Browser() browser.open(make_url('/')) browser.select_form(name='search') browser['f'] = ['title'] browser.submit(name='action_browse') def collect(): # We'll get a few links to search for the actual hits, plus a # link to the following results. res = [] for link in browser.links(url_regex=re.compile(CFG_SITE_URL + r'/search\?')): if link.text == 'Advanced Search': continue dummy, q = parse_url(link.url) res.append((link, q)) return res # if we follow the last link, we should get another # batch. There is an overlap of one item. batch_1 = collect() browser.follow_link(link=batch_1[-1][0]) batch_2 = collect() # FIXME: we cannot compare the whole query, as the collection # set is not equal self.failUnlessEqual(batch_1[-2][1]['p'], batch_2[0][1]['p']) class WebSearchTestOpenURL(unittest.TestCase): def test_isbn_01(self): """ websearch - isbn query via OpenURL 0.1""" browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/openurl', isbn='0387940758')) dummy, current_q = parse_url(browser.geturl()) self.failUnlessEqual(current_q, { 'sc' : ['1'], 'p' : ['isbn:"0387940758"'], 'of' : ['hd'] }) def test_isbn_10_rft_id(self): """ websearch - isbn query via OpenURL 1.0 - rft_id""" browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/openurl', rft_id='urn:ISBN:0387940758')) dummy, current_q = parse_url(browser.geturl()) self.failUnlessEqual(current_q, { 'sc' : ['1'], 'p' : ['isbn:"0387940758"'], 'of' : ['hd'] }) def test_isbn_10(self): """ websearch - isbn query via OpenURL 1.0""" browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/openurl?rft.isbn=0387940758')) dummy, current_q = parse_url(browser.geturl()) self.failUnlessEqual(current_q, { 'sc' : ['1'], 'p' : ['isbn:"0387940758"'], 'of' : ['hd'] }) class WebSearchTestSearch(unittest.TestCase): def test_hits_in_other_collection(self): """ websearch - check extension of a query to the home collection """ browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/collection/ISOLDE', ln='en')) browser.select_form(name='search') browser['f'] = ['author'] browser['p'] = 'matsubara' browser.submit() dummy, current_q = parse_url(browser.geturl()) link = browser.find_link(text_regex=re.compile('.*hit', re.I)) dummy, target_q = parse_url(link.url) # the target query should be the current query without any c # or cc specified. for f in ('cc', 'c', 'action_search'): if f in current_q: del current_q[f] self.failUnlessEqual(current_q, target_q) def test_nearest_terms(self): """ websearch - provide a list of nearest terms """ browser = Browser() browser.open(make_url('')) # Search something weird browser.select_form(name='search') browser['p'] = 'gronf' browser.submit() dummy, original = parse_url(browser.geturl()) for to_drop in ('cc', 'action_search', 'f'): if to_drop in original: del original[to_drop] if 'ln' not in original: original['ln'] = [CFG_SITE_LANG] # we should get a few searches back, which are identical # except for the p field being substituted (and the cc field # being dropped). 
if 'cc' in original: del original['cc'] for link in browser.links(url_regex=re.compile(CFG_SITE_URL + r'/search\?')): if link.text == 'Advanced Search': continue dummy, target = parse_url(link.url) if 'ln' not in target: target['ln'] = [CFG_SITE_LANG] original['p'] = [link.text] self.failUnlessEqual(original, target) return def test_switch_to_simple_search(self): """ websearch - switch to simple search """ browser = Browser() args = {'as': 1} browser.open(make_url('/collection/ISOLDE', **args)) browser.select_form(name='search') browser['p1'] = 'tandem' browser['f1'] = ['title'] browser.submit() browser.follow_link(text='Simple Search') dummy, q = parse_url(browser.geturl()) self.failUnlessEqual(q, {'cc': ['ISOLDE'], 'p': ['tandem'], 'f': ['title'], 'ln': ['en']}) def test_switch_to_advanced_search(self): """ websearch - switch to advanced search """ browser = Browser() browser.open(make_url('/collection/ISOLDE')) browser.select_form(name='search') browser['p'] = 'tandem' browser['f'] = ['title'] browser.submit() browser.follow_link(text='Advanced Search') dummy, q = parse_url(browser.geturl()) self.failUnlessEqual(q, {'cc': ['ISOLDE'], 'p1': ['tandem'], 'f1': ['title'], 'as': ['1'], 'ln' : ['en']}) def test_no_boolean_hits(self): """ websearch - check the 'no boolean hits' proposed links """ browser = Browser() browser.open(make_url('')) browser.select_form(name='search') browser['p'] = 'quasinormal muon' browser.submit() dummy, q = parse_url(browser.geturl()) for to_drop in ('cc', 'action_search', 'f'): if to_drop in q: del q[to_drop] for bsu in ('quasinormal', 'muon'): l = browser.find_link(text=bsu) q['p'] = bsu if not same_urls_p(l.url, make_url('/search', **q)): self.fail(repr((l.url, make_url('/search', **q)))) def test_similar_authors(self): """ websearch - test similar authors box """ browser = Browser() browser.open(make_url('')) browser.select_form(name='search') browser['p'] = 'Ellis, R K' browser['f'] = ['author'] browser.submit() l = browser.find_link(text="Ellis, R S") self.failUnless(same_urls_p(l.url, make_url('/search', p="Ellis, R S", f='author', ln='en'))) class WebSearchNearestTermsTest(unittest.TestCase): """Check various alternatives of searches leading to the nearest terms box.""" def test_nearest_terms_box_in_okay_query(self): """ websearch - no nearest terms box for a successful query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text="jump to record")) def test_nearest_terms_box_in_unsuccessful_simple_query(self): """ websearch - nearest terms box for unsuccessful simple query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellisz', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=embed", expected_link_label='embed')) def test_nearest_terms_box_in_unsuccessful_simple_accented_query(self): """ websearch - nearest terms box for unsuccessful accented query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=elliszà', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=embed", expected_link_label='embed')) def test_nearest_terms_box_in_unsuccessful_structured_query(self): """ websearch - nearest terms box for unsuccessful structured query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellisz&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=fabbro&f=author", 
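                                               # (the proposed nearest term
                                               # keeps f=author; only the
                                               # pattern changes, here
                                               # ellisz -> fabbro)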
expected_link_label='fabbro')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=author%3Aellisz', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=author%3Afabbro", expected_link_label='fabbro')) def test_nearest_terms_box_in_unsuccessful_phrase_query(self): """ websearch - nearest terms box for unsuccessful phrase query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=author%3A%22Ellis%2C+Z%22', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=author%3A%22Enqvist%2C+K%22", expected_link_label='Enqvist, K')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%22ellisz%22&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=%22Enqvist%2C+K%22&f=author", expected_link_label='Enqvist, K')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%22elliszà%22&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=%22Enqvist%2C+K%22&f=author", expected_link_label='Enqvist, K')) def test_nearest_terms_box_in_unsuccessful_boolean_query(self): """ websearch - nearest terms box for unsuccessful boolean query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3Aellisz+author%3Aellisz', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aellisz", expected_link_label='energi')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3Aenergi+author%3Aenergie', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aenqvist", expected_link_label='enqvist')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=title%3Aellisz+author%3Aellisz&f=keyword', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aellisz&f=keyword", expected_link_label='energi')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=title%3Aenergi+author%3Aenergie&f=keyword', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aenqvist&f=keyword", expected_link_label='enqvist')) class WebSearchBooleanQueryTest(unittest.TestCase): """Check various boolean queries.""" def test_successful_boolean_query(self): """ websearch - successful boolean query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis+muon', expected_text="records found", expected_link_label="Detailed record")) def test_unsuccessful_boolean_query_where_all_individual_terms_match(self): """ websearch - unsuccessful boolean query where all individual terms match """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis+muon+letter', expected_text="Boolean query returned no hits. 
Please combine your search terms differently.")) class WebSearchAuthorQueryTest(unittest.TestCase): """Check various author-related queries.""" def test_propose_similar_author_names_box(self): """ websearch - propose similar author names box """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=Ellis%2C+R&f=author', expected_text="See also: similar author names", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=Ellis%2C+R+K&f=author", expected_link_label="Ellis, R K")) def test_do_not_propose_similar_author_names_box(self): """ websearch - do not propose similar author names box """ errmsgs = test_web_page_content(CFG_SITE_URL + '/search?p=author%3A%22Ellis%2C+R%22', expected_link_target=CFG_SITE_URL+"/search?ln=en&p=Ellis%2C+R+K&f=author", expected_link_label="Ellis, R K") if errmsgs[0].find("does not contain link to") > -1: pass else: self.fail("Should not propose similar author names box.") return class WebSearchSearchEnginePythonAPITest(unittest.TestCase): """Check typical search engine Python API calls on the demo data.""" def test_search_engine_python_api_for_failed_query(self): """websearch - search engine Python API for failed query""" self.assertEqual([], perform_request_search(p='aoeuidhtns')) def test_search_engine_python_api_for_successful_query(self): """websearch - search engine Python API for successful query""" self.assertEqual([8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47], perform_request_search(p='ellis')) def test_search_engine_python_api_for_existing_record(self): """websearch - search engine Python API for existing record""" self.assertEqual([8], perform_request_search(recid=8)) def test_search_engine_python_api_for_nonexisting_record(self): """websearch - search engine Python API for non-existing record""" self.assertEqual([], perform_request_search(recid=1234567809)) def test_search_engine_python_api_for_nonexisting_collection(self): """websearch - search engine Python API for non-existing collection""" self.assertEqual([], perform_request_search(c='Foo')) def test_search_engine_python_api_for_range_of_records(self): """websearch - search engine Python API for range of records""" self.assertEqual([1, 2, 3, 4, 5, 6, 7, 8, 9], perform_request_search(recid=1, recidb=10)) def test_search_engine_python_api_ranked_by_citation(self): """websearch - search engine Python API for citation ranking""" self.assertEqual([82, 83, 87, 89], perform_request_search(p='recid:81', rm='citation')) def test_search_engine_python_api_textmarc(self): """websearch - search engine Python API for Text MARC output""" # we are testing example from /help/hacking/search-engine-api import cStringIO tmp = cStringIO.StringIO() perform_request_search(req=tmp, p='higgs', of='tm', ot=['100', '700']) out = tmp.getvalue() tmp.close() self.assertEqual(out, """\ 000000085 100__ $$aGirardello, L$$uINFN$$uUniversita di Milano-Bicocca 000000085 700__ $$aPorrati, Massimo 000000085 700__ $$aZaffaroni, A 000000001 100__ $$aPhotolab """) class WebSearchSearchEngineWebAPITest(unittest.TestCase): """Check typical search engine Web API calls on the demo data.""" def test_search_engine_web_api_for_failed_query(self): """websearch - search engine Web API for failed query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=aoeuidhtns&of=id', expected_text="[]")) def test_search_engine_web_api_for_successful_query(self): """websearch - search engine Web API for successful query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis&of=id', expected_text="[8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47]")) def test_search_engine_web_api_for_existing_record(self): """websearch - search engine Web API for existing record""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?recid=8&of=id', expected_text="[8]")) def test_search_engine_web_api_for_nonexisting_record(self): """websearch - search engine Web API for non-existing record""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?recid=123456789&of=id', expected_text="[]")) def test_search_engine_web_api_for_nonexisting_collection(self): """websearch - search engine Web API for non-existing collection""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?c=Foo&of=id', expected_text="[]")) def test_search_engine_web_api_for_range_of_records(self): """websearch - search engine Web API for range of records""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?recid=1&recidb=10&of=id', expected_text="[1, 2, 3, 4, 5, 6, 7, 8, 9]")) class WebSearchRestrictedCollectionTest(unittest.TestCase): """Test of the restricted Theses collection behaviour.""" def test_restricted_collection_interface_page(self): """websearch - restricted collection interface page body""" # there should be no Latest additions box for restricted collections self.assertNotEqual([], test_web_page_content(CFG_SITE_URL + '/collection/Theses', expected_text="Latest additions")) def test_restricted_search_as_anonymous_guest(self): """websearch - restricted collection not searchable by anonymous guest""" browser = Browser() browser.open(CFG_SITE_URL + '/search?c=Theses') response = browser.response().read() if response.find("If you think you have right to access it, please authenticate yourself.") > -1: pass else: self.fail("Oops, searching restricted collection without password should have redirected to login dialog.") return def test_restricted_search_as_authorized_person(self): """websearch - restricted collection searchable by authorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/search?c=Theses') browser.select_form(nr=0) browser['p_un'] = 'jekyll' browser['p_pw'] = 'j123ekyll' browser.submit() if browser.response().read().find("records found") > -1: pass else: self.fail("Oops, Dr. Jekyll should be able to search Theses collection.") def test_restricted_search_as_unauthorized_person(self): """websearch - restricted collection not searchable by unauthorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/search?c=Theses') browser.select_form(nr=0) browser['p_un'] = 'hyde' browser['p_pw'] = 'h123yde' browser.submit() # Mr. 
Hyde should not be able to connect: if browser.response().read().find("Authorization failure") <= -1: # if we got here, things are broken: self.fail("Oops, Mr.Hyde should not be able to search Theses collection.") def test_restricted_detailed_record_page_as_anonymous_guest(self): """websearch - restricted detailed record page not accessible to guests""" browser = Browser() browser.open(CFG_SITE_URL + '/record/35') if browser.response().read().find("You can use your nickname or your email address to login.") > -1: pass else: self.fail("Oops, searching restricted collection without password should have redirected to login dialog.") return def test_restricted_detailed_record_page_as_authorized_person(self): """websearch - restricted detailed record page accessible to authorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/youraccount/login') browser.select_form(nr=0) browser['p_un'] = 'jekyll' browser['p_pw'] = 'j123ekyll' browser.submit() browser.open(CFG_SITE_URL + '/record/35') # Dr. Jekyll should be able to connect # (add the pw to the whole CFG_SITE_URL because we shall be # redirected to '/reordrestricted/'): if browser.response().read().find("A High-performance Video Browsing System") > -1: pass else: self.fail("Oops, Dr. Jekyll should be able to access restricted detailed record page.") def test_restricted_detailed_record_page_as_unauthorized_person(self): """websearch - restricted detailed record page not accessible to unauthorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/youraccount/login') browser.select_form(nr=0) browser['p_un'] = 'hyde' browser['p_pw'] = 'h123yde' browser.submit() browser.open(CFG_SITE_URL + '/record/35') # Mr. Hyde should not be able to connect: if browser.response().read().find('You are not authorized') <= -1: # if we got here, things are broken: self.fail("Oops, Mr.Hyde should not be able to access restricted detailed record page.") def test_collection_restricted_p(self): """websearch - collection_restricted_p""" self.failUnless(collection_restricted_p('Theses'), True) self.failIf(collection_restricted_p('Books & Reports')) def test_get_permitted_restricted_collections(self): """websearch - get_permitted_restricted_collections""" from invenio.webuser import get_uid_from_email, collect_user_info self.assertEqual(get_permitted_restricted_collections(collect_user_info(get_uid_from_email('jekyll@cds.cern.ch'))), ['Theses']) self.assertEqual(get_permitted_restricted_collections(collect_user_info(get_uid_from_email('hyde@cds.cern.ch'))), []) class WebSearchRestrictedPicturesTest(unittest.TestCase): """ Check whether restricted pictures on the demo site can be accessed well by people who have rights to access them. """ def test_restricted_pictures_guest(self): """websearch - restricted pictures not available to guest""" error_messages = test_web_page_content(CFG_SITE_URL + '/record/1/files/0106015_01.jpg', - expected_text=['This file is restricted', - 'You are not authorized']) + expected_text=['This file is restricted. 
If you think you have right to access it, please authenticate yourself.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_pictures_romeo(self): """websearch - restricted pictures available to Romeo""" error_messages = test_web_page_content(CFG_SITE_URL + '/record/1/files/0106015_01.jpg', username='romeo', password='r123omeo', expected_text=[], unexpected_text=['This file is restricted', 'You are not authorized']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_pictures_hyde(self): """websearch - restricted pictures not available to Mr. Hyde""" error_messages = test_web_page_content(CFG_SITE_URL + '/record/1/files/0106015_01.jpg', username='hyde', password='h123yde', expected_text=['This file is restricted', 'You are not authorized']) if error_messages: self.fail(merge_error_messages(error_messages)) class WebSearchRSSFeedServiceTest(unittest.TestCase): """Test of the RSS feed service.""" def test_rss_feed_service(self): """websearch - RSS feed service""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/rss', expected_text=' -1: self.fail("Oops, when split by collection is off, " "results overview should not be present.") if body.find('') == -1: self.fail("Oops, when split by collection is off, " "Atlantis collection should be found.") if body.find('') > -1: self.fail("Oops, when split by collection is off, " "Multimedia & Arts should not be found.") try: browser.find_link(url='#15') self.fail("Oops, when split by collection is off, " "a link to Multimedia & Arts should not be found.") except LinkNotFoundError: pass def test_results_overview_split_on(self): """websearch - results overview box when split by collection is on""" browser = Browser() browser.open(CFG_SITE_URL + '/search?p=of&sc=1') body = browser.response().read() if body.find("Results overview") == -1: self.fail("Oops, when split by collection is on, " "results overview should be present.") if body.find('') > -1: self.fail("Oops, when split by collection is on, " "Atlantis collection should not be found.") if body.find('') == -1: self.fail("Oops, when split by collection is on, " "Multimedia & Arts should be found.") try: browser.find_link(url='#15') except LinkNotFoundError: self.fail("Oops, when split by collection is on, " "a link to Multimedia & Arts should be found.") class WebSearchSortResultsTest(unittest.TestCase): """Test of the search results page's sorting capability.""" def test_sort_results_default(self): """websearch - search results sorting, default method""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1', - expected_text="[hep-th/9809057]")) + expected_text="[TESLA-FEL-99-07]")) def test_sort_results_ascending(self): """websearch - search results sorting, ascending field""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1&sf=reportnumber&so=a', expected_text="ISOLTRAP")) def test_sort_results_descending(self): """websearch - search results sorting, descending field""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1&sf=reportnumber&so=d', - expected_text=" [SCAN-9605071]")) + expected_text=" [TESLA-FEL-99-07]")) def test_sort_results_sort_pattern(self): """websearch - search results sorting, preferential sort pattern""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1&sf=reportnumber&so=d&sp=cern', expected_text="[CERN-TH-2002-069]")) class 
WebSearchSearchResultsXML(unittest.TestCase): """Test search results in various output""" def test_search_results_xm_output_split_on(self): """ websearch - check document element of search results in xm output (split by collection on)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=1&of=xm') body = browser.response().read() num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") def test_search_results_xm_output_split_off(self): """ websearch - check document element of search results in xm output (split by collection off)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=0&of=xm') body = browser.response().read() num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") def test_search_results_xd_output_split_on(self): """ websearch - check document element of search results in xd output (split by collection on)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=1&of=xd') body = browser.response().read() num_doc_element = body.count("" "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") def test_search_results_xd_output_split_off(self): """ websearch - check document element of search results in xd output (split by collection off)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=0&of=xd') body = browser.response().read() num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") class WebSearchUnicodeQueryTest(unittest.TestCase): """Test of the search results for queries containing Unicode characters.""" def test_unicode_word_query(self): """websearch - Unicode word query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%CE%99%CE%B8%CE%AC%CE%BA%CE%B7', expected_text="[76]")) def test_unicode_word_query_not_found_term(self): """websearch - Unicode word query, not found term""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3A%CE%99%CE%B8', expected_text="ιθάκη")) def test_unicode_exact_phrase_query(self): """websearch - Unicode exact phrase query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + 
'/search?of=id&p=title%3A%22%CE%99%CE%B8%CE%AC%CE%BA%CE%B7%22', expected_text="[76]")) def test_unicode_partial_phrase_query(self): """websearch - Unicode partial phrase query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%27%CE%B7%27', expected_text="[76]")) def test_unicode_regexp_query(self): """websearch - Unicode regexp query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%2F%CE%B7%2F', expected_text="[76]")) class WebSearchMARCQueryTest(unittest.TestCase): """Test of the search results for queries containing physical MARC tags.""" def test_single_marc_tag_exact_phrase_query(self): """websearch - single MARC tag, exact phrase query (100__a)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=100__a%3A%22Ellis%2C+J%22', expected_text="[9, 14, 18]")) def test_single_marc_tag_partial_phrase_query(self): """websearch - single MARC tag, partial phrase query (245__b)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=245__b%3A%27and%27', expected_text="[28]")) def test_many_marc_tags_partial_phrase_query(self): """websearch - many MARC tags, partial phrase query (245)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=245%3A%27and%27', expected_text="[1, 8, 9, 14, 15, 20, 22, 24, 28, 33, 47, 48, 49, 51, 53, 64, 69, 71, 79, 82, 83, 85, 91, 96]")) def test_single_marc_tag_regexp_query(self): """websearch - single MARC tag, regexp query""" # NOTE: regexp queries for physical MARC tags (e.g. 245:/and/) # are not treated by the search engine by purpose. But maybe # we should support them?! self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=245%3A%2Fand%2F', expected_text="[]")) class WebSearchExtSysnoQueryTest(unittest.TestCase): """Test of queries using external system numbers.""" def test_existing_sysno_html_output(self): """websearch - external sysno query, existing sysno, HTML output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CER', expected_text="The wall of the cave")) def test_existing_sysno_id_output(self): """websearch - external sysno query, existing sysno, ID output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CER&of=id', expected_text="[95]")) def test_nonexisting_sysno_html_output(self): """websearch - external sysno query, non-existing sysno, HTML output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CERRRR', expected_text="Requested record does not seem to exist.")) def test_nonexisting_sysno_id_output(self): """websearch - external sysno query, non-existing sysno, ID output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CERRRR&of=id', expected_text="[]")) class WebSearchResultsRecordGroupingTest(unittest.TestCase): """Test search results page record grouping (rg).""" def test_search_results_rg_guest(self): """websearch - search results, records in groups of, guest""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?rg=17', expected_text="1 - 17")) def test_search_results_rg_nonguest(self): """websearch - search results, records in groups of, non-guest""" # This test used to fail due to saved user preference fetching # not overridden by URL rg argument. 
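        # In other words, the rg=17 URL argument must take precedence over
        # whatever group-size preference the logged-in 'admin' user may
        # have saved, so the page should still display "1 - 17".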
self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?rg=17', username='admin', expected_text="1 - 17")) class WebSearchSpecialTermsQueryTest(unittest.TestCase): """Test of the search results for queries containing special terms.""" def test_special_terms_u1(self): """websearch - query for special terms, U(1)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=U%281%29', expected_text="[57, 79, 80, 88]")) def test_special_terms_u1_and_sl(self): """websearch - query for special terms, U(1) SL(2,Z)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=U%281%29+SL%282%2CZ%29', expected_text="[88]")) def test_special_terms_u1_and_sl_or(self): """websearch - query for special terms, U(1) OR SL(2,Z)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=U%281%29+OR+SL%282%2CZ%29', expected_text="[57, 79, 80, 88]")) def test_special_terms_u1_and_sl_or_parens(self): """websearch - query for special terms, (U(1) OR SL(2,Z))""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=%28U%281%29+OR+SL%282%2CZ%29%29', - expected_text="[57, 79, 80, 88]")) + expected_text="[57, 79, 88]")) class WebSearchJournalQueryTest(unittest.TestCase): """Test of the search results for journal pubinfo queries.""" def test_query_journal_title_only(self): """websearch - journal publication info query, title only""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&f=journal&p=Phys.+Lett.+B', expected_text="[77, 78, 85, 87]")) def test_query_journal_full_pubinfo(self): """websearch - journal publication info query, full reference""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&f=journal&p=Phys.+Lett.+B+531+%282002%29+301', expected_text="[78]")) class WebSearchStemmedIndexQueryTest(unittest.TestCase): """Test of the search results for queries using stemmed indexes.""" def test_query_stemmed_lowercase(self): """websearch - stemmed index query, lowercase""" # note that dasse/Dasse is stemmed into dass/Dass, as expected self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=dasse', expected_text="[25, 26]")) def test_query_stemmed_uppercase(self): """websearch - stemmed index query, uppercase""" # ... but note also that DASSE is stemmed into DASSE(!); so # the test would fail if the search engine would not lower the # query term. (Something that is not necessary for # non-stemmed indexes.) 
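        # Roughly: stem('dasse') == 'dass' matches the stemmed index, while
        # stem('DASSE') would remain 'DASSE' and match nothing, hence the
        # engine's lowercasing of the query term that this test exercises.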
self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=DASSE', expected_text="[25, 26]")) class WebSearchSummarizerTest(unittest.TestCase): """Test of the search results summarizer functions.""" def test_most_popular_field_values_singletag(self): """websearch - most popular field values, simple tag""" from invenio.search_engine import get_most_popular_field_values - self.assertEqual((('PREPRINT', 36), ('ARTICLE', 27), ('BOOK', 14), ('THESIS', 8), ('PICTURE', 7), ('POETRY', 2), ('REPORT', 2)), + self.assertEqual((('PREPRINT', 37), ('ARTICLE', 28), ('BOOK', 14), ('THESIS', 8), ('PICTURE', 7), ('POETRY', 2), ('REPORT', 2), ('ATLANTISTIMESNEWS', 1)), get_most_popular_field_values(range(0,100), '980__a')) def test_most_popular_field_values_singletag_multiexclusion(self): """websearch - most popular field values, simple tag, multiple exclusions""" from invenio.search_engine import get_most_popular_field_values - self.assertEqual((('PREPRINT', 36), ('ARTICLE', 27), ('BOOK', 14), ('REPORT', 2)), + self.assertEqual((('PREPRINT', 37), ('ARTICLE', 28), ('BOOK', 14), ('REPORT', 2), ('ATLANTISTIMESNEWS', 1)), get_most_popular_field_values(range(0,100), '980__a', ('THESIS', 'PICTURE', 'POETRY'))) def test_most_popular_field_values_multitag(self): """websearch - most popular field values, multiple tags""" from invenio.search_engine import get_most_popular_field_values self.assertEqual((('Ellis, J', 3), ('Enqvist, K', 1), ('Ibanez, L E', 1), ('Nanopoulos, D V', 1), ('Ross, G G', 1)), get_most_popular_field_values((9, 14, 18), ('100__a', '700__a'))) def test_most_popular_field_values_multitag_singleexclusion(self): """websearch - most popular field values, multiple tags, single exclusion""" from invenio.search_engine import get_most_popular_field_values self.assertEqual((('Enqvist, K', 1), ('Ibanez, L E', 1), ('Nanopoulos, D V', 1), ('Ross, G G', 1)), get_most_popular_field_values((9, 14, 18), ('100__a', '700__a'), ('Ellis, J'))) def test_most_popular_field_values_multitag_countrepetitive(self): """websearch - most popular field values, multiple tags, counting repetitive occurrences""" from invenio.search_engine import get_most_popular_field_values self.assertEqual((('THESIS', 2), ('REPORT', 1)), get_most_popular_field_values((41,), ('690C_a', '980__a'), count_repetitive_values=True)) self.assertEqual((('REPORT', 1), ('THESIS', 1)), get_most_popular_field_values((41,), ('690C_a', '980__a'), count_repetitive_values=False)) def test_ellis_citation_summary(self): """websearch - query ellis, citation summary output format""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis&of=hcs', expected_text="Less known papers (1-9)", expected_link_target=CFG_SITE_URL+"/search?p=ellis%20cited%3A1-%3E9&rm=citation", expected_link_label='1')) class WebSearchRecordCollectionGuessTest(unittest.TestCase): """Primary collection guessing tests.""" def test_guess_primary_collection_of_a_record(self): """websearch - guess_primary_collection_of_a_record""" self.assertEqual(guess_primary_collection_of_a_record(96), 'Articles') def test_guess_collection_of_a_record(self): """websearch - guess_collection_of_a_record""" self.assertEqual(guess_collection_of_a_record(96), 'Articles') self.assertEqual(guess_collection_of_a_record(96, '%s/collection/Theoretical Physics (TH)?ln=en' % CFG_SITE_URL), 'Articles') self.assertEqual(guess_collection_of_a_record(12, '%s/collection/Theoretical Physics (TH)?ln=en' % CFG_SITE_URL), 'Theoretical Physics (TH)') 
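        # (same check again, but with a URL-escaped referer: %20 for spaces
        # and %28/%29 for parentheses should guess the same collection)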
self.assertEqual(guess_collection_of_a_record(12, '%s/collection/Theoretical%%20Physics%%20%%28TH%%29?ln=en' % CFG_SITE_URL), 'Theoretical Physics (TH)') class WebSearchGetFieldValuesTest(unittest.TestCase): """Testing get_fieldvalues() function.""" def test_get_fieldvalues_001(self): """websearch - get_fieldvalues() for bibxxx-agnostic tags""" self.assertEqual(get_fieldvalues(10, '001___'), ['10']) def test_get_fieldvalues_980(self): """websearch - get_fieldvalues() for bibxxx-powered tags""" self.assertEqual(get_fieldvalues(18, '700__a'), ['Enqvist, K', 'Nanopoulos, D V']) self.assertEqual(get_fieldvalues(18, '909C1u'), ['CERN']) def test_get_fieldvalues_wildcard(self): """websearch - get_fieldvalues() for tag wildcards""" self.assertEqual(get_fieldvalues(18, '%'), []) self.assertEqual(get_fieldvalues(18, '7%'), []) self.assertEqual(get_fieldvalues(18, '700%'), ['Enqvist, K', 'Nanopoulos, D V']) self.assertEqual(get_fieldvalues(18, '909C0%'), ['1985', '13','TH']) def test_get_fieldvalues_recIDs(self): """websearch - get_fieldvalues() for list of recIDs""" self.assertEqual(get_fieldvalues([], '001___'), []) self.assertEqual(get_fieldvalues([], '700__a'), []) self.assertEqual(get_fieldvalues([10, 13], '001___'), ['10', '13']) self.assertEqual(get_fieldvalues([18, 13], '700__a'), ['Dawson, S', 'Ellis, R K', 'Enqvist, K', 'Nanopoulos, D V']) def test_get_fieldvalues_repetitive(self): """websearch - get_fieldvalues() for repetitive values""" self.assertEqual(get_fieldvalues([17, 18], '909C1u'), ['CERN', 'CERN']) self.assertEqual(get_fieldvalues([17, 18], '909C1u', repetitive_values=True), ['CERN', 'CERN']) self.assertEqual(get_fieldvalues([17, 18], '909C1u', repetitive_values=False), ['CERN']) class WebSearchAddToBasketTest(unittest.TestCase): """Test of the add-to-basket presence depending on user rights.""" def test_add_to_basket_guest(self): """websearch - add-to-basket facility allowed for guests""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='Add to basket')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='')) def test_add_to_basket_jekyll(self): """websearch - add-to-basket facility allowed for Dr. Jekyll""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='Add to basket', username='jekyll', password='j123ekyll')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='', username='jekyll', password='j123ekyll')) def test_add_to_basket_hyde(self): """websearch - add-to-basket facility denied to Mr. Hyde""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', unexpected_text='Add to basket', username='hyde', password='h123yde')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', unexpected_text='', username='hyde', password='h123yde')) class WebSearchAlertTeaserTest(unittest.TestCase): """Test of the alert teaser presence depending on user rights.""" def test_alert_teaser_guest(self): """websearch - alert teaser allowed for guests""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_link_label='email alert')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='RSS feed')) def test_alert_teaser_jekyll(self): """websearch - alert teaser allowed for Dr. 
Jekyll""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='email alert', username='jekyll', password='j123ekyll')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='RSS feed', username='jekyll', password='j123ekyll')) def test_alert_teaser_hyde(self): """websearch - alert teaser allowed for Mr. Hyde""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='email alert', username='hyde', password='h123yde')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='RSS feed', username='hyde', password='h123yde')) TEST_SUITE = make_test_suite(WebSearchWebPagesAvailabilityTest, WebSearchTestSearch, WebSearchTestBrowse, WebSearchTestOpenURL, WebSearchTestCollections, WebSearchTestRecord, WebSearchTestLegacyURLs, WebSearchNearestTermsTest, WebSearchBooleanQueryTest, WebSearchAuthorQueryTest, WebSearchSearchEnginePythonAPITest, WebSearchSearchEngineWebAPITest, WebSearchRestrictedCollectionTest, WebSearchRestrictedPicturesTest, WebSearchRSSFeedServiceTest, WebSearchXSSVulnerabilityTest, WebSearchResultsOverview, WebSearchSortResultsTest, WebSearchSearchResultsXML, WebSearchUnicodeQueryTest, WebSearchMARCQueryTest, WebSearchExtSysnoQueryTest, WebSearchResultsRecordGroupingTest, WebSearchSpecialTermsQueryTest, WebSearchJournalQueryTest, WebSearchStemmedIndexQueryTest, WebSearchSummarizerTest, WebSearchRecordCollectionGuessTest, WebSearchGetFieldValuesTest, WebSearchAddToBasketTest, WebSearchAlertTeaserTest) if __name__ == "__main__": run_test_suite(TEST_SUITE, warn_user=True) diff --git a/modules/websession/lib/webaccount.py b/modules/websession/lib/webaccount.py index 817561f84..726a3cc29 100644 --- a/modules/websession/lib/webaccount.py +++ b/modules/websession/lib/webaccount.py @@ -1,432 +1,433 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
__revision__ = "$Id$" import re import MySQLdb import urllib from invenio.config import \ CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS, \ CFG_CERN_SITE, \ CFG_SITE_LANG, \ CFG_SITE_SUPPORT_EMAIL, \ CFG_SITE_ADMIN_EMAIL, \ CFG_SITE_SECURE_URL, \ CFG_VERSION, \ CFG_DATABASE_HOST, \ CFG_DATABASE_NAME from invenio.access_control_engine import acc_authorize_action from invenio.access_control_config import CFG_EXTERNAL_AUTHENTICATION, SUPERADMINROLE from invenio.dbquery import run_sql from invenio.webuser import getUid,isGuestUser, get_user_preferences, \ collect_user_info from invenio.access_control_admin import acc_find_user_role_actions from invenio.messages import gettext_set_language from invenio.external_authentication import InvenioWebAccessExternalAuthError import invenio.template websession_templates = invenio.template.load('websession') def perform_info(req, ln): """Display the main features of CDS personalize""" out = "" uid = getUid(req) user_info = collect_user_info(req) return websession_templates.tmpl_account_info( ln = ln, uid = uid, guest = isGuestUser(uid), CFG_CERN_SITE = CFG_CERN_SITE, ) def perform_display_external_user_settings(settings, ln): """show external user settings which is a dictionary.""" _ = gettext_set_language(ln) html_settings = "" print_settings = False settings_keys = settings.keys() settings_keys.sort() for key in settings_keys: value = settings[key] if key.startswith("EXTERNAL_") and not "HIDDEN_" in key: print_settings = True key = key[9:].capitalize() html_settings += websession_templates.tmpl_external_setting(ln, key, value) return print_settings and websession_templates.tmpl_external_user_settings(ln, html_settings) or "" def perform_youradminactivities(user_info, ln): """Return text for the `Your Admin Activities' box. Analyze whether user UID has some admin roles, and if yes, then print suitable links for the actions he can do. If he's not admin, print a simple non-authorized message.""" your_role_actions = acc_find_user_role_actions(user_info) your_roles = [] your_admin_activities = [] guest = isGuestUser(user_info['uid']) for (role, action) in your_role_actions: if role not in your_roles: your_roles.append(role) if action not in your_admin_activities: your_admin_activities.append(action) if SUPERADMINROLE in your_roles: for action in ("runbibedit", "cfgbibformat", "cfgbibharvest", "cfgoairepository", "cfgbibrank", "cfgbibindex", "cfgwebaccess", "cfgwebcomment", "cfgwebsearch", "cfgwebsubmit", "cfgbibknowledge"): if action not in your_admin_activities: your_admin_activities.append(action) return websession_templates.tmpl_account_adminactivities( ln = ln, uid = user_info['uid'], guest = guest, roles = your_roles, activities = your_admin_activities, ) def perform_display_account(req, username, bask, aler, sear, msgs, loan, grps, sbms, appr, admn, ln): """Display a dynamic page that shows the user's account.""" # load the right message language _ = gettext_set_language(ln) uid = getUid(req) user_info = collect_user_info(req) #your account if isGuestUser(uid): user = "guest" login = "%s/youraccount/login?ln=%s" % (CFG_SITE_SECURE_URL, ln) accBody = _("You are logged in as guest. You may want to %(x_url_open)slogin%(x_url_close)s as a regular user.") %\ {'x_url_open': '', 'x_url_close': ''} accBody += "
<br /><br />
    " bask=aler=msgs= _("The %(x_fmt_open)sguest%(x_fmt_close)s users need to %(x_url_open)sregister%(x_url_close)s first") %\ {'x_fmt_open': '', 'x_fmt_close': '', 'x_url_open': '', 'x_url_close': ''} sear= _("No queries found") else: user = username accBody = websession_templates.tmpl_account_body( ln = ln, user = user, ) #Display warnings if user is superuser roles = acc_find_user_role_actions(user_info) warnings = "0" for role in roles: if "superadmin" in role: warnings = "1" break warning_list = superuser_account_warnings() #check if tickets ok tickets = (acc_authorize_action(user_info, 'runbibedit')[0] == 0) return websession_templates.tmpl_account_page( ln = ln, warnings = warnings, warning_list = warning_list, accBody = accBody, baskets = bask, alerts = aler, searches = sear, messages = msgs, loans = loan, groups = grps, submissions = sbms, approvals = appr, tickets = tickets, administrative = admn ) def superuser_account_warnings(): """Check to see whether admin accounts have default / blank password etc. Returns a list""" warning_array = [] #Try and connect to the mysql database with the default invenio password try: conn = MySQLdb.connect (host = CFG_DATABASE_HOST, user = "root", passwd = "my123p$ss", db = "mysql") conn.close() warning_array.append("warning_mysql_password_equal_to_invenio_password") except: pass #Try and connect to the invenio database with the default invenio password try: conn = MySQLdb.connect (host = CFG_DATABASE_HOST, user = "cdsinvenio", passwd = "my123p$ss", db = CFG_DATABASE_NAME) conn.close () warning_array.append("warning_invenio_password_equal_to_default") except: pass #Check if the admin password is empty res = run_sql("SELECT password, email from user where nickname = 'admin'") res1 = run_sql("SELECT email from user where nickname = 'admin' and password = AES_ENCRYPT(%s,'')", (res[0][1], )) for user in res1: warning_array.append("warning_empty_admin_password") #Check if the admin email has been changed from the default if (CFG_SITE_ADMIN_EMAIL == "cds.support@cern.ch" or CFG_SITE_SUPPORT_EMAIL == "cds.support@cern.ch") and CFG_CERN_SITE == 0: warning_array.append("warning_site_support_email_equal_to_default") #Check for a new release of CDS Invenio try: find = re.compile('CDS Invenio v[0-9]+.[0-9]+.[0-9]+ is released') webFile = urllib.urlopen("http://cdsware.cern.ch/download/RELEASE-NOTES") temp = "" version = "" version1 = "" while 1: temp = webFile.readline() match1 = find.match(temp) try: version = match1.group() break except: pass if not temp: break webFile.close() submatch = re.compile('[0-9]+.[0-9]+.[0-9]+') version1 = submatch.search(version) web_version = version1.group().split(".") local_version = CFG_VERSION.split(".") if web_version[0] > local_version[0]: warning_array.append("note_new_release_available") elif web_version[0] == local_version[0] and web_version[1] > local_version[1]: warning_array.append("note_new_release_available") elif web_version[0] == local_version[0] and web_version[1] == local_version[1] and web_version[2] > local_version[2]: warning_array.append("note_new_release_available") except: warning_array.append("error_cannot_download_release_notes") return warning_array def template_account(title, body, ln): """It is a template for print each of the options from the user's account.""" return websession_templates.tmpl_account_template( ln = ln, title = title, body = body ) def warning_guest_user(type, ln=CFG_SITE_LANG): """It returns an alert message,showing that the user is a guest user and should log into the system.""" 
# load the right message language _ = gettext_set_language(ln) return websession_templates.tmpl_warning_guest_user( ln = ln, type = type, ) def perform_delete(ln): """Delete the account of the user, not implement yet.""" # TODO return websession_templates.tmpl_account_delete(ln = ln) def perform_set(email, ln, can_config_bibcatalog = False, verbose = 0): """Perform_set(email,password): edit your account parameters, email and password. If can_config_bibcatalog is True, show the bibcatalog dialog (if configured). """ try: res = run_sql("SELECT id, nickname FROM user WHERE email=%s", (email,)) uid = res[0][0] nickname = res[0][1] except: uid = 0 nickname = "" CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS prefs = get_user_preferences(uid) if CFG_EXTERNAL_AUTHENTICATION.has_key(prefs['login_method']) and CFG_EXTERNAL_AUTHENTICATION[prefs['login_method']][0]: CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL = 3 out = websession_templates.tmpl_user_preferences( ln = ln, email = email, email_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL >= 2), password_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL >= 3), nickname = nickname, ) if len(CFG_EXTERNAL_AUTHENTICATION) > 1: try: uid = run_sql("SELECT id FROM user where email=%s", (email,)) uid = uid[0][0] except: uid = 0 current_login_method = prefs['login_method'] methods = CFG_EXTERNAL_AUTHENTICATION.keys() # Filtering out methods that don't provide user_exists to check if # a user exists in the external auth method before letting him/her # to switch. for method in methods: if CFG_EXTERNAL_AUTHENTICATION[method][0]: try: if not CFG_EXTERNAL_AUTHENTICATION[method][0].user_exists(email): methods.remove(method) except (AttributeError, InvenioWebAccessExternalAuthError): methods.remove(method) methods.sort() if len(methods) > 1: out += websession_templates.tmpl_user_external_auth( ln = ln, methods = methods, current = current_login_method, method_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 4) ) current_group_records = prefs.get('websearch_group_records', 10) show_latestbox = prefs.get('websearch_latestbox', True) show_helpbox = prefs.get('websearch_helpbox', True) out += websession_templates.tmpl_user_websearch_edit( ln = ln, current = current_group_records, show_latestbox = show_latestbox, show_helpbox = show_helpbox, ) preferred_lang = prefs.get('language', ln) out += websession_templates.tmpl_user_lang_edit( ln = ln, preferred_lang = preferred_lang ) #show this dialog only if the system has been configured to use a ticket system from invenio.config import CFG_BIBCATALOG_SYSTEM if CFG_BIBCATALOG_SYSTEM and can_config_bibcatalog: bibcatalog_username = prefs.get('bibcatalog_username', "") bibcatalog_password = prefs.get('bibcatalog_password', "") out += websession_templates.tmpl_user_bibcatalog_auth(bibcatalog_username, \ bibcatalog_password, ln=ln) if verbose >= 9: for key, value in prefs.items(): out += "%s:%s
    " % (key, value) out += perform_display_external_user_settings(prefs, ln) return out def create_register_page_box(referer='', ln=CFG_SITE_LANG): """Register a new account.""" return websession_templates.tmpl_register_page( referer = referer, ln = ln, level = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS, ) ## create_login_page_box(): ask for the user's email and password, for login into the system def create_login_page_box(referer='', apache_msg="", ln=CFG_SITE_LANG): # List of referer regexep and message to print _ = gettext_set_language(ln) login_referrer2msg = ( (re.compile(r"/search"), "

    " + _("This collection is restricted. If you think you have right to access it, please authenticate yourself.") + "

    "), + (re.compile(r"/record/\d+/files/.+"), "

    " + _("This file is restricted. If you think you have right to access it, please authenticate yourself.") + "

    "), ) msg = "" for regexp, txt in login_referrer2msg: if regexp.search(referer): msg = txt break # FIXME: Temporary Hack to help CDS current migration if CFG_CERN_SITE and apache_msg: return msg + apache_msg if apache_msg: msg += apache_msg + "

    2) Otherwise please authenticate yourself" \ " in the following form:

    " internal = None for system in CFG_EXTERNAL_AUTHENTICATION.keys(): if not CFG_EXTERNAL_AUTHENTICATION[system][0]: internal = system break register_available = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS <= 1 and internal methods = CFG_EXTERNAL_AUTHENTICATION.keys() methods.sort() selected = '' for method in methods: if CFG_EXTERNAL_AUTHENTICATION[method][1]: selected = method break return websession_templates.tmpl_login_form( ln = ln, referer = referer, internal = internal, register_available = register_available, methods = methods, selected_method = selected, msg = msg, ) # perform_logout: display the message of not longer authorized, def perform_logout(req, ln): return websession_templates.tmpl_account_logout(ln = ln) #def perform_lost: ask the user for his email, in order to send him the lost password def perform_lost(ln): return websession_templates.tmpl_lost_password_form(ln) #def perform_reset_password: ask the user for a new password to reset the lost one def perform_reset_password(ln, email, reset_key, msg=''): return websession_templates.tmpl_reset_password_form(ln, email, reset_key, msg) # perform_emailSent(email): confirm that the password has been emailed to 'email' address def perform_emailSent(email, ln): return websession_templates.tmpl_account_emailSent(ln = ln, email = email) # peform_emailMessage : display a error message when the email introduced is not correct, and sugest to try again def perform_emailMessage(eMsg, ln): return websession_templates.tmpl_account_emailMessage( ln = ln, msg = eMsg ) # perform_back(): template for return to a previous page, used for login,register and setting def perform_back(mess, url, linkname, ln='en'): return websession_templates.tmpl_back_form( ln = ln, message = mess, url = url, link = linkname, ) diff --git a/modules/websubmit/doc/admin/websubmit-admin-guide.webdoc b/modules/websubmit/doc/admin/websubmit-admin-guide.webdoc index 7f042937c..ed1951a85 100644 --- a/modules/websubmit/doc/admin/websubmit-admin-guide.webdoc +++ b/modules/websubmit/doc/admin/websubmit-admin-guide.webdoc @@ -1,2216 +1,2217 @@ ## -*- mode: html; coding: utf-8; -*- ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

    WARNING: OLD WEBSUBMIT ADMIN GUIDE FOLLOWS
    This WebSubmit Admin Guide was written for the previous PHP-based version of the admin tool. The submission concepts and pipeline description remain valid, but the interface snapshot examples would now differ. The guide is to be updated soon.

    Table of Contents

     

     

    General Overview of the Manager Tool

    Things to know before using the Manager:

   This manager tool allows you to administer the whole WebSubmit interface. With it, you will be able to create new actions and new types of documents, and to edit the existing ones.

       The main objects in webSubmit are the "action" (such as "Submit New Record", "Submit New File", "Modify Record"...) and the "type of document" (such as "preprint", "photo"...).

   Several actions can be attached to one given type of document. An action is the combination of two processes:
  • The first one is the data gathering. The manager will allow you to create new web forms corresponding to the fields the user will have to fill in when using webSubmit.
  • The second one is the data treatment: basically, what the program will do with the data gathered during the first phase. The treatment appears in this tool as a sequence of functions. This manager will allow you to add functions to an action, edit the existing functions, and reorder them.

    See also:

  • using the manager through an example
  • interface description
  • actions
  • document types

     

    Using the manager through an example

    what is this?

  This page presents the typical situations a user may meet when using WebSubmit and, for each situation, explains how to use the manager to configure it.

    The user reaches WebSubmit main page.

[Screenshot: Main Page]  To add a document type to WebSubmit, you should go to the main page and click on "New Doctype" in the left blue panel.

     Even once created, a document type will not appear automatically on this page. To configure the list of catalogues and document types displayed on this page, the administrator shall go to the edit catalogues page. (see the guide section)

    The user can then click on the document type he is interested in.

[Screenshot: Document Type Page]  The text appearing under the header containing the name of the document can be configured by going to the main page, clicking on the title of the document type, then on the "Edit Document Type Details" button.

 You can associate several categories with a document type; they can be defined by going to the main page, clicking on the title of the document type, then on the "View Categories" button. The selected category will be saved in a file named "comboXXX" (where XXX is the short name of the document type) in the submission directory.

 To add an action button to this page, first implement this action by going to the main page, clicking on the title of the document type, then on the "Add a new submission" button. If the action is already implemented and the button still does not appear on the submission page, then you should edit the details of this implementation: go to the main page, click on the title of the document type, then on the icon in the "Edit Submission" column and in the line of the desired action. There you should set the "Displayed" form field to "YES".

     You can also change the order of the buttons, by going to the main page, click on the title of the document type then on the icon in the "Edit Submission" column and in the line of the desired action. There you can set the "buttonorder" form field.

    The user now may choose a category, then click on the action button he wishes.
    The submission starts, the first page of the web form appears.

[Screenshot: Document Type Page]  This web form is composed of several pages; on each of these pages form fields can be found. To modify the number of pages, add or withdraw form fields, and modify the texts before each form field, you shall go to the main page, click on the title of the document type, then on the icon in the "Edit Submission Pages" column and in the line of the desired action. (see the guide section)

On the last page of the submission, there should be a button like the one in the following image, which will trigger the end script:

[Screenshot: Document Type End Page]  This button is defined like any other form field. Its definition should include an onclick="finish();" javascript attribute.

     After clicking this button, WebSubmit will apply the end script functions to the gathered data. To modify the end script, you shall go to the main page, click on the title of the document type then on the icon in the "Edit Functions" column and in the line of the desired action. (see the guide section)

    See also:

    interface description
    actions
    document types

     

     

    Philosophy behind the document submission system

    This page will explain some philosophical issues behind the document submission system.

    On the relation between a search collection and a submission doctype:

   The relation between a search collection and a submission document type may cause some confusion for CDS Invenio administrators. This comes from the fact that there is no direct one-to-one mapping between them, as is usual elsewhere. The relation is more flexible than that.

       A search collection in CDS Invenio is defined through a search query. For example, "all records where field F contains the value V belong to collection C". Several assertions can be deduced from this definition:
       1/ A single record can appear in several collections.
       2/ There is no limitation to the number of collections in which a record can appear.
       3/ Any query can be used to build a collection. The query can also be a complex one using logical operators, hence can rely on the value of several fields.

       (In addition, a search collection can be defined via a set of its subcollections in the hierarchy tree. Refer to the WebSearch Admin Guide for that matter.)

       The submission system basically creates an XML MARC record and stores it in the database. To which collection this new record belongs depends exclusively on the content of the XML MARC record. This XML MARC record is created by the Make_Record function. So the secret of the matching of a submitted record to a particular collection lies in the configuration of this function. Some examples will clarify this point:

       Example 1: Let's consider a "Preprints" collection which is defined by this query: "980__a:PREPRINT". We want to create a submission document type from which all records will go to this "Preprints" collection. For this, the Make_Record function should be configured so that a 980__a field containing "PREPRINT" will always be created.
       Example 2: Let's still consider the same "Preprints" collection, and an additional "Theses" collection based on a slightly different query "980__a:THESIS". We want to create a single submission type from which the records will go in the "Preprints" or "Theses" collections depending on a field chosen by the submitter. In this case, the Make_Record function should be configured so that a 980__a field will contain either "PREPRINT" or "THESIS" depending on the value entered by the submitter.
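   For illustration, a record matching Example 1 would carry a MARCXML field like the following snippet (indicative only; the exact record depends on how Make_Record and your BibConvert templates are configured):

       <datafield tag="980" ind1=" " ind2=" ">
         <subfield code="a">PREPRINT</subfield>
       </datafield>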

       The apparent disconnection between a submission document type and a search collection allows a great flexibility, allowing administrators to create 1 to 1, 1 to n, n to 1 or even 1 to 0 (not very useful!) relations.

     

     

    Interface Description

    Welcome to webSubmit Management tool:

   On the websubmit admin main page you will find:



  • The list of all existing document types in the middle of the page. Click on one line in the list to have access to the main document modification panel
      • The right menu panel with the following links inside:
        • "webSubmit Admin": This links leads you back to the main page of the manager.
        • "New Doctype": Click here if you wish to create a new document type.
        • "Remove Doctype": Click here if you want to remove an existing document type.
        • "Available Actions": Lists all existing actions
        • "Available Javascript Checks": Lists all existing Javascript checking functions.
        • "Available Element Description": Lists all existing html form element descriptions.
        • "Available Functions": Lists all existing functions in CDS Submit.
        • "Organise Main Page": Allows you to manage the appearance and order of the list of document types on CDS Submit User main page.

    See also:

    interface description
    actions
    document types

     

     

    Document Types

   WebSubmit can propose several actions on different document types. Each of these document types may or may not implement all possible actions. The main difference between document types is the metadata which defines each of them, and possibly also the kind of fulltext files attached to a record.

       A document type can be one of "Thesis", "Photos", "Videotapes"... or whatever type of document you may invent. A document type is always defined by its metadata. It may or may not have a fulltext file attached to it.

       This tool leaves you free to create the web forms adapted to whatever type of document you want to create (see "Create and Maintain the Web Form") as well as free to determine what treatment you wish to apply to the collected data (see "Create and Maintain the Data Treatment").

    See also:

    add a new type of document
    remove a type of document
    modify a type of document
    implement an action over a type of document

     

     

Adding a New Document Type

    How to get there?

     Click on the "New Doctype" link in the webSubmit right menu.

    How to do this?

     A new document type is defined by 6 fields:
    • Creation Date and Modification Dates are generated and modified automatically.
• Document Type ID: This is the acronym for your new document type. We usually use a 3-letter acronym.
    • Document Type Name: This is the full name of your new document. This is the text which will appear on the list of available documents and catalogues on webSubmit main page.
    • Document Type Description: This is the text which will appear on the document type submission page. This can be pure text or html.
    • Doctype to clone: Here you can choose to create your document type as a clone of another existing document type. If so, the new document type will implement all actions implemented by the chosen one. The web forms will be the same, and the functions also, as well as the values of the parameters for these functions. Of course once cloned, you will be able to modify the implemented actions.

    See also:

  • remove a type of document
  • modify a type of document
  • implement an action over a type of document

     

    Removing a Document Type

    How to get there?

     Click on the "Remove Doctype" link in the webSubmit admin right menu

    How to do this?

 Select the document type to delete, then click on the "Remove Doctype" button. Remember that by doing this you will delete the document type as well as all the implementations of actions for this document type!

    See also:

  • create a type of document
  • modify a type of document
  • implement an action over a type of document

     

    Modifying a Document Type

    What is it?

 Modifying a document type in webSubmit - this will modify its general data description, not the implementations of the actions on this document type. For the latter, please see implement an action over a type of document.

    How to get there?

 From the main page of the manager, click on the title of the document type you want to modify, then click on the "Edit Document Type Details" button.

    How to do this?

     Once here, you can modify 2 fields:
  • Document Type Name: This is the full name of your new document. This is the text which will appear on the list of available documents and catalogues on webSubmit main page.
  • Document Type Description: This is the text which will appear on the right of the screen when the user moves the mouse over the document type title and on the document type submission page. This can be pure text or html.
    See also:

  • remove a type of document
  • create a type of document
  • implement an action over a type of document

     

    Actions

 In webSubmit you can create several actions (for example "Submit New Record", "Submit a New File", "Send to a Distribution List", or in fact any action you can imagine performing on a document stored in your database). The creation of an action is very simple and consists of filling in a name and a description, and associating a directory with this action. The directory parameter indicates where the collected data will be stored when the action is carried out.

     Once an action is created, you have to implement it over a document type. Implementing an action means defining the web form which will be displayed to a user, and defining the treatment (set of functions) applied to the data which have been gathered. The implementation of the same action over two document types can be very different. The fields in the web form can be different as well as the functions applied at the end of this action.

    See also:

  • create a new action
  • remove an action
  • modify an action
  • implement an action over a type of document

     

    Adding a New Action

    How to get there?

     Click on the "Available Actions" link in the websubmit right menu, then on the "Add an Action" button.

    How to do this?

     A new action is defined by 6 fields:

    • Creation Date and Modification Dates are generated and modified automatically.
• Action Code: This is the acronym for your new action. We usually use a 3-letter acronym.
    • Action Description: This is a short description of the new action.
• dir: This is the name of the directory in which the submission data will be stored temporarily. If the dir value is "running" as for the "Submit New Record" action (SBI), then the submission data for a Text Document (document acronym "TEXT") will be stored in the /opt/cds-invenio/var/data/submit/storage/running/TEXT/9089760_90540 directory (where 9089760_90540 is what we call the submission number: a string automatically generated at the beginning of each submission). Once finished, the submission data will be moved to the /opt/cds-invenio/var/data/submit/storage/done/running/TEXT/ directory by the "Move_to_Done" function. (See the path sketch after this list.)
• statustext: text displayed in the status bar of the browser when the user moves his mouse over the action button.
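    A minimal sketch of how such a temporary storage path is composed, assuming the default /opt/cds-invenio layout (submission_dir is an illustrative helper, not a real WebSubmit function):

        import os, time, random

        CFG_STORAGEDIR = "/opt/cds-invenio/var/data/submit/storage"

        def submission_dir(action_dir, doctype):
            # e.g. action_dir = "running", doctype = "TEXT"; the submission
            # number is a generated string such as "9089760_90540"
            access = "%i_%i" % (time.time(), random.randint(0, 99999))
            return os.path.join(CFG_STORAGEDIR, action_dir, doctype, access)

        # submission_dir("running", "TEXT") ->
        # "/opt/cds-invenio/var/data/submit/storage/running/TEXT/9089760_90540"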

    See also:

  • remove an action
  • modify an action
  • implement an action over a type of document

     

    Removing an Action

    What is it?

     Removing the implementation of an action over a document type - Please note the removal of the action itself is not allowed with this tool.

    How to get there?

     From the websubmit admin main page, click on the title of the relevant document type. Then click on the red cross corresponding to the line of the action you want to remove.

    See also:

  • create an action
  • modify an action
  • implement an action over a type of document

     

    Modifying an Action

    What is it?

     This page is about how to modify the general data about an action - for modifying the implementation of an action over a document type, see implement an action over a type of document

    How to get there?

     Click on the "View Actions" link in the right menu of the websubmit admin, then on the title of the action you want to modify...

    How to do this?

     You may modify 3 fields:
    • Action Description: This is a short description of the action.
    • dir: This is the name of the directory in which the submission data will be stored temporarily. See the meaning of this parameter in create an action.
    • statustext: text displayed in the status bar of the browser when the user moves his mouse over the action button.

    See also:

  • remove an action
  • create an action
  • implement an action over a type of document

     

    Implement an action over a document type

    What is it?

     Implement an action over a document type. Create the web forms and the treatment process.

    How to get there?

     From the main page of the manager, click on the title of the relevant document type.
    Then click on the "Add a New Submission" button.

    How to do this?

 Just select the name of the action you want to implement. When you select an action, the list of document types which already implement this action appears. Then you can select from this list the document type from which you want to clone the implementation, or just choose "No Clone" if you want to build this implementation from scratch.

     After selecting the correct fields, click on the "Add Submission" button.

 You then go back to the document type manager page, where you can see that your newly implemented action appears in the bottom table (check the acronym in the first column).



    • Clicking on the action acronym will allow you to modify the general data about the action (remember in this case that all the other implementations of this particular action will also be changed).
    • The second column indicates whether the button representing this action will appear on the submission page.
    • The third column shows you the number of pages composing the web form for this implementation. (see create and maintain the web form).
    • The 4th and 5th columns indicate the creation and last modification dates for this implementation.
    • In the 6th column, you can find the order in which the button will be displayed on the submission page of this document type.
    • The following 4 columns (level, score, stpage, endtxt) deal with the insertion of this action in an action set.


      An action set is a succession of actions which should be done in a given order once a user starts the first of them.
      For example the submission of a document is usually composed of two actions: Submission of Bibliographic Information (SBI) and Fulltext Transfer (FTT) which should be done one after the other.
      When the user starts the submission, we want CDS Submit to send him first to SBI and, when he finishes SBI, to carry him to FTT.
      SBI and FTT are in this case in the same action set.
      They will both have a level of 1 ("level" is a bad name, it should be "action set number"), SBI will have a score of 1, and FTT a score of 2 (which means it will be started after SBI). If you set the stpage of FTT to 2, the user will be directly carried to the 2nd page of the FTT web form. This value is usually set to 1.
      The endtxt field contains the text which will be displayed to the user at the end of the first action (here it could be "you now have to transfer your files").

      A single action like "Modify Bibliographic Information" should have these 3 columns set to 0, 0 and 1, as in the sketch below.
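      A compact way to picture this example (illustrative values only, not a real configuration):

          # (action, level = action set number, score = order, stpage = start page)
          action_set = [
              ('SBI', 1, 1, 1),  # started first
              ('FTT', 1, 2, 1),  # started automatically after SBI ends
          ]
          standalone = ('MBI', 0, 0, 1)  # a single action outside any set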
       


    • Click on the icon in the 12th column ("Edit Submission Pages") to create or edit the web form.
    • Click on the icon in the 13th column ("Edit Functions") to create or edit the function list.
    • The "Edit Submission" column allows you to modify the data (level, status text...) for this implementation.
    • Finally the last column allows you to delete this implementation.
       

 If you chose to clone the implementation from an existing one, the web form as well as the functions list will already be defined. Otherwise you will have to create them from scratch.

    See also:

  • create and maintain the web form
  • create and maintain the data treatment

     

    Create and maintain the web form

    What is it?

     Create and define the web form used during an action.

    How to get there?

     From the main page of the manager, click on the title of the relevant document type. Then click on the icon in the "Edit Submission Pages" column of the relevant line.

    List of the form pages

 A web form can be split over several pages. This is a matter of convenience for the user: he will have an overview of all the form fields present on the page without having to scroll it. Moreover, each time the user goes from one page to the next, all entered data are saved. If he wants to stop and come back later (or if the browser crashes!), he will be able to resume the submission at the exact point he left it.

     Once here:



    you can see the ordered list of already existing pages in the web form. In this example there are 4 pages. You can then:
    • Move one page from one place to another, using the small blue arrows under each page number.
    • Suppress one page by clicking on the relevant red cross.
    • Add a page, by clicking the "ADD A PAGE" button!
    • Edit the content of one page by clicking on the page number.
    • Go back to the document main page.

    Edit one form page

 Click on a page number; you then arrive at a place where you can edit this form page.

     A form page is composed of a list of form elements. Each of these form elements is roughly made of an html template and a text displayed before the form field.

     In the first part of the page, you have a preview of what the form will look like to the user:


     Then the second table shows you the list of the form elements present on the page:


     You can then:
    • Move one element from one place to another using the drop-down menus in the first column ("Item No") of the table, or the little blue arrows in the second column.
    • Edit the html template of one form element by clicking on the name of the template in the 3rd column ("Name").
    • Edit one of the form elements by clicking on the icon in the 10th column.
    • Delete one form element by clicking on the relevant red cross.
    • Add an element to the page by clicking the "ADD ELEMENT TO PAGE" button.

    Edit the html template of one form element

     In the html template edition page, you can modify the following values:
    • Element type: indicates which html form element to create
    • Aleph code: Aleph users only! - This indicates in which field of the Aleph document database to retrieve the original value when modifying this information (function Create_Modify_Interface of action MBI).
    • Marc Code: MySQL users only! - This indicates in which field of the MySQL document database to retrieve the original value when modifying this information (function Create_Modify_Interface of action MBI).
    • Cookies: indicates whether WebSubmit will set a cookie on the value filled in by the user. If yes, next time the user will come to this submission, the value he has entered last time will be filled in automatically. Note: This feature has been REMOVED.
    • other fields: The other fields help define the html form element.
    Important warning! Please remember this is a template! This means it can be used in many different web forms/implementations. When you modify this template, the modification will take place in each of the implementations in which this template is used.

    Edit one form element

     In the form element edition page, you may modify the following values:
    • element label: This is the text displayed before the actual form field.
    • level: can be one of "mandatory" or "optional". If mandatory, the user won't be able to leave this page before filling this field in.
    • short desc: This is the text displayed in the summary window when it is opened.
    • Check: Select here the javascript checking function to be applied to the submitted value of this field
    • Modify Text: This text will be displayed before the form field when modifying the value (action "Modify Record", function "Create_Modify_Interface")

    Add one form element

     Click on the "ADD ELEMENT TO PAGE" button. There you will have to decide which html template field to use ("Element Description code"), and also the field mentioned above.

    Create a new html template

     You have access to the list of all existing html templates by clicking on the "View element descriptions" link in the websubmit admin right menu.
    By clicking on one of them, you will have access to its description.
    If no template corresponds to the one you seek, click on the "ADD NEW ELEMENT DESCRIPTION" button to create one.
 The fields you have to enter in the creation form are the ones described in the Edit the html template of one form element section.
    You also have to choose a name for this new element.
IMPORTANT! The name you choose for your html element is also the name of the file in which webSubmit will save the value entered in this field. This is also the one you will use in your BibConvert configuration. BibConvert is the program which converts the data gathered in webSubmit into a formatted XML file for insertion into the documents database.
     Tips:
  • Elements of type "select box" which are used as a mandatory field in a form must start with "<option>Select:</option>"
    Create and edit a checking function

     Click on the "View Checks" link in the websubmit admin right menu. You then have access to a list of all the defined javascript functions.
    You can then click on the name of the function you want to modify, or click on the "ADD NEW CHECK" button to create a new javascript function.
These functions are inserted in the web page when the user is doing his submission. When he clicks on "next page", this function will be called with the value entered by the user as a parameter. If the function returns false, the page does not change and an error message should be output. If the function returns true, everything is correct, so the page can be changed.

    See also:

  • create and maintain the data treatment

     

    Setup the Data Treatment

    What is it?

     At the end of a submission, we have to tell webSubmit what to do with the data it has gathered. This is expressed through one or several lists of functions (we call this the "end script").

    How to get there?

     From the main page of the manager, click on the title of the relevant document type.
    Then click on the icon in the "Edit Functions" column of the relevant line.

    List of functions

     Here is what you may see then (this is the end script list of functions for a document type named "TEST" and action "FTT" - Fulltext Transfer):



 You can see the ordered list of all the functions in the end script. This end script is composed of 2 steps (see the "step" column). The functions composing the first step are called, then there should be an action from the user which triggers step 2 - in the present case the Upload_Files function (last of step 1) allows the user to upload additional files by creating a web form; when the user finishes, he presses another button created by the function, which ends the process. Functions of step 2 are then called.

 Why implement multiple steps? The reason can vary with the task you want to accomplish. For example, in the case above (Fulltext Transfer), we use the first step to allow the upload of multiple additional files (a dynamic action) which could not be done in the static web form. In the case of the "Modify Bibliographic Information" action, the first step is used to display the fields the user wants to modify, prefilled with the existing values. The reason is once again that the task we want to realise is dynamic.

     The "score" column is used to order the functions. The function which has the smallest score will be called first, and the largest score will be called last.

     You can then:
    • View and edit the parameters of each function by clicking on the name of the function.
    • Move one function up and down, by using the small blue arrows.
    • Suppress one function by clicking on the relevant red cross.
    • Add a function to the list by clicking the "ADD FUNCTION" button.
    • Go back to the document main page ("FINISHED" button).
 Please note: To move one function from one step to another, you have to delete it, then add it again in the proper step.

    See also:

  • all about functions

     

    Functions

    Description:

     In webSubmit, each action process is divided into two phases: the gathering of data (through a web form) and the treatment of the data.

     The treatment is organised in a succession of functions, each of which has its own input and output.

     The functions themselves are stored in separate files (one per function) in the /opt/cds-invenio/lib/python/invenio/websubmit_functions directory. A file containing a function MUST be named after the function name itself. For example, a function called "Move_to_Done" MUST be stored in a file called Move_to_Done.py. The case is important here.
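 This naming convention is what allows a function to be looked up by name; a hedged sketch of such a lookup (the actual WebSubmit loader may differ):

    def load_websubmit_function(name):
        # import invenio.websubmit_functions.<name> and fetch the
        # same-named callable ("Move_to_Done" lives in Move_to_Done.py)
        module = __import__("invenio.websubmit_functions." + name,
                            globals(), locals(), [name])
        return getattr(module, name)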

 For a description of what should be inside the file, have a look at the "create a new function" page of this guide.

     To each function you can associate one or several parameters, which may have different values according to the document type the function is used for. One parameter may be used for different functions. For example one standard parameter used in several functions is called "edsrn". It contains the name of the file in which the reference of the document is stored.

    See also:

  • create a new function
  • delete a function
  • edit a function

     

    Creating a New Function

    How to get there?

     Click on the "Available Functions" link in the websubmit admin right menu. Then click on the "Add New Function" button.

    How to do this?

     Enter the name of the new function as well as a text description if you wish.
     You will then reach a page where you can add parameters to your new function.

     Don't forget to add the function file inside the /opt/cds-invenio/lib/python/invenio/websubmit_functions directory and to name the file after the function. Functions must be written in Python. Here is an example implementation of a function:

    /opt/cds-invenio/lib/python/invenio/websubmit_functions/Get_Report_Number.py:

    import os
    import re

    def Get_Report_Number(parameters, curdir, form):
        global rn
        # path of file containing report number
        if os.path.exists("%s/%s" % (curdir, parameters['edsrn'])):
            fp = open("%s/%s" % (curdir, parameters['edsrn']), "r")
            rn = fp.read()
            rn = rn.replace("/", "_")
            rn = re.sub("[\n\r ]+", "", rn)
        else:
            rn = ""
        return ""

    The function parameters are passed to the function through the parameters dictionary.
    The curdir parameter contains the current submission directory path.
    The form parameter contains the form passed to the current web page for possible reference from inside the function.
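 For instance, a hypothetical invocation at submission time could look like this (all values are examples only):

    parameters = {'edsrn': 'DEMOTEXT_RN'}  # per-doctype parameter values
    curdir = "/opt/cds-invenio/var/data/submit/storage/running/TEXT/9089760_90540"
    Get_Report_Number(parameters, curdir, form=None)  # fills in the global "rn"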

    See also:

  • edit a function
  • delete a function

     

    Removing a Function

    Note

 There is currently no way of deleting a function through this interface. Use the direct MySQL command line interface for this.

    See also:

  • edit a function
  • create a function

     

    Editing a Function

    What is it?

     Edit a function, add parameters to it...

    How to get there?

     Click on the "Available Functions" link in the websubmit admin right menu.

    How to do this?

 On this page appears a list of all functions defined in the system. Two columns give you access to some features:
    • View function usage Click here to have access to the list of all document types and all actions in which this function is used. Then by clicking on one of the items, you will be given a chance to modify the parameter values for the given document type.
    • View/Edit function details There you will be able to modify the function description, as well as add/withdraw parameters for this function.

    See also:

  • create a new function
  • delete a function

     

    All functions explained

    Description:

     This page lists and explains all the functions used in the demo provided with the CDS Invenio package. This list is not exhaustive since you can add any new function you need.
     Click on one function name to get its description.
     Please note in this page when we refer to [param] this means the value of the parameter 'param' for a given document type.

    CaseEDS
    Create_Modify_Interface
    Create_Recid
    Finish_Submission
    Get_Info
    Get_Recid
    Get_Report_Number
    Get_Sysno
    Get_TFU_Files
    Insert_Modify_Record
    Insert_Record
    Is_Original_Submitter
    Is_Referee
    Mail_Submitter
    Make_Modify_Record
    Make_Record
    Move_From_Pending
    Move_to_Done
    Move_to_Pending
    Print_Success
    Print_Success_APP
    Print_Success_MBI
    Print_Success_SRV
    Report_Number_Generation
    Send_Approval_Request
    Send_APP_Mail
    Send_Modify_Mail
    Send_SRV_Mail
    Test_Status
    Update_Approval_DB
    Upload_Files


    CaseEDS
    description
This function may be used if the treatment to be done after a submission depends on a field entered by the user. Typically this is used in an approval interface: if the referee approves, we do one thing; if he rejects, we do another.
    More specifically, the function gets the value from the file named [casevariable] and compares it with the values stored in [casevalues]. If a value matches, the function directly goes to the corresponding step stored in [casesteps]. If no value is matched, it goes to step [casedefault].
    parameters
casevariable This parameter contains the name of the file from which the function will read the chosen value.
Eg: "decision"
casevalues Contains the list of recognized values to match with the chosen value. Should be a comma-separated list of words.
Eg: "approve,reject"
casesteps Contains the list of steps corresponding to the values matched in [casevalues]. It should be a comma-separated list of numbers.
Eg: "2,3"
In this example, if the value stored in the file named "decision" is "approve", then the function launches step 2 of this action. If it is "reject", then step 3 is launched.
casedefault Contains the step number to go to by default if no match is found.
Eg: "4"
In this example, if the value stored in the file named "decision" is neither "approve" nor "reject", then step 4 is launched.
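A minimal sketch of this dispatch logic under the parameter names above (simplified; the real function also handles errors and the submission context):

    def case_eds_step(curdir, casevariable, casevalues, casesteps, casedefault):
        chosen = open("%s/%s" % (curdir, casevariable)).read().strip()
        values = casevalues.split(",")  # e.g. "approve,reject"
        steps = casesteps.split(",")    # e.g. "2,3"
        if chosen in values:
            return int(steps[values.index(chosen)])
        return int(casedefault)         # e.g. 4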


    Create_Modify_Interface
    description
To be used in the MBI-Modify Record action. It displays a web form allowing the user to modify the fields he chose. The fields are prefilled with the existing values extracted from the documents database. This function takes the values stored in the [fieldnameMBI] file. This file contains a list of field names separated with "+" (it is usually generated from a multiple select form field). Then the function retrieves the corresponding tag name (MARC-21) stored in the element definition. Finally it displays the web form and fills it with the existing values found in the documents database.
    parameters
fieldnameMBI Contains the name of the file in which the function will find the list of fields the user wants to modify. Depends on the web form configuration. (See the sketch below.)
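A sketch of how that "+"-separated list could be read (illustrative only; curdir and parameters are the arguments passed to every WebSubmit function):

    fp = open("%s/%s" % (curdir, parameters['fieldnameMBI']))
    fields = fp.read().strip().split("+")  # e.g. "title+author" -> ['title', 'author']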


    Create_Recid
    description
    This function retrieves a new record id from the records database. This record id will then be used to create the XML record afterwards, or to link with the fulltext files. The created id is stored in a file named "SN".
    parameters
    none


    Finish_Submission
    description
    This function stops the data treatment process even if further steps exist. This is used for example in the approval action. In the first step, the program determines whether the user approved or rejected the document (see CaseEDS function description). Then depending on the result, it executes step 2 or step 3. If it executes step 2, then it should continue with step 3 if nothing stopped it. The Finish_Submission function plays this role.
    parameters
    none


    Get_Info
    description
This function tries to retrieve, from the "pending" directory or directly from the documents database, some information about the document: title, original submitter's email and author(s).
    If found, this information is stored in 3 global variables: $emailvalue, $titlevalue, $authorvalue to be used in other functions.
    If not found, an error message is displayed.
    parameters
    authorFile Name of the file in which the author may be found if the document has not yet been integrated (in this case it is still in the "pending" directory).
    emailFile Name of the file in which the email of the original submitter may be found if the document has not yet been integrated (in this case it is still in the "pending" directory).
    titleFile Name of the file in which the title may be found if the document has not yet been integrated (in this case it is still in the "pending" directory).


    Get_Recid
    description
    This function searches for the document in the database and stores the recid of this document in the "SN" file and in a global variable "sysno".
    The function conducts the search based upon the document's report-number (and relies upon the global variable "rn") so the "Get_Report_Number" function should be called before this one.
    This function replaces the older function "Get_Sysno".
    parameters
    none


    Get_Report_Number
    description
    This function gets the value contained in the [edsrn] file and stores it in the reference global variable.
    parameters
    edsrn Name of the file which stores the reference.
    This value depends on the web form configuration you did. It should contain the name of the form element used for storing the reference of the document.


    Get_Sysno
    description
    This function searches for the document in the database and stores the system number of this document in the "SN" file and in a global variable.
    "Get_Report_Number" should be called before.
    Deprecated: Use Get_Recid instead.
    parameters
    none


    Insert_Modify_Record
    description
    This function gets the output of bibconvert and uploads it into the MySQL bibliographical database.
    parameters
    none


    Insert_Record
    description
    This function gets the output of bibFormat and uploads it into the MySQL bibliographical database.
    parameters
    none


    Is_Original_Submitter
    description
    If the authentication module (login) is active in webSubmit, this function compares the current login with the email of the original submitter. If it is the same (or if the current user has superuser rights), we go on. If it differs, an error message is issued.
    parameters
    none


    Is_Referee
    description
    This function checks whether the currently logged user is a referee for this document.
    parameters
    none


    Mail_Submitter
    description
This function sends an email to the submitter to inform him that the document he has just submitted has been correctly received.
    parameters
    authorfile Name of the file containing the authors of the document
    titleFile Name of the file containing the title of the document
    emailFile Name of the file containing the email of the submitter of the document
    status Depending on the value of this parameter, the function adds an additional text to the email.
    This parameter can be one of:
    ADDED: The file has been integrated in the database.
    APPROVAL: The file has been sent for approval to a referee.
    or can stay empty.
    edsrn Name of the file containing the reference of the document
    newrnin Name of the file containing the 2nd reference of the document (if any)


    Make_Modify_Record
    description
    This function creates the record file formatted for a direct insertion in the documents database. It uses the BibConvert tool.
The main difference between all the Make_..._Record functions is the parameters.
    As its name says, this particular function should be used for the modification of a record. (MBI- Modify Record action).
    parameters
    modifyTemplate Name of bibconvert's configuration file used for creating the mysql record.
    sourceTemplate Name of bibconvert's source file.


    Make_Record
    description
    This function creates the record file formatted for a direct insertion in the documents database. It uses the BibConvert tool.
The main difference between all the Make_..._Record functions is the parameters.
    As its name does not say :), this particular function should be used for the submission of a document.
    parameters
    createTemplate Name of bibconvert's configuration file used for creating the mysql record.
    sourceTemplate Name of bibconvert's source file.


    Move_From_Pending
    description
    This function retrieves the data of a submission which was temporarily stored in the "pending" directory (waiting for an approval for example), and moves it to the current action directory.
    parameters
    none


    Move_to_Done
    description
This function moves the existing submission directory to the /opt/cds-invenio/var/data/submit/storage/done directory, then tars and gzips it.
    parameters
    none


    Move_to_Pending
    description
This function moves the existing submission directory to the /opt/cds-invenio/var/data/submit/storage/pending directory. It is used to store this data temporarily until it is approved or...
    parameters
    none


    Print_Success
    description
    This function simply displays a text on the screen, telling the user the submission went fine. To be used in the "Submit New Record" action.
    parameters
status Depending on the value of this parameter, the function adds an additional text to the displayed message.
    This parameter can be one of:
    ADDED: The file has been integrated in the database.
    APPROVAL: The file has been sent for approval to a referee.
    or can stay empty.
    edsrn Name of the file containing the reference of the document
    newrnin Name of the file containing the 2nd reference of the document (if any)


    Print_Success_APP
    description
    This function simply displays a text on the screen, telling the referee his decision has been taken into account. To be used in the Approve (APP) action.
    parameters
    none


    Print_Success_MBI
    description
    This function simply displays a text on the screen, telling the user the modification went fine. To be used in the Modify Record (MBI) action.
    parameters
    none


    Print_Success_SRV
    description
    This function simply displays a text on the screen, telling the user the revision went fine. To be used in the Submit New File (SRV) action.
    parameters
    none


    Report_Number_Generation
    description
    This function is used to automatically generate a reference number.
    After generating the reference, the function saves it into the [newrnin] file and sets the global variable containing this reference.
    parameters
    autorngen If set to "Y": The reference number is generated.
    If set to "N": The reference number is read from a file ([newrnin])
    If set to "A": The reference number will be the access number of the submission.
    counterpath indicates the file in which the program will find the counter for this reference generation.
    The value of this parameter may contain one of:
    "<PA>categ</PA>": in this case this string is replaced with the content of the file [altrnin]
    "<PA>yy</PA>": in this case this string is replaced by the current year (4 digits) if [altyeargen] is set to "AUTO", or by the content of the [altyeargen] file in any other case. (this content should be formatted as a date (dd/mm/yyyy).
    "<PA>file:name_of_file</PA>": in this case, this string is replaced by the first line of the given file
    "<PA>file*:name_of_file</PA>": in this case, this string is replaced by all the lines of the given file, separated by a dash ('-') character.
    rnformat This is the format used by the program to create the reference. The program computes the value of the parameter and appends a "-" followed by the current value of the counter increased by 1.
    The value of this parameter may contain one of:
    "<PA>categ</PA>": in this case this string is replaced with the content of the file [altrnin]
    "<PA>yy</PA>": in this case this string is replaced by the current year (4 digits) if [altyeargen] is set to "AUTO", or by the content of the [altyeargen] file in any other case. (this content should be formatted as a date (dd/mm/yyyy).
    "<PA>file:name_of_file</PA>": in this case, this string is replaced by the first line of the given file
    "<PA>file*:name_of_file</PA>": in this case, this string is replaced by all the lines of the given file, separated by a dash ('-') character.
rnin This parameter contains the name of the file in which the program will find the category if needed. The content of this file will then replace the string <PA>categ</PA> in the reference format or in the counter path.
    yeargen This parameter can be one of:
    "AUTO": in this case the program takes the current 4 digit year.
    "<filename>": in this case the program extract the year from the file which name is <filename>. This file should contain a date (dd/mm/yyyy).
    edsrn Name of the file in which the created reference will be stored.
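Putting the rnformat and counter rules together, the created reference is, in essence (sketch with assumed values):

    def build_reference(rnformat_value, counter):
        # computed rnformat value + "-" + (counter increased by 1)
        return "%s-%i" % (rnformat_value, counter + 1)

    # build_reference("TEST-CATEGORY1-2001", 0) -> "TEST-CATEGORY1-2001-1"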


    Send_Approval_Request
    description
    This function sends an email to the referee in order to start the simple approval process.
    This function is very CERN-specific and should be changed in case of external use.
    Must be called after the Get_Report_Number function.
    parameters
addressesDAM email addresses of the people who will receive this email (comma-separated list). This parameter may contain the <CATEG> string, in which case the value computed from the [categformatDAM] parameter replaces this string.
    eg.: "<CATEG>-email@cern.ch"
    categformatDAM contains a regular expression used to compute the category of the document given the reference of the document.
eg.: if [categformatDAM]="TEST-<CATEG>-.*" and the reference of the document is "TEST-CATEGORY1-2001-001", then the computed category equals "CATEGORY1" (see the sketch after this parameter list)
    authorfile name of the file in which the authors are stored
    titlefile name of the file in which the title is stored.
    directory parameter used to create the URL to access the files.
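A hedged sketch of the category computation shared by the [categformat...] parameters: the <CATEG> placeholder acts as a capturing group matched against the reference (compute_category is an illustrative name, not a WebSubmit function):

    import re

    def compute_category(categformat, reference):
        pattern = categformat.replace("<CATEG>", "([^-]+)")
        match = re.match(pattern, reference)
        return match and match.group(1) or ""

    # compute_category("TEST-<CATEG>-.*", "TEST-CATEGORY1-2001-001") -> "CATEGORY1"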


    Send_APP_Mail
    description
    Sends an email to warn people that a document has been approved.
    parameters
addressesAPP email addresses of the people who will receive this email (comma-separated list). This parameter may contain the <CATEG> string, in which case the value computed from the [categformatAPP] parameter replaces this string.
    eg.: "<CATEG>-email@cern.ch"
    categformatAPP contains a regular expression used to compute the category of the document given the reference of the document.
eg.: if [categformatAPP]="TEST-<CATEG>-.*" and the reference of the document is "TEST-CATEGORY1-2001-001", then the computed category equals "CATEGORY1"
    newrnin Name of the file containing the 2nd reference of the approved document (if any).
    edsrn Name of the file containing the reference of the approved document.


    Send_Modify_Mail
    description
This function sends an email to warn people that a document has been modified, and to inform the user that his modifications have been taken into account.
    parameters
    addressesMBI email addresses of the people who will receive this email (comma separated list).
    fieldnameMBI name of the file containing the modified fields.
    sourceDoc Long name for the type of document. This name will be displayed in the mail.
    emailfile name of the file in which the email of the modifier will be found.


    Send_SRV_Mail
    description
    This function sends an email to warn people a revision has been carried out.
    parameters
    notefile name of the file in which the note can be found
    emailfile name of the file containing the submitter's email
addressesSRV email addresses of the people who will receive this email (comma-separated list). This parameter may contain the <CATEG> string, in which case the value computed from the [categformatDAM] parameter replaces this string.
    eg.: "<CATEG>-email@cern.ch"
    categformatDAM contains a regular expression used to compute the category of the document given the reference of the document.
eg.: if [categformatDAM]="TEST-<CATEG>-.*" and the reference of the document is "TEST-CATEGORY1-2001-001", then the computed category equals "CATEGORY1"


    Test_Status
    description
This function checks whether the considered document has been requested for approval and is still waiting for approval. It also checks whether the password stored in the file "password" of the submission directory corresponds to the password associated with the document.
    parameters
    none


    Update_Approval_DB
    description
    This function updates the approval database when a document has just been approved or rejected. It uses the [categformatDAM] parameter to compute the category of the document.
    Must be called after the Get_Report_Number function.
    parameters
    categformatDAM It contains the regular expression which allows the retrieval of the category from the reference number.
    Eg: if [categformatDAM]="TEST-<CATEG>-.*" and the reference is "TEST-CATEG1-2001-001" then the category will be recognized as "CATEG1".


    Upload_Files
    description
    This function displays the list of already transfered files (main and additional ones), and also outputs an html form for uploading other files (pictures or fulltexts).
    parameters
    maxsize Maximum allowed size for the transferred files (size in bits)
    minsize Minimum allowed size for the transferred files (size in bits)
    iconsize In case the transferred files are pictures (jpg, gif or pdf), the function will automatically try to create icons from them. This parameter indicates the size in pixels of the created icon.
    type This can be one of "fulltext" or "picture". If the type is set to "picture", the function will try to create icons (using ImageMagick's "convert" tool).

    See also:

  • create a new function
  • delete a function
  • edit a function

    Protection and Restriction

    Description:

     In webSubmit, you can restrict the use of some actions on a given document type to a list of users. You can use the webAccess manager for this.

     Let's say you want to restrict the submission of new TEXT documents to a given user. You should then create a role in webAccess which will authorize the action "submit" over doctype "TEXT" and act "SBI" (Submit new record). You can call this role "submitter_TEXT_SBI" for example. Then link the role to the proper users.
     Another example: if you wish to authorize a user to modify the bibliographic data of PICT documents, you have to create a role which authorizes the action "submit" over doctype "PICT" and act "MBI". This role can be called "submitter_PICT_MBI" or whatever you want.

     If no role is defined for a given action and a given document type, then all users will be allowed to use it.

    Submission Catalogue Organisation

    What is it?

     This feature allows you to organise the look of the webSubmit main page. You can group document types inside catalogues and order the catalogues the way you wish.

    How to get there?

     Click on the "Organisation" link in the websubmit admin right menu.

    How to do this?

     Once on the "Edit Catalogues page", you will find the currently defined organisation chart in the middle of the page. To the right, one form allows you to create a new catalogue ("Add a Catalogue") and one to add a document type to an existing catalogue ("Add a document type").
     
    • To add a catalogue: Enter the name of your new catalogue in the "Catalogue Name" free-text field, then choose the existing catalogue to which this one will be attached. Attaching it to an existing catalogue creates a sub-catalogue. To actually create it, click on "ADD".
    • To add a document type to a catalogue: Choose in the list of existing "Document type names" the one you want to add to the chart. Then choose the catalogue with which the document type will be associated. Click on "ADD" to finalise this action.
    • To withdraw a document type or a catalogue from the chart: Click on the red cross next to the item you want to withdraw. If you withdraw a catalogue, all document types attached to it will also be withdrawn (of course the actual document types in webSubmit won't be destroyed!).
    • To move a document type or a catalogue in the chart: Use the small up and down arrows next to the document type/catalogue title.

    See also:

  • Create a New Document Type
  • document types

    BibConvert

    What is it?

     WebSubmit stores the data gathered during a submission in a directory. In this directory each file corresponds to a field saved during the submission.
     From this directory, BibConvert creates a formatted file that can easily be uploaded into the bibliographic database.
     The BibConvert program is called by the Make_Record and Make_Modify_Record functions of webSubmit's end-script system.
     The BibConvert configuration files used by webSubmit are in the /bibconvert/config directory.

     For more info about BibConvert, please see the dedicated guide.

    FAQ

     Q1. I'd like to be warned each time there is an error, or an important action is performed through the manager. Is this possible?
     Q2. Where are all the files stored in this system?
     Q3. How is the documents archive organised?



     Q1. I'd like to be warned each time there is an error, or an important action is performed through the manager. Is this possible?
    Yes, it is. Edit the CFG_SITE_ADMIN_EMAIL definition in the invenio-local.conf file and set it to your email address. You will then receive all the warning emails issued by the manager.
     Q2. Where are all the files stored in this system?
  • the counter files are here: /opt/cds-invenio/var/data/submit/counters. They are used by the Report_Number_Generation function.
  • all running and completed submissions are stored here: /opt/cds-invenio/var/data/submit/storage.
  • all the document files attached to records are stored here: /opt/cds-invenio/var/data/files.
  • all python functions used by webSubmit are stored here: /opt/cds-invenio/lib/python/invenio/websubmit_functions
     Q3. How is the documents archive organised?
    First of all, the document files attached to records are stored here: /opt/cds-invenio/var/data/files.

    The Upload_Files webSubmit function is used to link a document with a record.

    All documents get an id from the system and are stored in the "bibdoc" table in the database. The link between a document and a record is stored using the "bibdoc_bibrec" table.

    The document id is used to determine where the files are stored. For example the files of document #14 will be stored here: /opt/cds-invenio/var/data/files/g0/14

    The subdirectory g0 is used to split the documents across the filesystem. The CFG_FILE_DIR_SIZE variable from invenio.conf determines how many documents will be stored under one subdirectory.

    Several files may be stored under the same document directory: they are the different formats and versions of the same document. Versions are indicated by a string of the form ";1.0" appended to the name of the file.
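
    As an illustration of this layout, here is a minimal sketch of how the storage directory of a document could be derived. The exact bucketing rule and the document_directory helper are assumptions made for the example; they are not part of the Invenio API.

        import os

        CFG_FILE_DIR = "/opt/cds-invenio/var/data/files"
        CFG_FILE_DIR_SIZE = 5000  # hypothetical value; the real one is set in invenio.conf

        def document_directory(doc_id):
            # Bucket document ids so that at most CFG_FILE_DIR_SIZE
            # documents live under any one "gN" subdirectory.
            group = "g%d" % (doc_id // CFG_FILE_DIR_SIZE)
            return os.path.join(CFG_FILE_DIR, group, str(doc_id))

        # document_directory(14) == '/opt/cds-invenio/var/data/files/g0/14'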

    Please see the HOWTO Manage Fulltext Files for more information on the administrative command-line tools available to manipulate fulltext files.

    See also:

    notes
    diff --git a/modules/websubmit/doc/hacking/Makefile.am b/modules/websubmit/doc/hacking/Makefile.am index fa9101c6b..5dbfa6415 100644 --- a/modules/websubmit/doc/hacking/Makefile.am +++ b/modules/websubmit/doc/hacking/Makefile.am @@ -1,25 +1,28 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. webdoclibdir = $(libdir)/webdoc/invenio/hacking -webdoclib_DATA = \ - websubmit-bibdocfile.webdoc +webdoclib_DATA = websubmit-bibdocfile.webdoc \ + websubmit-internals.webdoc \ + websubmit-file-converter.webdoc \ + websubmit-file-stamper.webdoc \ + websubmit-icon-creator.webdoc EXTRA_DIST = $(webdoclib_DATA) CLEANFILES = *~ *.tmp diff --git a/modules/websubmit/doc/hacking/websubmit-bibdocfile.webdoc b/modules/websubmit/doc/hacking/websubmit-bibdocfile.webdoc index 542ca367b..34e0e8df9 100644 --- a/modules/websubmit/doc/hacking/websubmit-bibdocfile.webdoc +++ b/modules/websubmit/doc/hacking/websubmit-bibdocfile.webdoc @@ -1,44 +1,43 @@ ## -*- mode: html; coding: utf-8; -*- ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - - - + + -

    The BibDocFile library handles every interaction between records and their related fulltext document files.

    Nomenclature

    record
    the unit of information within CDS Invenio. It is constituted by all the MARC metadata and has a unique integer identifier called the record ID.
    fulltext
    a physical file connected to a record, and often described by the record itself.
    bibdoc
    an abstract document related to a record. It has a unique docname within the record, and is linked with multiple formats and revisions of fulltext files.
    format (or extension)
    the extension associated with a physical file. Within a bibdoc, for a given version there can exist at most one file with a given format (e.g. .gif, .jpeg, ...).
    version (or revision)
    a progressive integer associated with a fulltext file. The higher the number, the more recent the version. Usually previous versions of a file are hidden.

    Python API

    Given a record_id, BibRecDocs(record_id) is an object representing all the bibdocs connected to a record.

    Given a record_id and a docname or a document_id, BibDoc(recid=record_id, docname=docname) or BibDoc(docid=document_id) is an object representing all the possible versions and formats of a document connected to a record.

    By properly querying BibRecDocs and BibDoc you can obtain a BibDocFile. This is an object representing all the details related to a single physical file, such as its comment, description, path, URL, size, format, version, checksum, protection...


    See bibdocfile API for a complete API description.
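
    Putting the three classes together, a typical read-only traversal could look like the following sketch. Record id 42 and docname "fulltext" are made-up values; the class and method names are the ones mentioned on this page and defined in bibdocfile.py.

        from invenio.bibdocfile import BibRecDocs, BibDoc

        # All the documents attached to record 42:
        recdocs = BibRecDocs(42)
        for bibdoc in recdocs.list_bibdocs():
            # Latest version of every format of this document:
            for docfile in bibdoc.list_latest_files():
                print "%s (%d bytes, md5 %s)" % (docfile.get_url(),
                                                 docfile.get_size(),
                                                 docfile.get_checksum())

        # A single document, addressed by name or by docid:
        bibdoc = BibDoc(recid=42, docname="fulltext")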

    diff --git a/modules/websubmit/doc/hacking/websubmit-file-converter.webdoc b/modules/websubmit/doc/hacking/websubmit-file-converter.webdoc new file mode 100644 index 000000000..5a57053c7 --- /dev/null +++ b/modules/websubmit/doc/hacking/websubmit-file-converter.webdoc @@ -0,0 +1,89 @@ +## -*- mode: html; coding: utf-8; -*- +## This file is part of CDS Invenio. +## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. +## +## CDS Invenio is free software; you can redistribute it and/or +## modify it under the terms of the GNU General Public License as +## published by the Free Software Foundation; either version 2 of the +## License, or (at your option) any later version. +## +## CDS Invenio is distributed in the hope that it will be useful, but +## WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +## General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., +## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. + + + + + +

    The WebSubmit Conversion Tools library (websubmit_file_converter.py) lets you convert a fulltext file from one format into another and perform OCR.


    Python API

    +
    +def get_best_format_to_extract_text_from(filelist, best_formats=CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM):
    +    """
    +    Return among the filelist the best file whose format is best suited for
    +    extracting text.
    +    """
    +
    +def get_missing_formats(filelist, desired_conversion=CFG_WEBSUBMIT_DESIRED_CONVERSIONS):
    +    """Given a list of files it will return a dictionary of the form:
    +    file1 : missing formats to generate from it...
    +    """
    +
    +def can_convert(input_format, output_format, max_intermediate_conversions=2):
    +    """Return the chain of conversion to transform input_format into output_format, if any."""
    +
    +def can_pdfopt():
    +    """Return True if it's possible to optimize PDFs."""
    +
    +def can_pdfa():
    +    """Return True if it's possible to generate PDF/As."""
    +
    +def can_perform_ocr():
    +    """Return True if it's possible to perform OCR."""
    +
    +def can_spell_check(ln='en'):
    +    """Return True if it's possible to perform spell checking."""
    +
    +def guess_is_OCR_needed(input_file, ln='en'):
    +    """
    +    Tries to see if enough text is retrievable from input_file.
    +    Return True if OCR is needed, False if it's already
    +    possible to retrieve information from the document.
    +    """
    +    output_file = convert_file(input_file, format='.txt', perform_ocr=False)
    +
    +def convert_file(input_file, output_file=None, format=None, **params):
    +    """
    +    Convert files from one format to another.
    +    @param input_file [string] the path to an existing file
    +    @param output_file [string] the path to the desired output (if None a
    +        temporary file is generated)
    +    @param format [string] the desired format (if None it is taken from
    +        output_file)
    +    @param params other parameters to pass to the particular converter
    +    @return [string] the final output_file
    +    """
    +
    +def pdf2hocr2pdf(input_file, output_file=None, font="Courier", author=None, keywords=None, subject=None, title=None, draft=False, ln='en', pdfopt=True, **args):
    +    """
    +    Transform a scanned PDF into a PDF with OCRed text.
    +    @param font the default font (e.g. Courier, Times-Roman).
    +    @param author the author name.
    +    @param subject the subject of the document.
    +    @param title the title of the document.
    +    @param draft whether to enable debug information in the output.
    +    @param ln is a two letter language code to give the OCR tool a hint.
    +    """
    +    input_file, output_hocr_file, dummy = prepare_io(input_file, output_ext='.hocr', need_working_dir=False)
    +    output_hocr_file, working_dir = pdf2hocr(input_file, output_file=output_hocr_file, ln=ln, return_working_dir=True)
    +    output_file = hocr2pdf(output_hocr_file, output_file, working_dir, font=font, author=author, keywords=keywords, subject=subject, title=title, draft=draft)
    +    clean_working_dir(working_dir)
    +    return output_file
    +
    +

    See websubmit_file_converter API for a complete API description.
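
    For example, converting a PDF into plain text could look like the following sketch. The file path is made up; the function signatures are the ones listed above, and the module path is assumed to follow the usual invenio.* layout.

        from invenio.websubmit_file_converter import can_convert, convert_file

        # Only attempt the conversion if a chain of converters exists:
        if can_convert('.pdf', '.txt'):
            # With output_file=None a temporary file is created and returned.
            text_file = convert_file('/tmp/sample.pdf', format='.txt')
            print "Plain text written to %s" % text_file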

    \ No newline at end of file diff --git a/modules/websubmit/doc/hacking/websubmit-file-stamper.webdoc b/modules/websubmit/doc/hacking/websubmit-file-stamper.webdoc new file mode 100644 index 000000000..43b3965e1 --- /dev/null +++ b/modules/websubmit/doc/hacking/websubmit-file-stamper.webdoc @@ -0,0 +1,72 @@ +## -*- mode: html; coding: utf-8; -*- +## This file is part of CDS Invenio. +## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. +## +## CDS Invenio is free software; you can redistribute it and/or +## modify it under the terms of the GNU General Public License as +## published by the Free Software Foundation; either version 2 of the +## License, or (at your option) any later version. +## +## CDS Invenio is distributed in the hope that it will be useful, but +## WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +## General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., +## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. + + + + + +

    The WebSubmit File Stamper library (websubmit_file_stamper.py) lets you stamp your PDFs.


    Python API

    +
    +def stamp_file(options):
    +    """The driver for the stamping process. This is effectively the function
    +       that is responsible for coordinating the stamping of a file.
    +       @param options: (dictionary) - a dictionary of options that are required
    +        by the function in order to carry out the stamping process.
    +
    +        The dictionary must have the following structure:
    +           + latex-template: (string) - the path to the LaTeX template to be
    +              used for the creation of the stamp itself;
    +           + latex-template-var: (dictionary) - This dictionary contains
    +              variables that should be sought in the LaTeX template file, and
    +              the values that should be substituted in their place. E.g.:
    +                    { "TITLE" : "An Introduction to CDS Invenio" }
    +           + input-file: (string) - the path to the input file (i.e. that
    +              which is to be stamped);
    +           + output-file: (string) - the name of the stamped file that should
    +              be created by the program. This is optional - if not provided,
    +              a default name will be applied to a file instead;
    +           + stamp: (string) - the type of stamp that is to be applied to the
    +              input file. It must take one of 3 values:
    +                    - "first": Stamp only the first page of the document;
    +                    - "all": Apply the stamp to all pages of the document;
    +                    - "coverpage": Add a "cover page" to the document;
    +           + verbosity: (integer) - the verbosity level under which the program
    +              is to run;
    +        So, an example of this options dictionary would be something like:
    +              { 'latex-template'      : "demo-stamp-left.tex",
    +                'latex-template-var'  : { "REPORTNUMBER" : "TEST-2008-001",
    +                                          "DATE"         : "15/02/2008",
    +                                        },
    +                'input-file'          : "test-doc.pdf",
    +                'output-file'         : "",
    +                'stamp'               : "first",
    +                'verbosity'           : 0,
    +              }
    +
    +       @return: (tuple) - consisting of two strings:
    +          1. the path to the working directory in which all stamping-related
    +              files are stored;
    +          2. The name of the "stamped" file;
    +       @Exceptions raised: (InvenioWebSubmitFileStamperError) exceptions may
    +        be raised or propagated by this function when the stamping process
    +        fails for one reason or another.
    +    """
    +
    +

    See websubmit_file_stamper API for a complete API description.
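
    Following the docstring above, stamping the first page of a PDF could look like the sketch below. The template and file names are simply the ones from the example dictionary; the module path is assumed to follow the usual invenio.* layout.

        from invenio.websubmit_file_stamper import stamp_file

        options = {
            'latex-template'     : "demo-stamp-left.tex",
            'latex-template-var' : {"REPORTNUMBER" : "TEST-2008-001",
                                    "DATE"         : "15/02/2008"},
            'input-file'         : "test-doc.pdf",
            'output-file'        : "",       # empty: let the library pick a name
            'stamp'              : "first",  # stamp only the first page
            'verbosity'          : 0,
        }
        # stamp_file() returns the working directory and the stamped file name:
        working_dir, stamped_file = stamp_file(options)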

    \ No newline at end of file diff --git a/modules/websubmit/doc/hacking/websubmit-icon-creator.webdoc b/modules/websubmit/doc/hacking/websubmit-icon-creator.webdoc new file mode 100644 index 000000000..264bddb49 --- /dev/null +++ b/modules/websubmit/doc/hacking/websubmit-icon-creator.webdoc @@ -0,0 +1,82 @@ +## -*- mode: html; coding: utf-8; -*- +## This file is part of CDS Invenio. +## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. +## +## CDS Invenio is free software; you can redistribute it and/or +## modify it under the terms of the GNU General Public License as +## published by the Free Software Foundation; either version 2 of the +## License, or (at your option) any later version. +## +## CDS Invenio is distributed in the hope that it will be useful, but +## WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +## General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., +## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. + + + + + +

    The WebSubmit Icon Creator library (websubmit_icon_creator.py) handles icon creation based on existing fulltext files.


    Python API

    +
    +def create_icon(options):
    +    """The driver for the icon creation process. This is effectively the
    +       function that is responsible for coordinating the icon creation.
    +       It is the API for the creation of an icon.
    +       @param options: (dictionary) - a dictionary of options that are required
    +        by the function in order to carry out the icon-creation process.
    +
    +        The dictionary must have the following structure:
    +           + input-file: (string) - the path to the input file (i.e. that
    +              for which the icon is to be created);
    +           + icon-name: (string) - the name of the icon that is to be created
    +              by the program. This is optional - if not provided,
    +              a default name will be applied to the icon file instead;
    +           + multipage-icon: (boolean) - used only when the original file
    +              is a PDF or PS file. If False, the created icon will feature ONLY
    +              the first page of the PDF. If True, ALL pages of the PDF will
    +              be included in the created icon. Note: If the icon type is not
    +              gif, this flag will be forced as False.
    +           + multipage-icon-delay: (integer) - used only when the original
    +              file is a PDF or PS AND use-first-page-only is False AND
    +              the icon type is gif.
    +              This allows the user to specify the delay between "pages"
    +              of a multi-page (animated) icon.
    +           + icon-scale: ('geometry') - the scaling information to be used for the
    +              creation of the new icon. Type 'geometry' as defined in ImageMagick.
    +              (eg. 320 or 320x240 or 100> or 5%)
    +           + icon-file-format: (string) - the file format of the icon that is
    +              to be created. Legal values are:
    +              * pdf
    +              * gif
    +              * jpg
    +              * jpeg
    +              * ps
    +              * png
    +              * bmp
    +           + verbosity: (integer) - the verbosity level under which the program
    +              is to run;
    +        So, an example of this options dictionary could be something like:
    +              { 'input-file'           : "demo-picture-file.jpg",
    +                'icon-name'            : "icon-demo-picture-file",
    +                'icon-file-format'     : "gif",
    +                'multipage-icon'       : True,
    +                'multipage-icon-delay' : 100,
    +                'icon-scale'           : 180,
    +                'verbosity'            : 0,
    +              }
    +       @return: (tuple) - consisting of two strings:
    +          1. the path to the working directory in which all files related to
    +              icon creation are stored;
    +          2. The name of the "icon" file;
    +       @Exceptions raised: (InvenioWebSubmitIconCreatorError) exceptions may
    +        be raised or propagated by this function when the icon creation process
    +        fails for one reason or another.
    +    """
    +
    +

    See websubmit_icon_creator API for a complete API description.
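
    Again following the docstring, creating a GIF icon from a picture could look like this sketch. The input values are the ones from the example dictionary; the module path is assumed to follow the usual invenio.* layout.

        from invenio.websubmit_icon_creator import create_icon

        options = {
            'input-file'           : "demo-picture-file.jpg",
            'icon-name'            : "icon-demo-picture-file",
            'icon-file-format'     : "gif",
            'multipage-icon'       : False,  # only meaningful for PDF/PS input
            'multipage-icon-delay' : 100,
            'icon-scale'           : 180,
            'verbosity'            : 0,
        }
        # create_icon() returns the working directory and the icon file name:
        working_dir, icon_file = create_icon(options)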

    \ No newline at end of file diff --git a/modules/webhelp/web/admin/howto/howto.webdoc b/modules/websubmit/doc/hacking/websubmit-internals.webdoc similarity index 52% copy from modules/webhelp/web/admin/howto/howto.webdoc copy to modules/websubmit/doc/hacking/websubmit-internals.webdoc index 1cdad6066..ff2de2c97 100644 --- a/modules/webhelp/web/admin/howto/howto.webdoc +++ b/modules/websubmit/doc/hacking/websubmit-internals.webdoc @@ -1,48 +1,41 @@ +## -*- mode: html; coding: utf-8; -*- + ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - - + + -

    -The HOWTO guides will give you both short and not-so-short
    -recipes and thoughts on some of the most frequently encountered
    -administrative tasks.
    +This page summarizes all the information suitable to dig inside
    +the WebSubmit internals.

    +Fulltext manipulation API
    +Explains how to manipulate fulltext files within CDS Invenio.

    -HOWTO MARC
    -Describes how to choose the MARC representation of your metadata
    -and how it will be stored in CDS Invenio.

    -HOWTO Migrate
    -Describes how to migrate a bunch of your old data from any format
    -you might have into CDS Invenio.
    +Conversion tools
    +Explains how to convert from one file format to another, and how to perform OCR.

    -HOWTO Run
    -Describes how to run your CDS Invenio installation and how to take
    -care of its normal operation day by day.
    +Stamping fulltexts
    +Explains how to stamp fulltexts.

    +Icon creation tools
    +Explains how to create icons from fulltexts.
    Haven't found what you were looking for? Suggest a HOWTO.>>>) % Customize. + /DOCINFO pdfmark + +% Define an ICC profile : + +[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark +[{icc_PDFA} <> /PUT pdfmark +[{icc_PDFA} ICCProfile (r) file /PUT pdfmark + +% Define the output intent dictionary : + +[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark +[{OutputIntent_PDFA} << + /Type /OutputIntent % Must be so (the standard requires). + /S /GTS_PDFA1 % Must be so (the standard requires). + /DestOutputProfile {icc_PDFA} % Must be so (see above). + /OutputConditionIdentifier (CGATS TR001) % Customize +>> /PUT pdfmark +[{Catalog} <> /PUT pdfmark diff --git a/modules/websubmit/etc/deskew.lua b/modules/websubmit/etc/deskew.lua new file mode 100644 index 000000000..fb1ec5c38 --- /dev/null +++ b/modules/websubmit/etc/deskew.lua @@ -0,0 +1,15 @@ +-- FIXME this should do the right thing for grayscale images: +-- threshold, compute rotation, then rotate the grayscale image + +if #arg < 2 then + print("usage: ... input output") + os.exit(1) +end + +proc = ocr.make_DeskewPageByRAST() + +input = bytearray:new() +output = bytearray:new() +iulib.read_image_gray(input,arg[1]) +proc:cleanup(output,input) +iulib.write_image_gray(arg[2],output) diff --git a/modules/websubmit/lib/Makefile.am b/modules/websubmit/lib/Makefile.am index 19e171b72..db2d2f8dc 100644 --- a/modules/websubmit/lib/Makefile.am +++ b/modules/websubmit/lib/Makefile.am @@ -1,42 +1,45 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. SUBDIRS = functions pylibdir = $(libdir)/python/invenio -pylib_DATA = websubmit_config.py websubmit_engine.py file.py \ +pylib_DATA = websubmit_config.py websubmit_engine.py file.py \ websubmit_dblayer.py \ websubmit_webinterface.py \ websubmit_templates.py \ websubmit_regression_tests.py \ websubmitadmin_config.py \ websubmitadmin_dblayer.py \ websubmitadmin_engine.py \ websubmitadmin_templates.py \ websubmitadmin_regression_tests.py \ websubmit_file_stamper.py \ websubmit_icon_creator.py \ + websubmit_file_converter.py \ + unoconv.py \ bibdocfile.py \ bibdocfilecli.py \ - bibdocfile_regression_tests.py + bibdocfile_regression_tests.py \ + hocrlib.py -noinst_DATA = fulltext_files_migration_kit.py +noinst_DATA = fulltext_files_migration_kit.py icon_migration_kit.py EXTRA_DIST = $(pylib_DATA) $(noinst_DATA) CLEANFILES = *~ *.tmp *.pyc diff --git a/modules/websubmit/lib/bibdocfile.py b/modules/websubmit/lib/bibdocfile.py index aae64a27d..27be3cd6d 100644 --- a/modules/websubmit/lib/bibdocfile.py +++ b/modules/websubmit/lib/bibdocfile.py @@ -1,2544 +1,3652 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. 
## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. +""" +This module implements the low-level API for dealing with fulltext files. + - All the files associated to a I{record} (identified by a I{recid}) can be + managed via an instance of the C{BibRecDocs} class. + - A C{BibRecDocs} is a wrapper of the list of I{documents} attached to the + record. + - Each document is represented by an instance of the C{BibDoc} class. + - A document is identified by a C{docid} and name (C{docname}). The docname + must be unique within the record. A document is the set of all the + formats and revisions of a piece of information. + - A document has a type called C{doctype} and can have a restriction. + - Each physical file, i.e. the concretization of a document into a + particular I{version} and I{format} is represented by an instance of the + C{BibDocFile} class. + - The format is infact the extension of the physical file. + - A comment and a description and other information can be associated to a + BibDocFile. + - A C{bibdoc} is a synonim for a document, while a C{bibdocfile} is a + synonim for a physical file. + +@group Main classes: BibRecDocs,BibDoc,BibDocFile +@group Other classes: BibDocMoreInfo,Md5Folder,InvenioWebSubmitFileError +@group Main functions: decompose_file,stream_file,bibdocfile_*,download_url +@group Configuration Variables: CFG_* +""" + __revision__ = "$Id$" import os import re import shutil import filecmp import time import random import socket import urllib2 import urllib import tempfile import cPickle import base64 import binascii import cgi import sys if sys.hexversion < 0x2060000: from md5 import md5 else: from hashlib import md5 try: import magic CFG_HAS_MAGIC = True except ImportError: CFG_HAS_MAGIC = False from datetime import datetime from mimetypes import MimeTypes from thread import get_ident from invenio import webinterface_handler_wsgi_utils as apache ## Let's set a reasonable timeout for URL request (e.g. 
FFT) socket.setdefaulttimeout(40) if sys.hexversion < 0x2040000: # pylint: disable-msg=W0622 from sets import Set as set # pylint: enable-msg=W0622 from invenio.shellutils import escape_shell_arg from invenio.dbquery import run_sql, DatabaseError, blob_to_string from invenio.errorlib import register_exception from invenio.bibrecord import record_get_field_instances, \ field_get_subfield_values, field_get_subfield_instances, \ encode_for_xml +from invenio.urlutils import create_url +from invenio.textutils import nice_size from invenio.access_control_engine import acc_authorize_action from invenio.config import CFG_SITE_LANG, CFG_SITE_URL, \ CFG_WEBDIR, CFG_WEBSUBMIT_FILEDIR,\ CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS, \ CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT, CFG_SITE_SECURE_URL, \ CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS, \ CFG_TMPDIR, CFG_PATH_MD5SUM, \ CFG_WEBSUBMIT_STORAGEDIR, \ CFG_BIBDOCFILE_USE_XSENDFILE, \ CFG_BIBDOCFILE_MD5_CHECK_PROBABILITY -#from invenio.bibformat import format_record +from invenio.websubmit_config import CFG_WEBSUBMIT_ICON_SUBFORMAT_RE, \ + CFG_WEBSUBMIT_DEFAULT_ICON_SUBFORMAT +from invenio.bibformat import format_record import invenio.template websubmit_templates = invenio.template.load('websubmit') websearch_templates = invenio.template.load('websearch') +#: block size when performing I/O. +CFG_BIBDOCFILE_BLOCK_SIZE = 1024 * 8 + +#: threshold used do decide when to use Python MD5 of CLI MD5 algorithm. CFG_BIBDOCFILE_MD5_THRESHOLD = 256 * 1024 + +#: chunks loaded by the Python MD5 algorithm. CFG_BIBDOCFILE_MD5_BUFFER = 1024 * 1024 + +#: whether to normalize e.g. ".JPEG" and ".jpg" into .jpeg. CFG_BIBDOCFILE_STRONG_FORMAT_NORMALIZATION = False +#: flags that can be associated to files. +CFG_BIBDOCFILE_AVAILABLE_FLAGS = ( + 'PDF/A', + 'STAMPED', + 'PDFOPT', + 'HIDDEN', + 'CONVERTED', + 'PERFORM_HIDE_PREVIOUS', + 'OCRED' +) + +#: constant used if FFT correct with the obvious meaning. KEEP_OLD_VALUE = 'KEEP-OLD-VALUE' + _mimes = MimeTypes(strict=False) _mimes.suffix_map.update({'.tbz2' : '.tar.bz2'}) _mimes.encodings_map.update({'.bz2' : 'bzip2'}) _magic_cookies = {} -def get_magic_cookies(): - """Return a tuple of magic object. - ... not real magic. Just see: man file(1)""" +def _get_magic_cookies(): + """ + @return: a tuple of magic object. + @rtype: (MAGIC_NONE, MAGIC_COMPRESS, MAGIC_MIME, MAGIC_COMPRESS + MAGIC_MIME) + @note: ... not real magic. Just see: man file(1) + """ thread_id = get_ident() if thread_id not in _magic_cookies: _magic_cookies[thread_id] = { magic.MAGIC_NONE : magic.open(magic.MAGIC_NONE), magic.MAGIC_COMPRESS : magic.open(magic.MAGIC_COMPRESS), magic.MAGIC_MIME : magic.open(magic.MAGIC_MIME), magic.MAGIC_COMPRESS + magic.MAGIC_MIME : magic.open(magic.MAGIC_COMPRESS + magic.MAGIC_MIME) } for key in _magic_cookies[thread_id].keys(): _magic_cookies[thread_id][key].load() return _magic_cookies[thread_id] def _generate_extensions(): + """ + Generate the regular expression to match all the known extensions. + + @return: the regular expression. + @rtype: regular expression object + """ _tmp_extensions = _mimes.encodings_map.keys() + \ _mimes.suffix_map.keys() + \ _mimes.types_map[1].keys() + \ CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS extensions = [] for ext in _tmp_extensions: if ext.startswith('.'): extensions.append(ext) else: extensions.append('.' 
+ ext) extensions.sort() extensions.reverse() extensions = set([ext.lower() for ext in extensions]) extensions = '\\' + '$|\\'.join(extensions) + '$' extensions = extensions.replace('+', '\\+') return re.compile(extensions, re.I) +#: Regular expression to recognized extensions. _extensions = _generate_extensions() - class InvenioWebSubmitFileError(Exception): + """ + Exception raised in case of errors related to fulltext files. + """ pass -def file_strip_ext(afile, skip_version=False): - """Strip in the best way the extension from a filename""" - if skip_version: +def file_strip_ext(afile, skip_version=False, only_known_extensions=False, allow_subformat=True): + """ + Strip in the best way the extension from a filename. + + >>> file_strip_ext("foo.tar.gz") + 'foo' + >>> file_strip_ext("foo.buz.gz") + 'foo.buz' + >>> file_strip_ext("foo.buz") + 'foo' + >>> file_strip_ext("foo.buz", only_known_extensions=True) + 'foo.buz' + >>> file_strip_ext("foo.buz;1", skip_version=False, + ... only_known_extensions=True) + 'foo.buz;1' + >>> file_strip_ext("foo.gif;icon") + 'foo' + >>> file_strip_ext("foo.gif:icon", allow_subformat=False) + 'foo.gif:icon' + + @param afile: the path/name of a file. + @type afile: string + @param skip_version: whether to skip a trailing ";version". + @type skip_version: bool + @param only_known_extensions: whether to strip out only known extensions or + to consider as extension anything that follows a dot. + @type only_known_extensions: bool + @param allow_subformat: whether to consider also subformats as part of + the extension. + @type allow_subformat: bool + @return: the name/path without the extension (and version). + @rtype: string + """ + if skip_version or allow_subformat: afile = afile.split(';')[0] nextfile = _extensions.sub('', afile) - if nextfile == afile: + if nextfile == afile and not only_known_extensions: nextfile = os.path.splitext(afile)[0] while nextfile != afile: afile = nextfile nextfile = _extensions.sub('', afile) return nextfile -def normalize_format(format): - """Normalize the format.""" +def normalize_format(format, allow_subformat=True): + """ + Normalize the format, e.g. by adding a dot in front. + + @param format: the format/extension to be normalized. + @type format: string + @param allow_subformat: whether to consider also subformats as part of + the extension. + @type allow_subformat: bool + @return: the normalized format. + @rtype; string + """ + if allow_subformat: + subformat = format[format.rfind(';'):] + format = format[:format.rfind(';')] + else: + subformat = '' if format and format[0] != '.': format = '.' + format if CFG_BIBDOCFILE_STRONG_FORMAT_NORMALIZATION: if format not in ('.Z', '.H', '.C', '.CC'): format = format.lower() format = { '.jpg' : '.jpeg', '.htm' : '.html', '.tif' : '.tiff' }.get(format, format) - return format + return format + subformat + +def guess_format_from_url(url): + """ + Given a URL tries to guess it's extension. + + Different method will be used, including HTTP HEAD query, + downloading the resource and using mime + + @param url: the URL for which the extension shuld be guessed. + @type url: string + @return: the recognized extension or empty string if it's impossible to + recognize it. 
+ @rtype: string + """ + def parse_content_disposition(text): + for item in text.split(';'): + item = item.strip() + if item.strip().startswith('filename='): + return item[len('filename="'):-len('"')] + + def parse_content_type(text): + return text.split(';')[0].strip() + + ## Let's try to guess the extension by considering the URL as a filename + ext = decompose_file(url, skip_version=True, only_known_extensions=True)[2] + if ext.startswith('.'): + return ext + + if is_url_a_local_file(url) and CFG_HAS_MAGIC: + ## if the URL corresponds to a local file, let's try to use + ## the Python magic library to guess it + try: + magic_cookie = _get_magic_cookies()[magic.MAGIC_MIME] + mimetype = magic_cookie.file(url) + ext = _mimes.guess_extension(mimetype) + if ext: + return normalize_format(ext) + except Exception: + pass + else: + ## Since the URL is remote, let's try to perform a HEAD request + ## and see the corresponding headers + info = urllib2.urlopen(url).info() + content_disposition = info.getheader('Content-Disposition') + if content_disposition: + filename = parse_content_disposition(content_disposition) + if filename: + return decompose_file(filename)[2] + content_type = info.getheader('Content-Type') + if content_type: + content_type = parse_content_type(content_type) + ext = _mimes.guess_extension(content_type) + if ext: + return normalize_format(ext) + if CFG_HAS_MAGIC: + ## Last solution: let's download the remote resource + ## and use the Python magic library to guess the extension + try: + filename = download_url(url, format='') + magic_cookie = _get_magic_cookies()[magic.MAGIC_MIME] + mimetype = magic_cookie.file(filename) + os.remove(filename) + ext = _mimes.guess_extension(content_type) + if ext: + return normalize_format(ext) + except Exception: + pass + return "" _docname_re = re.compile(r'[^-\w.]*') def normalize_docname(docname): - """Normalize the docname (only digit and alphabetic letters and underscore are allowed)""" + """ + Normalize the docname. + + At the moment the normalization is just returning the same string. + + @param docname: the docname to be normalized. + @type docname: string + @return: the normalized docname. + @rtype: string + """ #return _docname_re.sub('', docname) return docname def normalize_version(version): - """Normalize the version.""" + """ + Normalize the version. + + The version can be either an integer or the keyword 'all'. Any other + value will be transformed into the empty string. + + @param version: the version (either a number or 'all'). + @type version: integer or string + @return: the normalized version. + @rtype: string + """ try: int(version) except ValueError: if version.lower().strip() == 'all': return 'all' else: return '' return str(version) -def decompose_file(afile, skip_version=False): - """Decompose a file into dirname, basename and extension. - Note that if provided with a URL, the scheme in front will be part - of the dirname.""" +def decompose_file(afile, skip_version=False, only_known_extensions=False, + allow_subformat=True): + """ + Decompose a file/path into its components dirname, basename and extension. + + >>> decompose_file('/tmp/foo.tar.gz') + ('/tmp', 'foo', '.tar.gz') + >>> decompose_file('/tmp/foo.tar.gz;1', skip_version=True) + ('/tmp', 'foo', '.tar.gz') + >>> decompose_file('http://www.google.com/index.html') + ('http://www.google.com', 'index', '.html') + + @param afile: the path/name of a file. + @type afile: string + @param skip_version: whether to skip a trailing ";version". 
+ @type skip_version: bool + @param only_known_extensions: whether to strip out only known extensions or + to consider as extension anything that follows a dot. + @type only_known_extensions: bool + @param allow_subformat: whether to consider also subformats as part of + the extension. + @type allow_subformat: bool + @return: a tuple with the directory name, the docname and extension. + @rtype: (dirname, docname, extension) + + @note: if a URL is provided, the scheme will be part of the dirname. + @see: L{file_strip_ext} for the algorithm used to retrieve the extension. + """ if skip_version: version = afile.split(';')[-1] try: int(version) afile = afile[:-len(version)-1] except ValueError: pass basename = os.path.basename(afile) dirname = afile[:-len(basename)-1] - base = file_strip_ext(basename) + base = file_strip_ext( + basename, + only_known_extensions=only_known_extensions, + allow_subformat=allow_subformat) extension = basename[len(base) + 1:] if extension: extension = '.' + extension return (dirname, base, extension) def decompose_file_with_version(afile): - """Decompose a file into dirname, basename, extension and version. - In case version does not exist it will raise ValueError. - Note that if provided with a URL, the scheme in front will be part - of the dirname.""" + """ + Decompose a file into dirname, basename, extension and version. + + >>> decompose_file_with_version('/tmp/foo.tar.gz;1') + ('/tmp', 'foo', '.tar.gz', 1) + + @param afile: the path/name of a file. + @type afile: string + @return: a tuple with the directory name, the docname, extension and + version. + @rtype: (dirname, docname, extension, version) + + @raise ValueError: in case version does not exist it will. + @note: if a URL is provided, the scheme will be part of the dirname. + """ version_str = afile.split(';')[-1] version = int(version_str) afile = afile[:-len(version_str)-1] basename = os.path.basename(afile) dirname = afile[:-len(basename)-1] base = file_strip_ext(basename) extension = basename[len(base) + 1:] if extension: extension = '.' + extension return (dirname, base, extension, version) +def get_subformat_from_format(format): + """ + @return the subformat if any. + @rtype: string + >>> get_superformat_from_format('foo;bar') + 'bar' + >>> get_superformat_from_format('foo') + '' + """ + try: + return format[format.rindex(';') + 1:] + except ValueError: + return '' + +def get_superformat_from_format(format): + """ + @return the superformat if any. + @rtype: string + + >>> get_superformat_from_format('foo;bar') + 'foo' + >>> get_superformat_from_format('foo') + 'foo' + """ + try: + return format[:format.rindex(';')] + except ValueError: + return format def propose_next_docname(docname): - """Propose a next docname docname""" + """ + Given a I{docname}, suggest a new I{docname} (useful when trying to generate + a unique I{docname}). + + >>> propose_next_docname('foo') + 'foo_1' + >>> propose_next_docname('foo_1') + 'foo_2' + >>> propose_next_docname('foo_10') + 'foo_11' + + @param docname: the base docname. + @type docname: string + @return: the next possible docname based on the given one. + @rtype: string + """ if '_' in docname: split_docname = docname.split('_') try: split_docname[-1] = str(int(split_docname[-1]) + 1) docname = '_'.join(split_docname) except ValueError: docname += '_1' else: docname += '_1' return docname class BibRecDocs: - """this class represents all the files attached to one record""" + """ + This class represents all the files attached to one record. 
+ + @param recid: the record identifier. + @type recid: integer + @param deleted_too: whether to consider deleted documents as normal + documents (useful when trying to recover deleted information). + @type deleted_too: bool + @param human_readable: whether numbers should be printed in human readable + format (e.g. 2048 bytes -> 2Kb) + @ivar id: the record identifier as passed to the constructor. + @type id: integer + @ivar human_readable: the human_readable flag as passed to the constructor. + @type human_readable: bool + @ivar deleted_too: the deleted_too flag as passed to the constructor. + @type deleted_too: bool + @ivar bibdocs: the list of documents attached to the record. + @type bibdocs: list of BibDoc + """ def __init__(self, recid, deleted_too=False, human_readable=False): self.id = recid self.human_readable = human_readable self.deleted_too = deleted_too self.bibdocs = [] self.build_bibdoc_list() def __repr__(self): - if self.deleted_too: - return 'BibRecDocs(%s, True)' % self.id - else: - return 'BibRecDocs(%s)' % self.id + """ + @return: the canonical string representation of the C{BibRecDocs}. + @rtype: string + """ + return 'BibRecDocs(%s%s%s)' % (self.id, + self.deleted_too and ', True' or '', + self.human_readable and ', True' or '' + ) def __str__(self): + """ + @return: an easy to be I{grepped} string representation of the + whole C{BibRecDocs} content. + @rtype: string + """ out = '%i::::total bibdocs attached=%i\n' % (self.id, len(self.bibdocs)) out += '%i::::total size latest version=%s\n' % (self.id, nice_size(self.get_total_size_latest_version())) out += '%i::::total size all files=%s\n' % (self.id, nice_size(self.get_total_size())) for bibdoc in self.bibdocs: out += str(bibdoc) return out def empty_p(self): - """Return True if the bibrec is empty, i.e. it has no bibdocs - connected.""" + """ + @return: True when the record has no attached documents. + @rtype: bool + """ return len(self.bibdocs) == 0 def deleted_p(self): - """Return True if the bibrec has been deleted.""" + """ + @return: True if the corresponding record has been deleted. + @rtype: bool + """ from invenio.search_engine import record_exists return record_exists(self.id) == -1 def get_xml_8564(self): - """Return a snippet of XML representing the 8564 corresponding to the - current state""" + """ + Return a snippet of I{MARCXML} representing the I{8564} fields + corresponding to the current state. + + @return: the MARCXML representation. 
+ @rtype: string + """ from invenio.search_engine import get_record out = '' record = get_record(self.id) fields = record_get_field_instances(record, '856', '4', ' ') for field in fields: - url = field_get_subfield_values(field, 'u') - if not bibdocfile_url_p(url): + urls = field_get_subfield_values(field, 'u') + if urls and not bibdocfile_url_p(urls[0]): out += '\t\n' for subfield, value in field_get_subfield_instances(field): out += '\t\t%s\n' % (subfield, encode_for_xml(value)) out += '\t\n' - for afile in self.list_latest_files(): + for afile in self.list_latest_files(list_hidden=False): out += '\t\n' url = afile.get_url() description = afile.get_description() comment = afile.get_comment() if url: out += '\t\t%s\n' % encode_for_xml(url) if description: out += '\t\t%s\n' % encode_for_xml(description) if comment: out += '\t\t%s\n' % encode_for_xml(comment) out += '\t\n' - for bibdoc in self.bibdocs: - icon = bibdoc.get_icon() - if icon: - icon = icon.list_all_files() - if icon: - out += '\t\n' - out += '\t\t%s\n' % encode_for_xml(icon[0].get_url()) - out += '\t\ticon\n' - out += '\t\n' - return out def get_total_size_latest_version(self): - """Return the total size used on disk of all the files belonging - to this record and corresponding to the latest version.""" + """ + Returns the total size used on disk by all the files belonging + to this record and corresponding to the latest version. + + @return: the total size. + @rtype: integer + """ size = 0 for bibdoc in self.bibdocs: size += bibdoc.get_total_size_latest_version() return size def get_total_size(self): - """Return the total size used on disk of all the files belonging - to this record of any version.""" + """ + Return the total size used on disk of all the files belonging + to this record of any version (not only the last as in + L{get_total_size_latest_version}). + + @return: the total size. + @rtype: integer + """ size = 0 for bibdoc in self.bibdocs: size += bibdoc.get_total_size() return size def build_bibdoc_list(self): - """This function must be called everytime a bibdoc connected to this - recid is added, removed or modified. + """ + This method must be called everytime a I{bibdoc} is added, removed or + modified. """ self.bibdocs = [] if self.deleted_too: res = run_sql("""SELECT id_bibdoc, type FROM bibrec_bibdoc JOIN bibdoc ON id=id_bibdoc WHERE id_bibrec=%s ORDER BY docname ASC""", (self.id,)) else: res = run_sql("""SELECT id_bibdoc, type FROM bibrec_bibdoc JOIN bibdoc ON id=id_bibdoc WHERE id_bibrec=%s AND status<>'DELETED' ORDER BY docname ASC""", (self.id,)) for row in res: cur_doc = BibDoc(docid=row[0], recid=self.id, doctype=row[1], human_readable=self.human_readable) self.bibdocs.append(cur_doc) def list_bibdocs(self, doctype=''): - """Returns the list all bibdocs object belonging to a recid. - If doctype is set, it returns just the bibdocs of that doctype. + """ + Returns the list all bibdocs object belonging to a recid. + If C{doctype} is set, it returns just the bibdocs of that doctype. + + @param doctype: the optional doctype. + @type doctype: string + @return: the list of bibdocs. + @rtype: list of BibDoc """ if not doctype: return self.bibdocs else: return [bibdoc for bibdoc in self.bibdocs if doctype == bibdoc.doctype] def get_bibdoc_names(self, doctype=''): - """Returns the names of the files associated with the bibdoc of a - paritcular doctype""" + """ + Returns all the names of the documents associated with the bibdoc. + If C{doctype} is set, restrict the result to all the matching doctype. 
+ + @param doctype: the optional doctype. + @type doctype: string + @return: the list of document names. + @rtype: list of string + """ return [bibdoc.docname for bibdoc in self.list_bibdocs(doctype)] def check_file_exists(self, path): - """Returns 1 if the recid has a file identical to the one stored in path.""" + """ + Check if a file with the same content of the file pointed in C{path} + is already attached to this record. + + @param path: the file to be checked against. + @type path: string + @return: True if a file with the requested content is already attached + to the record. + @rtype: bool + """ size = os.path.getsize(path) # Let's consider all the latest files files = self.list_latest_files() # Let's consider all the latest files with same size potential = [afile for afile in files if afile.get_size() == size] if potential: checksum = calculate_md5(path) # Let's consider all the latest files with the same size and the # same checksum potential = [afile for afile in potential if afile.get_checksum() == checksum] if potential: potential = [afile for afile in potential if filecmp.cmp(afile.get_full_path(), path)] if potential: return True else: # Gosh! How unlucky, same size, same checksum but not same # content! pass return False def propose_unique_docname(self, docname): - """Propose a unique docname.""" + """ + Given C{docname}, return a new docname that is not already attached to + the record. + + @param docname: the reference docname. + @type docname: string + @return: a docname not already attached. + @rtype: string + """ docname = normalize_docname(docname) goodname = docname i = 1 while goodname in self.get_bibdoc_names(): i += 1 goodname = "%s_%s" % (docname, i) return goodname def merge_bibdocs(self, docname1, docname2): - """This method merge docname2 into docname1. - Given all the formats of the latest version of docname2 the files - are added as new formats into docname1. - Docname2 is marked as deleted. - This method fails if at least one format in docname2 already exists - in docname1. (In this case the two bibdocs are preserved) - Comments and descriptions are also copied and if docname2 has an icon - and docname1 has not, the icon is imported. - If docname2 has a restriction(status) and docname1 has not the - restriction is imported.""" - + """ + This method merge C{docname2} into C{docname1}. + + 1. Given all the formats of the latest version of the files + attached to C{docname2}, these files are added as new formats + into C{docname1}. + 2. C{docname2} is marked as deleted. + + @raise InvenioWebSubmitFileError: if at least one format in C{docname2} + already exists in C{docname1}. (In this case the two bibdocs are + preserved) + @note: comments and descriptions are also copied. + @note: if C{docname2} has a I{restriction}(i.e. if the I{status} is + set) and C{docname1} doesn't, the restriction is imported. + """ bibdoc1 = self.get_bibdoc(docname1) bibdoc2 = self.get_bibdoc(docname2) ## Check for possibility for bibdocfile in bibdoc2.list_latest_files(): format = bibdocfile.get_format() if bibdoc1.format_already_exists_p(format): raise InvenioWebSubmitFileError('Format %s already exists in bibdoc %s of record %s. It\'s impossible to merge bibdoc %s into it.' % (format, docname1, self.id, docname2)) - ## Importing Icon if needed. 
- icon1 = bibdoc1.get_icon() - icon2 = bibdoc2.get_icon() - if icon2 is not None and icon1 is None: - icon = icon2.list_latest_files()[0] - bibdoc1.add_icon(icon.get_full_path(), format=icon.get_format()) - ## Importing restriction if needed. restriction1 = bibdoc1.get_status() restriction2 = bibdoc2.get_status() if restriction2 and not restriction1: bibdoc1.set_status(restriction2) ## Importing formats for bibdocfile in bibdoc2.list_latest_files(): format = bibdocfile.get_format() comment = bibdocfile.get_comment() description = bibdocfile.get_description() bibdoc1.add_file_new_format(bibdocfile.get_full_path(), description=description, comment=comment, format=format) ## Finally deleting old bibdoc2 bibdoc2.delete() self.build_bibdoc_list() def get_docid(self, docname): - """Returns the docid corresponding to the given docname, if the docname - is valid. + """ + @param docname: the document name. + @type docname: string + @return: the identifier corresponding to the given C{docname}. + @rtype: integer + @raise InvenioWebSubmitFileError: if the C{docname} does not + corresponds to a document attached to this record. """ for bibdoc in self.bibdocs: if bibdoc.docname == docname: return bibdoc.id raise InvenioWebSubmitFileError, "Recid '%s' is not connected with a " \ "docname '%s'" % (self.id, docname) def get_docname(self, docid): - """Returns the docname corresponding to the given docid, if the docid - is valid. + """ + @param docid: the document identifier. + @type docid: integer + @return: the name of the document corresponding to the given document + identifier. + @rtype: string + @raise InvenioWebSubmitFileError: if the C{docid} does not + corresponds to a document attached to this record. """ for bibdoc in self.bibdocs: if bibdoc.id == docid: return bibdoc.docname raise InvenioWebSubmitFileError, "Recid '%s' is not connected with a " \ "docid '%s'" % (self.id, docid) def has_docname_p(self, docname): - """Return True if a bibdoc with a particular docname belong to this - record.""" + """ + @param docname: the document name, + @type docname: string + @return: True if a document with the given name is attached to this + record. + @rtype: bool + """ for bibdoc in self.bibdocs: if bibdoc.docname == docname: return True return False def get_bibdoc(self, docname): - """Returns the bibdoc with a particular docname associated with + """ + @return: the bibdoc with a particular docname associated with this recid""" for bibdoc in self.bibdocs: if bibdoc.docname == docname: return bibdoc raise InvenioWebSubmitFileError, "Recid '%s' is not connected with " \ " docname '%s'" % (self.id, docname) def delete_bibdoc(self, docname): - """Deletes a docname associated with the recid.""" + """ + Deletes the document with the specified I{docname}. + + @param docname: the document name. + @type docname: string + """ for bibdoc in self.bibdocs: if bibdoc.docname == docname: bibdoc.delete() self.build_bibdoc_list() def add_bibdoc(self, doctype="Main", docname='file', never_fail=False): - """Creates a new bibdoc associated with the recid, with a file - called docname and a particular doctype. It returns the bibdoc object - which was just created. - If never_fail is True then the system will always be able - to create a bibdoc. + """ + Add a new empty document object (a I{bibdoc}) to the list of + documents of this record. + + @param doctype: the document type. + @type doctype: string + @param docname: the document name. 
+ @type docname: string + @param never_fail: if True, this procedure will not fail, even if + a document with the given name is already attached to this + record. In this case a new name will be generated (see + L{propose_unique_docname}). + @type never_fail: bool + @return: the newly created document object. + @rtype: BibDoc + @raise InvenioWebSubmitFileError: in case of any error. """ try: docname = normalize_docname(docname) if never_fail: docname = self.propose_unique_docname(docname) if docname in self.get_bibdoc_names(): raise InvenioWebSubmitFileError, "%s has already a bibdoc with docname %s" % (self.id, docname) else: bibdoc = BibDoc(recid=self.id, doctype=doctype, docname=docname, human_readable=self.human_readable) self.build_bibdoc_list() return bibdoc except Exception, e: register_exception() raise InvenioWebSubmitFileError(str(e)) - def add_new_file(self, fullpath, doctype="Main", docname=None, never_fail=False, description=None, comment=None, format=None): - """Adds a new file with the following policy: if the docname is not set - it is retrieved from the name of the file. If bibdoc with the given - docname doesn't exist, it is created and the file is added to it. - It it exist but it doesn't contain the format that is being added, the - new format is added. If the format already exists then if never_fail - is True a new bibdoc is created with a similar name but with a progressive - number as a suffix and the file is added to it. The elaborated bibdoc - is returned. + def add_new_file(self, fullpath, doctype="Main", docname=None, never_fail=False, description=None, comment=None, format=None, flags=None): + """ + Directly add a new file to this record. + + Adds a new file with the following policy: + - if the C{docname} is not set it is retrieved from the name of the + file. + - If a bibdoc with the given docname doesn't already exist, it is + created and the file is added to it. + - If it exists but doesn't contain the format that is being + added, the new format is added. + - If the format already exists then if C{never_fail} is True a new + bibdoc is created with a similar name but with a progressive + number as a suffix and the file is added to it (see + L{propose_unique_docname}). + + @param fullpath: the filesystem path of the document to be added. + @type fullpath: string + @param doctype: the type of the document. + @type doctype: string + @param docname: the document name. + @type docname: string + @param never_fail: if True, this procedure will not fail, even if + a document with the given name is already attached to this + record. In this case a new name will be generated (see + L{propose_unique_docname}). + @type never_fail: bool + @param description: an optional description of the file. + @type description: string + @param comment: an optional comment to the file. + @type comment: string + @param format: the extension of the file. If not specified it will + be guessed (see L{guess_format_from_url}). + @type format: string + @param flags: a set of flags to be associated with the file (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}) + @type flags: list of string + @return: the elaborated document object. + @rtype: BibDoc + @raise InvenioWebSubmitFileError: in case of error. """ if docname is None: docname = decompose_file(fullpath)[1] if format is None: format = decompose_file(fullpath)[2] docname = normalize_docname(docname) try: bibdoc = self.get_bibdoc(docname) except InvenioWebSubmitFileError: # bibdoc doesn't already exists!
bibdoc = self.add_bibdoc(doctype, docname, False) - bibdoc.add_file_new_version(fullpath, description=description, comment=comment, format=format) + bibdoc.add_file_new_version(fullpath, description=description, comment=comment, format=format, flags=flags) self.build_bibdoc_list() else: try: - bibdoc.add_file_new_format(fullpath, description=description, comment=comment, format=format) + bibdoc.add_file_new_format(fullpath, description=description, comment=comment, format=format, flags=flags) self.build_bibdoc_list() except InvenioWebSubmitFileError, e: # Format already exist! if never_fail: bibdoc = self.add_bibdoc(doctype, docname, True) - bibdoc.add_file_new_version(fullpath, description=description, comment=comment, format=format) + bibdoc.add_file_new_version(fullpath, description=description, comment=comment, format=format, flags=flags) self.build_bibdoc_list() else: - raise e + raise return bibdoc - def add_new_version(self, fullpath, docname=None, description=None, comment=None, format=None, hide_previous_versions=False): - """Adds a new fullpath file to an already existent docid making the - previous files associated with the same bibdocids obsolete. - It returns the bibdoc object. + def add_new_version(self, fullpath, docname=None, description=None, comment=None, format=None, flags=None): + """ + Adds a new file to an already existent document object as a new + version. + + @param fullpath: the filesystem path of the file to be added. + @type fullpath: string + @param docname: the document name. If not specified it will be + extracted from C{fullpath} (see L{decompose_file}). + @type docname: string + @param description: an optional description for the file. + @type description: string + @param comment: an optional comment to the file. + @type comment: string + @param format: the extension of the file. If not specified it will + be guessed (see L{guess_format_from_url}). + @type format: string + @param flags: a set of flags to be associated with the file (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}) + @type flags: list of string + @return: the elaborated document object. + @rtype: BibDoc + @raise InvenioWebSubmitFileError: in case of error. + @note: previous files associated with the same document will be + considered obsolete. """ if docname is None: docname = decompose_file(fullpath)[1] if format is None: format = decompose_file(fullpath)[2] + if flags is None: + flags = [] bibdoc = self.get_bibdoc(docname=docname) - bibdoc.add_file_new_version(fullpath, description=description, comment=comment, format=format, hide_previous_versions=hide_previous_versions) + bibdoc.add_file_new_version(fullpath, description=description, comment=comment, format=format, flags=flags) self.build_bibdoc_list() return bibdoc - def add_new_format(self, fullpath, docname=None, description=None, comment=None, format=None): - """Adds a new format for a fullpath file to an already existent - docid along side already there files. - It returns the bibdoc object. + def add_new_format(self, fullpath, docname=None, description=None, comment=None, format=None, flags=None): + """ + Adds a new file to an already existent document object as a new + format. + + @param fullpath: the filesystem path of the file to be added. + @type fullpath: string + @param docname: the document name. If not specified it will be + extracted from C{fullpath} (see L{decompose_file}). + @type docname: string + @param description: an optional description for the file. 
+ @type description: string + @param comment: an optional comment to the file. + @type comment: string + @param format: the extension of the file. If not specified it will + be guessed (see L{guess_format_from_url}). + @type format: string + @param flags: a set of flags to be associated with the file (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}) + @type flags: list of string + @return: the elaborated document object. + @rtype: BibDoc + @raise InvenioWebSubmitFileError: in case the same format already + exists. """ if docname is None: docname = decompose_file(fullpath)[1] if format is None: format = decompose_file(fullpath)[2] + if flags is None: + flags = [] bibdoc = self.get_bibdoc(docname=docname) - bibdoc.add_file_new_format(fullpath, description=description, comment=comment, format=format) + bibdoc.add_file_new_format(fullpath, description=description, comment=comment, format=format, flags=flags) self.build_bibdoc_list() return bibdoc - def list_latest_files(self, doctype=''): - """Returns a list which is made up by all the latest docfile of every - bibdoc (of a particular doctype). + def list_latest_files(self, doctype='', list_hidden=True): + """ + Returns a list of the latest files. + + @param doctype: if set, only documents of the given type will be listed. + @type doctype: string + @param list_hidden: if True, will also list files with the C{HIDDEN} + flag being set. + @type list_hidden: bool + @return: the list of latest files. + @rtype: list of BibDocFile """ docfiles = [] for bibdoc in self.list_bibdocs(doctype): - docfiles += bibdoc.list_latest_files() + docfiles += bibdoc.list_latest_files(list_hidden=list_hidden) return docfiles def display(self, docname="", version="", doctype="", ln=CFG_SITE_LANG, verbose=0, display_hidden=True): - """Returns a formatted panel with information and links about a given - docid of a particular version (or any), of a particular doctype (or any) + """ + Returns an HTML representation of the attached documents. + + @param docname: if set, include only the requested document. + @type docname: string + @param version: if not set, only the last version will be displayed. If + 'all', all versions will be displayed. + @type version: string (integer or 'all') + @param doctype: if set, include only documents of the requested type. + @type doctype: string + @param ln: the language code. + @type ln: string + @param verbose: if greater than 0, includes debug information. + @type verbose: integer + @param display_hidden: whether to include hidden files as well. + @type display_hidden: bool + @return: the formatted representation. + @rtype: HTML string """ t = "" if docname: try: bibdocs = [self.get_bibdoc(docname)] except InvenioWebSubmitFileError: bibdocs = self.list_bibdocs(doctype) else: bibdocs = self.list_bibdocs(doctype) if bibdocs: types = list_types_from_array(bibdocs) fulltypes = [] for mytype in types: fulltype = { 'name' : mytype, 'content' : [], } for bibdoc in bibdocs: if mytype == bibdoc.get_type(): fulltype['content'].append(bibdoc.display(version, ln=ln, display_hidden=display_hidden)) fulltypes.append(fulltype) if verbose >= 9: verbose_files = str(self) else: verbose_files = '' t = websubmit_templates.tmpl_bibrecdoc_filelist( ln=ln, types = fulltypes, verbose_files=verbose_files ) return t def fix(self, docname): - """Algorithm that transform an a broken/old bibdoc into a coherent one: - i.e. the corresponding folder will have files named after the bibdoc - name. Proper .recid, .type, .md5 files will be created/updated.
- In case of more than one file with the same format revision a new bibdoc - will be created in order to put does files. - Returns the list of newly created bibdocs if any. + """ + Algorithm that transforms a broken/old bibdoc into a coherent one. + Think of it as being the fsck of BibDocs. + - All the files in the bibdoc directory will be renamed according + to the document name. Proper .recid, .type, .md5 files will be + created/updated. + - In case of more than one file with the same format version a new + bibdoc will be created in order to put those files. + @param docname: the document name that needs to be fixed. + @type docname: string + @return: the list of newly created bibdocs if any. + @rtype: list of BibDoc + @raise InvenioWebSubmitFileError: in case of issues that cannot be + fixed automatically. """ bibdoc = self.get_bibdoc(docname) versions = {} res = [] new_bibdocs = [] # List of files with the same version/format of # existing file which need new bibdoc. counter = 0 zero_version_bug = False if os.path.exists(bibdoc.basedir): for filename in os.listdir(bibdoc.basedir): if filename[0] != '.' and ';' in filename: name, version = filename.split(';') try: version = int(version) except ValueError: # Strange name register_exception() - raise InvenioWebSubmitFileError, "A file called %s exists under %s. This is not a valid name. After the ';' there must be an integer representing the file revision. Please, manually fix this file either by renaming or by deleting it." % (filename, bibdoc.basedir) + raise InvenioWebSubmitFileError, "A file called %s exists under %s. This is not a valid name. After the ';' there must be an integer representing the file version. Please, manually fix this file either by renaming or by deleting it." % (filename, bibdoc.basedir) if version == 0: zero_version_bug = True format = name[len(file_strip_ext(name)):] format = normalize_format(format) if not versions.has_key(version): versions[version] = {} new_name = 'FIXING-%s-%s' % (str(counter), name) try: shutil.move('%s/%s' % (bibdoc.basedir, filename), '%s/%s' % (bibdoc.basedir, new_name)) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Error in renaming '%s' to '%s': '%s'" % ('%s/%s' % (bibdoc.basedir, filename), '%s/%s' % (bibdoc.basedir, new_name), e) if versions[version].has_key(format): new_bibdocs.append((new_name, version)) else: versions[version][format] = new_name counter += 1 elif filename[0] != '.': # Strange name register_exception() - raise InvenioWebSubmitFileError, "A file called %s exists under %s. This is not a valid name. There should be a ';' followed by an integer representing the file revision. Please, manually fix this file either by renaming or by deleting it." % (filename, bibdoc.basedir) + raise InvenioWebSubmitFileError, "A file called %s exists under %s. This is not a valid name. There should be a ';' followed by an integer representing the file version. Please, manually fix this file either by renaming or by deleting it."
% (filename, bibdoc.basedir) else: # we create the corresponding storage directory old_umask = os.umask(022) os.makedirs(bibdoc.basedir) # and save the father record id if it exists try: if self.id != "": recid_fd = open("%s/.recid" % bibdoc.basedir, "w") recid_fd.write(str(self.id)) recid_fd.close() if bibdoc.doctype != "": type_fd = open("%s/.type" % bibdoc.basedir, "w") type_fd.write(str(bibdoc.doctype)) type_fd.close() except Exception, e: register_exception() raise InvenioWebSubmitFileError, e os.umask(old_umask) - if not versions: bibdoc.delete() else: for version, formats in versions.iteritems(): if zero_version_bug: version += 1 for format, filename in formats.iteritems(): destination = '%s%s;%i' % (docname, format, version) try: shutil.move('%s/%s' % (bibdoc.basedir, filename), '%s/%s' % (bibdoc.basedir, destination)) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Error in renaming '%s' to '%s': '%s'" % ('%s/%s' % (bibdoc.basedir, filename), '%s/%s' % (bibdoc.basedir, destination), e) try: recid_fd = open("%s/.recid" % bibdoc.basedir, "w") recid_fd.write(str(self.id)) recid_fd.close() type_fd = open("%s/.type" % bibdoc.basedir, "w") type_fd.write(str(bibdoc.doctype)) type_fd.close() except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Error in creating .recid and .type file for '%s' folder: '%s'" % (bibdoc.basedir, e) self.build_bibdoc_list() res = [] for (filename, version) in new_bibdocs: if zero_version_bug: version += 1 new_bibdoc = self.add_bibdoc(doctype=bibdoc.doctype, docname=docname, never_fail=True) new_bibdoc.add_file_new_format('%s/%s' % (bibdoc.basedir, filename), version) res.append(new_bibdoc) try: os.remove('%s/%s' % (bibdoc.basedir, filename)) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Error in removing '%s': '%s'" % ('%s/%s' % (bibdoc.basedir, filename), e) Md5Folder(bibdoc.basedir).update(only_new=False) bibdoc._build_file_list() self.build_bibdoc_list() for bibdoc in self.bibdocs: if not run_sql('SELECT more_info FROM bibdoc WHERE id=%s', (bibdoc.id,)): ## Import from MARC only if the bibdoc has never had ## its more_info initialized. try: bibdoc.import_descriptions_and_comments_from_marc() except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Error in importing description and comment from %s for record %s: %s" % (repr(bibdoc), self.id, e) return res def check_format(self, docname): - """In case CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS is + """ + Check for any format related issue. + In case L{CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS} is altered or Python version changes, it might happen that a docname contains files which are no more docname + .format ; version, simply because the .format is now recognized (and it was not before, so it was contained into the docname). - This algorithm verify if it is necessary to fix. - Return True if format is correct. False if a fix is needed.""" + This algorithm verifies whether a fix is necessary (see L{fix_format}). + + @param docname: the document name whose formats should be verified. + @type docname: string + @return: True if format is correct. False if a fix is needed. + @rtype: bool + @raise InvenioWebSubmitFileError: in case of any error.
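+
+ A minimal sketch of the intended check-then-fix cycle (the record id
+ below is illustrative):
+
+ >>> bibrecdocs = BibRecDocs(123)
+ >>> for docname in bibrecdocs.get_bibdoc_names():
+ ...     if not bibrecdocs.check_format(docname):
+ ...         bibrecdocs.fix_format(docname, skip_check=True)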
+ """ bibdoc = self.get_bibdoc(docname) - correct_docname = decompose_file(docname)[1] + correct_docname = decompose_file(docname + '.pdf')[1] if docname != correct_docname: return False for filename in os.listdir(bibdoc.basedir): if not filename.startswith('.'): try: dummy, dummy, format, version = decompose_file_with_version(filename) - except: + except Exception: raise InvenioWebSubmitFileError('Incorrect filename "%s" for docname %s for recid %i' % (filename, docname, self.id)) if '%s%s;%i' % (correct_docname, format, version) != filename: return False return True def check_duplicate_docnames(self): - """Check wethever the record is connected with at least tho bibdoc - with the same docname. - Return True if everything is fine. + """ + Check wethever the record is connected with at least tho documents + with the same name. + + @return: True if everything is fine. + @rtype: bool """ docnames = set() for docname in self.get_bibdoc_names(): if docname in docnames: return False else: docnames.add(docname) return True def uniformize_bibdoc(self, docname): - """This algorithm correct wrong file name belonging to a bibdoc.""" + """ + This algorithm correct wrong file name belonging to a bibdoc. + + @param docname: the document name whose formats should be verified. + @type docname: string + """ bibdoc = self.get_bibdoc(docname) for filename in os.listdir(bibdoc.basedir): if not filename.startswith('.'): try: dummy, dummy, format, version = decompose_file_with_version(filename) except ValueError: register_exception(alert_admin=True, prefix= "Strange file '%s' is stored in %s" % (filename, bibdoc.basedir)) else: os.rename(os.path.join(bibdoc.basedir, filename), os.path.join(bibdoc.basedir, '%s%s;%i' % (docname, format, version))) Md5Folder(bibdoc.basedir).update() bibdoc.touch() bibdoc._build_file_list('rename') def fix_format(self, docname, skip_check=False): - """ Fixing this situation require - different steps, because docname might already exists. - This algorithm try to fix this situation. - In case a merging is needed the algorithm return False if the merging - is not possible. + """ + Fixes format related inconsistencies. + + @param docname: the document name whose formats should be verified. + @type docname: string + @param skip_check: if True assume L{check_format} has already been + called and the need for fix has already been found. + If False, will implicitly call L{check_format} and skip fixing + if no error is found. + @type skip_check: bool + @return: in case merging two bibdocs is needed but it's not possible. + @rtype: bool """ if not skip_check: if self.check_format(docname): return True bibdoc = self.get_bibdoc(docname) - correct_docname = decompose_file(docname)[1] + correct_docname = decompose_file(docname + '.pdf')[1] need_merge = False if correct_docname != docname: need_merge = self.has_docname_p(correct_docname) if need_merge: proposed_docname = self.propose_unique_docname(correct_docname) run_sql('UPDATE bibdoc SET docname=%s WHERE id=%s', (proposed_docname, bibdoc.id)) self.build_bibdoc_list() self.uniformize_bibdoc(proposed_docname) try: self.merge_bibdocs(docname, proposed_docname) except InvenioWebSubmitFileError: return False else: run_sql('UPDATE bibdoc SET docname=%s WHERE id=%s', (correct_docname, bibdoc.id)) self.build_bibdoc_list() self.uniformize_bibdoc(correct_docname) else: self.uniformize_bibdoc(docname) return True def fix_duplicate_docnames(self, skip_check=False): - """Algotirthm to fix duplicate docnames. + """ + Algotirthm to fix duplicate docnames. 
If a record is connected with at least two bibdocs having the same docname, the algorithm will try to merge them. + + @param skip_check: if True assume L{check_duplicate_docnames} has + already been called and the need for a fix has already been found. + If False, will implicitly call L{check_duplicate_docnames} and skip + fixing if no error is found. + @type skip_check: bool """ if not skip_check: if self.check_duplicate_docnames(): return docnames = set() for bibdoc in self.list_bibdocs(): docname = bibdoc.docname if docname in docnames: new_docname = self.propose_unique_docname(bibdoc.docname) bibdoc.change_name(new_docname) self.merge_bibdocs(docname, new_docname) docnames.add(docname) class BibDoc: - """this class represents one file attached to a record - there is a one to one mapping between an instance of this class and - an entry in the bibdoc db table""" + """ + This class represents one document (i.e. a set of files with different + formats and with versioning information that constitutes a piece of + information). + + To instantiate a new document, the recid and the docname are mandatory. + To instantiate an already existing document, either the recid and docname + or the docid alone are sufficient to retrieve it. + + @param docid: the document identifier. + @type docid: integer + @param recid: the record identifier of the record to which this document + belongs. If the C{docid} is specified the C{recid} is automatically + retrieved from the database. + @type recid: integer + @param docname: the document name. + @type docname: string + @param doctype: the document type (used when instantiating a new document). + @type doctype: string + @param human_readable: whether sizes should be represented in a human + readable format. + @type human_readable: bool + @raise InvenioWebSubmitFileError: in case of error. + """ - def __init__ (self, docid="", recid="", docname="file", doctype="Main", human_readable=False): + def __init__ (self, docid=None, recid=None, docname=None, doctype='Main', human_readable=False): """Constructor of a bibdoc.
At least the docid or the recid/docname pair is needed.""" # docid is known, the document already exists - docname = normalize_docname(docname) + if docname: + docname = normalize_docname(docname) self.docfiles = [] self.md5s = None self.related_files = [] self.human_readable = human_readable - if docid != "": - if recid == "": - recid = None - self.doctype = "" - res = run_sql("select id_bibrec,type from bibrec_bibdoc " - "where id_bibdoc=%s", (docid,)) - if len(res) > 0: + if docid: + if not recid: + res = run_sql("SELECT id_bibrec,type FROM bibrec_bibdoc WHERE id_bibdoc=%s LIMIT 1", (docid,), 1) + if res: recid = res[0][0] - self.doctype = res[0][1] + doctype = res[0][1] else: - res = run_sql("select id_bibdoc1 from bibdoc_bibdoc " - "where id_bibdoc2=%s", (docid,)) - if len(res) > 0 : - main_bibdoc = res[0][0] - res = run_sql("select id_bibrec,type from bibrec_bibdoc " - "where id_bibdoc=%s", (main_bibdoc,)) - if len(res) > 0: + res = run_sql("SELECT id_bibdoc1,type FROM bibdoc_bibdoc WHERE id_bibdoc2=%s LIMIT 1", (docid,), 1) + if res: + main_docid = res[0][0] + doctype = res[0][1] + res = run_sql("SELECT id_bibrec,type FROM bibrec_bibdoc WHERE id_bibdoc=%s LIMIT 1", (main_docid,), 1) + if res: recid = res[0][0] - self.doctype = res[0][1] + else: + raise InvenioWebSubmitFileError, "The docid %s associated with docid %s is not associated with any record" % (main_docid, docid) + else: + raise InvenioWebSubmitFileError, "The docid %s is not associated to any recid or other docid" % docid else: - res = run_sql("select type from bibrec_bibdoc " - "where id_bibrec=%s and id_bibdoc=%s", (recid, docid,)) - if len(res) > 0: - self.doctype = res[0][0] + res = run_sql("SELECT type FROM bibrec_bibdoc WHERE id_bibrec=%s AND id_bibdoc=%s LIMIT 1", (recid, docid,), 1) + if res: + doctype = res[0][0] else: #this bibdoc isn't associated with the corresponding bibrec. - raise InvenioWebSubmitFileError, "No docid associated with the recid %s" % recid + raise InvenioWebSubmitFileError, "Docid %s is not associated with the recid %s" % (docid, recid) # gather the other information - res = run_sql("select id,status,docname,creation_date," - "modification_date,more_info from bibdoc where id=%s", (docid,)) - if len(res) > 0: + res = run_sql("SELECT id,status,docname,creation_date,modification_date,text_extraction_date,more_info FROM bibdoc WHERE id=%s LIMIT 1", (docid,), 1) + if res: self.cd = res[0][3] self.md = res[0][4] + self.td = res[0][5] self.recid = recid self.docname = res[0][2] self.id = docid self.status = res[0][1] - self.more_info = BibDocMoreInfo(docid, blob_to_string(res[0][5])) + self.more_info = BibDocMoreInfo(docid, blob_to_string(res[0][6])) self.basedir = _make_base_dir(self.id) + self.doctype = doctype else: # this bibdoc doesn't exist raise InvenioWebSubmitFileError, "The docid %s does not exist." 
% docid # else it is a new document else: - if docname == "" or doctype == "": - raise InvenioWebSubmitFileError, "Argument missing for creating a new bibdoc" + if not docname: + raise InvenioWebSubmitFileError, "You should specify the docname when creating a new bibdoc" else: self.recid = recid self.doctype = doctype self.docname = docname self.status = '' if recid: - res = run_sql("SELECT b.id FROM bibrec_bibdoc bb JOIN bibdoc b on bb.id_bibdoc=b.id WHERE bb.id_bibrec=%s AND b.docname=%s", (recid, docname)) + res = run_sql("SELECT b.id FROM bibrec_bibdoc bb JOIN bibdoc b on bb.id_bibdoc=b.id WHERE bb.id_bibrec=%s AND b.docname=%s LIMIT 1", (recid, docname), 1) if res: raise InvenioWebSubmitFileError, "A bibdoc called %s already exists for recid %s" % (docname, recid) self.id = run_sql("INSERT INTO bibdoc (status,docname,creation_date,modification_date) " "values(%s,%s,NOW(),NOW())", (self.status, docname)) - if self.id is not None: + if self.id: # we link the document to the record if a recid was # specified self.more_info = BibDocMoreInfo(self.id) - res = run_sql("SELECT creation_date, modification_date FROM bibdoc WHERE id=%s", (self.id,)) + res = run_sql("SELECT creation_date, modification_date, text_extraction_date FROM bibdoc WHERE id=%s", (self.id,)) self.cd = res[0][0] - self.md = res[0][0] + self.md = res[0][1] + self.td = res[0][2] else: raise InvenioWebSubmitFileError, "New docid cannot be created" try: self.basedir = _make_base_dir(self.id) # we create the corresponding storage directory if not os.path.exists(self.basedir): old_umask = os.umask(022) os.makedirs(self.basedir) # and save the father record id if it exists try: - if self.recid != "": + if self.recid: recid_fd = open("%s/.recid" % self.basedir, "w") recid_fd.write(str(self.recid)) recid_fd.close() - if self.doctype != "": + if self.doctype: type_fd = open("%s/.type" % self.basedir, "w") type_fd.write(str(self.doctype)) type_fd.close() except Exception, e: - register_exception() + register_exception(alert_admin=True) raise InvenioWebSubmitFileError, e os.umask(old_umask) - if self.recid != "": + if self.recid: run_sql("INSERT INTO bibrec_bibdoc (id_bibrec, id_bibdoc, type) VALUES (%s,%s,%s)", (recid, self.id, self.doctype,)) except Exception, e: run_sql('DELETE FROM bibdoc WHERE id=%s', (self.id, )) run_sql('DELETE FROM bibrec_bibdoc WHERE id_bibdoc=%s', (self.id, )) - register_exception() + register_exception(alert_admin=True) raise InvenioWebSubmitFileError, e # build list of attached files self._build_file_list('init') # link with related_files self._build_related_file_list() def __repr__(self): - return 'BibDoc(%s, %s, %s, %s)' % (repr(self.id), repr(self.recid), repr(self.docname), repr(self.doctype)) + """ + @return: the canonical string representation of the C{BibDoc}. + @rtype: string + """ + return 'BibDoc(%s, %s, %s, %s, %s)' % (repr(self.id), repr(self.recid), repr(self.docname), repr(self.doctype), repr(self.human_readable)) def __str__(self): + """ + @return: an easy to be I{grepped} string representation of the + whole C{BibDoc} content. 
+ @rtype: string + """ out = '%s:%i:::docname=%s\n' % (self.recid or '', self.id, self.docname) out += '%s:%i:::doctype=%s\n' % (self.recid or '', self.id, self.doctype) out += '%s:%i:::status=%s\n' % (self.recid or '', self.id, self.status) out += '%s:%i:::basedir=%s\n' % (self.recid or '', self.id, self.basedir) out += '%s:%i:::creation date=%s\n' % (self.recid or '', self.id, self.cd) out += '%s:%i:::modification date=%s\n' % (self.recid or '', self.id, self.md) + out += '%s:%i:::text extraction date=%s\n' % (self.recid or '', self.id, self.td) out += '%s:%i:::total file attached=%s\n' % (self.recid or '', self.id, len(self.docfiles)) if self.human_readable: out += '%s:%i:::total size latest version=%s\n' % (self.recid or '', self.id, nice_size(self.get_total_size_latest_version())) out += '%s:%i:::total size all files=%s\n' % (self.recid or '', self.id, nice_size(self.get_total_size())) else: out += '%s:%i:::total size latest version=%s\n' % (self.recid or '', self.id, self.get_total_size_latest_version()) out += '%s:%i:::total size all files=%s\n' % (self.recid or '', self.id, self.get_total_size()) for docfile in self.docfiles: out += str(docfile) - icon = self.get_icon() - if icon: - out += str(self.get_icon()) return out def format_already_exists_p(self, format): - """Return True if the given format already exists among the latest files.""" + """ + @param format: a format to be checked. + @type format: string + @return: True if a file of the given format already exists among the + latest files. + @rtype: bool + """ format = normalize_format(format) for afile in self.list_latest_files(): if format == afile.get_format(): return True return False def get_status(self): - """Retrieve the status.""" + """ + @return: the status information. + @rtype: string + """ return self.status + def get_text(self, version=None): + """ + @param version: the requested version. If not set, the latest version + will be used. + @type version: integer + @return: the textual content corresponding to the specified version + of the document. + @rtype: string + """ + if version is None: + version = self.get_latest_version() + if self.has_text(version=version): + return open(os.path.join(self.basedir, '.text;%i' % version)).read() + else: + return "" + + def get_text_path(self, version=None): + """ + @param version: the requested version. If not set, the latest version + will be used. + @type version: integer + @return: the full path to the textual content corresponding to the specified version + of the document. + @rtype: string + """ + if version is None: + version = self.get_latest_version() + if self.has_text(version=version): + return os.path.join(self.basedir, '.text;%i' % version) + else: + return "" + + def extract_text(self, version=None, perform_ocr=False, ln='en'): + """ + Try whatever is necessary to extract the textual information of a document. + + @param version: the version of the document for which text is required. + If not specified the text will be retrieved from the last version. + @type version: integer + @param perform_ocr: whether to perform OCR. + @type perform_ocr: bool + @param ln: a two letter language code to give as a hint to the OCR + procedure. + @type ln: string + @raise InvenioWebSubmitFileError: in case of error. + @note: the text is extracted and cached for later use. Use L{get_text} + to retrieve it.
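+
+ A minimal usage sketch (the docid is illustrative; OCR additionally
+ requires the optional OCR tools to be installed):
+
+ >>> bibdoc = BibDoc(docid=123)
+ >>> bibdoc.extract_text(perform_ocr=True, ln='en')
+ >>> print bibdoc.get_text()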
+ """ + from invenio.websubmit_file_converter import get_best_format_to_extract_text_from, convert_file, InvenioWebSubmitFileConverterError + if version is None: + version = self.get_latest_version() + docfiles = self.list_version_files(version) + ## We try to extract text only from original or OCRed documents. + filenames = [docfile.get_full_path() for docfile in docfiles if 'CONVERTED' not in docfile.flags or 'OCRED' in docfile.flags] + try: + filename = get_best_format_to_extract_text_from(filenames) + except InvenioWebSubmitFileConverterError: + ## We fall back on considering all the documents + filenames = [docfile.get_full_path() for docfile in docfiles] + try: + filename = get_best_format_to_extract_text_from(filenames) + except InvenioWebSubmitFileConverterError: + open(os.path.join(self.basedir, '.text;%i' % version), 'w').write('') + return + try: + convert_file(filename, os.path.join(self.basedir, '.text;%i' % version), '.txt', perform_ocr=perform_ocr, ln=ln) + if version == self.get_latest_version(): + run_sql("UPDATE bibdoc SET text_extraction_date=NOW() WHERE id=%s", (self.id, )) + except InvenioWebSubmitFileConverterError, e: + register_exception(alert_admin=True, prefix="Error in extracting text from bibdoc %i, version %i" % (self.id, version)) + raise InvenioWebSubmitFileError, str(e) + def touch(self): - """Update the modification time of the bibdoc.""" + """ + Update the modification time of the bibdoc (as in the UNIX command + C{touch}). + """ run_sql('UPDATE bibdoc SET modification_date=NOW() WHERE id=%s', (self.id, )) #if self.recid: #run_sql('UPDATE bibrec SET modification_date=NOW() WHERE id=%s', (self.recid, )) def set_status(self, new_status): - """Set a new status.""" + """ + Set a new status. A document with a status information is a restricted + document that can be accessed only to user which as an authorization + to the I{viewrestrdoc} WebAccess action with keyword status with value + C{new_status}. + + @param new_status: the new status. If empty the document will be + unrestricted. + @type new_status: string + @raise InvenioWebSubmitFileError: in case the reserved word + 'DELETED' is used. + """ if new_status != KEEP_OLD_VALUE: if new_status == 'DELETED': raise InvenioWebSubmitFileError('DELETED is a reserved word and can not be used for setting the status') run_sql('UPDATE bibdoc SET status=%s WHERE id=%s', (new_status, self.id)) self.status = new_status self.touch() self._build_file_list() self._build_related_file_list() - def add_file_new_version(self, filename, description=None, comment=None, format=None, hide_previous_versions=False): - """Add a new version of a file.""" + def add_file_new_version(self, filename, description=None, comment=None, format=None, flags=None): + """ + Add a new version of a file. If no physical file is already attached + to the document a the given file will have version 1. Otherwise the + new file will have the current version number plus one. + + @param filename: the local path of the file. + @type filename: string + @param description: an optional description for the file. + @type description: string + @param comment: an optional comment to the file. + @type comment: string + @param format: the extension of the file. If not specified it will + be retrieved from the filename (see L{decompose_file}). + @type format: string + @param flags: a set of flags to be associated with the file (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}) + @type flags: list of string + @raise InvenioWebSubmitFileError: in case of error. 
+ """ try: latestVersion = self.get_latest_version() if latestVersion == 0: myversion = 1 else: myversion = latestVersion + 1 if os.path.exists(filename): if not os.path.getsize(filename) > 0: raise InvenioWebSubmitFileError, "%s seems to be empty" % filename if format is None: format = decompose_file(filename)[2] destination = "%s/%s%s;%i" % (self.basedir, self.docname, format, myversion) try: shutil.copyfile(filename, destination) os.chmod(destination, 0644) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while copying '%s' to '%s': '%s'" % (filename, destination, e) self.more_info.set_description(description, format, myversion) self.more_info.set_comment(comment, format, myversion) - for afile in self.list_all_files(): - format = afile.get_format() - version = afile.get_version() - if version < myversion: - self.more_info.set_hidden(hide_previous_versions, format, myversion) + if flags is None: + flags = [] + for flag in flags: + if flag == 'PERFORM_HIDE_PREVIOUS': + for afile in self.list_all_files(): + format = afile.get_format() + version = afile.get_version() + if version < myversion: + self.more_info.set_flag('HIDDEN', format, myversion) + else: + self.more_info.set_flag(flag, format, myversion) else: raise InvenioWebSubmitFileError, "'%s' does not exists!" % filename finally: self.touch() Md5Folder(self.basedir).update() self._build_file_list() + def add_file_new_format(self, filename, version=None, description=None, comment=None, format=None, flags=None): + """ + Add a file as a new format. + + @param filename: the local path of the file. + @type filename: string + @param version: an optional specific version to which the new format + should be added. If None, the last version will be used. + @type version: integer + @param description: an optional description for the file. + @type description: string + @param comment: an optional comment to the file. + @type comment: string + @param format: the extension of the file. If not specified it will + be retrieved from the filename (see L{decompose_file}). + @type format: string + @param flags: a set of flags to be associated with the file (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}) + @type flags: list of string + @raise InvenioWebSubmitFileError: if the given format already exists. + """ + try: + if version is None: + version = self.get_latest_version() + if version == 0: + version = 1 + if os.path.exists(filename): + if not os.path.getsize(filename) > 0: + raise InvenioWebSubmitFileError, "%s seems to be empty" % filename + if format is None: + format = decompose_file(filename)[2] + destination = "%s/%s%s;%i" % (self.basedir, self.docname, format, version) + if os.path.exists(destination): + raise InvenioWebSubmitFileError, "A file for docname '%s' for the recid '%s' already exists for the format '%s'" % (self.docname, self.recid, format) + try: + shutil.copyfile(filename, destination) + os.chmod(destination, 0644) + except Exception, e: + register_exception() + raise InvenioWebSubmitFileError, "Encountered an exception while copying '%s' to '%s': '%s'" % (filename, destination, e) + self.more_info.set_comment(comment, format, version) + self.more_info.set_description(description, format, version) + if flags is None: + flags = [] + for flag in flags: + if flag != 'PERFORM_HIDE_PREVIOUS': + self.more_info.set_flag(flag, format, version) + else: + raise InvenioWebSubmitFileError, "'%s' does not exists!" 
% filename + finally: + Md5Folder(self.basedir).update() + self.touch() + self._build_file_list() + def purge(self): - """Phisically Remove all the previous version of the given bibdoc""" + """ + Physically removes all the previous versions of the given bibdoc. + Everything but the formats of the latest version will be erased. + """ version = self.get_latest_version() if version > 1: for afile in self.docfiles: if afile.get_version() < version: self.more_info.unset_comment(afile.get_format(), afile.get_version()) self.more_info.unset_description(afile.get_format(), afile.get_version()) - self.more_info.unset_hidden(afile.get_format(), afile.get_version()) + for flag in CFG_BIBDOCFILE_AVAILABLE_FLAGS: + self.more_info.unset_flag(flag, afile.get_format(), afile.get_version()) try: os.remove(afile.get_full_path()) except Exception, e: register_exception() Md5Folder(self.basedir).update() self.touch() self._build_file_list() def expunge(self): - """Phisically remove all the traces of a given bibdoc - note that you should not use any more this object or unpredictable - things will happen.""" + """ + Physically remove all the traces of a given document. + @note: an expunged BibDoc object shouldn't be used anymore as the + results might be unpredictable. + """ del self.md5s del self.more_info os.system('rm -rf %s' % escape_shell_arg(self.basedir)) run_sql('DELETE FROM bibrec_bibdoc WHERE id_bibdoc=%s', (self.id, )) run_sql('DELETE FROM bibdoc_bibdoc WHERE id_bibdoc1=%s OR id_bibdoc2=%s', (self.id, self.id)) run_sql('DELETE FROM bibdoc WHERE id=%s', (self.id, )) run_sql('INSERT DELAYED INTO hstDOCUMENT(action, id_bibdoc, docname, doctimestamp) VALUES("EXPUNGE", %s, %s, NOW())', (self.id, self.docname)) del self.docfiles del self.id del self.cd del self.md + del self.td del self.basedir del self.recid del self.doctype del self.docname def revert(self, version): - """Revert to a given version by copying its differnt formats to a new - version.""" + """ + Revert the document to a given version. All the formats corresponding + to that version are copied forward to a new version. + + @param version: the version to revert to. + @type version: integer + @raise InvenioWebSubmitFileError: in case of errors + """ try: version = int(version) new_version = self.get_latest_version() + 1 for docfile in self.list_version_files(version): destination = "%s/%s%s;%i" % (self.basedir, self.docname, docfile.get_format(), new_version) if os.path.exists(destination): raise InvenioWebSubmitFileError, "A file for docname '%s' for the recid '%s' already exists for the format '%s'" % (self.docname, self.recid, docfile.get_format()) try: shutil.copyfile(docfile.get_full_path(), destination) os.chmod(destination, 0644) self.more_info.set_comment(self.more_info.get_comment(docfile.get_format(), version), docfile.get_format(), new_version) self.more_info.set_description(self.more_info.get_description(docfile.get_format(), version), docfile.get_format(), new_version) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while copying '%s' to '%s': '%s'" % (docfile.get_full_path(), destination, e) finally: Md5Folder(self.basedir).update() self.touch() self._build_file_list() def import_descriptions_and_comments_from_marc(self, record=None): - """Import description & comment from the corresponding marc. - if record is passed it is directly used, otherwise it is - calculated after the xm stored in the database.""" + """ + Import descriptions and comments from the corresponding MARC metadata.
+ + @param record: the record (if None it will be calculated). + @type record: bibrecord recstruct + @note: If record is passed it is directly used, otherwise it is retrieved + from the MARCXML stored in the database. + """ ## Let's get the record from invenio.search_engine import get_record if record is None: record = get_record(self.id) fields = record_get_field_instances(record, '856', '4', ' ') global_comment = None global_description = None local_comment = {} local_description = {} for field in fields: url = field_get_subfield_values(field, 'u') if url: ## Given a url url = url[0] if url == '%s/record/%s/files/' % (CFG_SITE_URL, self.recid): ## If it is a traditional /record/1/files/ one ## We have global description/comment for all the formats description = field_get_subfield_values(field, 'y') if description: global_description = description[0] comment = field_get_subfield_values(field, 'z') if comment: global_comment = comment[0] elif bibdocfile_url_p(url): ## Otherwise we have description/comment per format dummy, docname, format = decompose_bibdocfile_url(url) if docname == self.docname: description = field_get_subfield_values(field, 'y') if description: local_description[format] = description[0] comment = field_get_subfield_values(field, 'z') if comment: local_comment[format] = comment[0] ## Let's update the tables version = self.get_latest_version() for docfile in self.list_latest_files(): format = docfile.get_format() if format in local_comment: self.set_comment(local_comment[format], format, version) else: self.set_comment(global_comment, format, version) if format in local_description: self.set_description(local_description[format], format, version) else: self.set_description(global_description, format, version) self._build_file_list('init') - def add_file_new_format(self, filename, version=None, description=None, comment=None, format=None): - """add a new format of a file to an archive""" - try: - if version is None: - version = self.get_latest_version() - if version == 0: - version = 1 - if os.path.exists(filename): - if not os.path.getsize(filename) > 0: - raise InvenioWebSubmitFileError, "%s seems to be empty" % filename - if format is None: - format = decompose_file(filename)[2] - destination = "%s/%s%s;%i" % (self.basedir, self.docname, format, version) - if os.path.exists(destination): - raise InvenioWebSubmitFileError, "A file for docname '%s' for the recid '%s' already exists for the format '%s'" % (self.docname, self.recid, format) - try: - shutil.copyfile(filename, destination) - os.chmod(destination, 0644) - except Exception, e: - register_exception() - raise InvenioWebSubmitFileError, "Encountered an exception while copying '%s' to '%s': '%s'" % (filename, destination, e) - self.more_info.set_comment(comment, format, version) - self.more_info.set_description(description, format, version) - else: - raise InvenioWebSubmitFileError, "'%s' does not exists!" % filename - finally: - Md5Folder(self.basedir).update() - self.touch() - self._build_file_list() - - def get_icon(self): - """Returns the bibdoc corresponding to an icon of the given bibdoc.""" - if self.related_files.has_key('Icon'): - return self.related_files['Icon'][0] - else: - return None + def get_icon(self, subformat_re=CFG_WEBSUBMIT_ICON_SUBFORMAT_RE, display_hidden=True): + """ + @param subformat_re: by default the convention is that + L{CFG_WEBSUBMIT_ICON_SUBFORMAT_RE} is used as a subformat indicator to + mean that a particular format is to be used as an icon. 
+ Specify a different subformat if you need to use a different + convention. + @type subformat_re: compiled regular expression + @return: the bibdocfile corresponding to the icon of this document, or + None if no icon exists for this document. + @rtype: BibDocFile + @warning: before I{subformats} were introduced this method used to + return a BibDoc, while it now returns a BibDocFile. Check + if your client code is compatible with this. + """ + for docfile in self.list_latest_files(list_hidden=display_hidden): + if subformat_re.match(docfile.get_subformat()): + return docfile + return None - def add_icon(self, filename, basename=None, format=None): - """Links an icon with the bibdoc object. Return the icon bibdoc""" + def add_icon(self, filename, format=None, subformat=CFG_WEBSUBMIT_DEFAULT_ICON_SUBFORMAT): + """ + Attaches an icon to this document. + + @param filename: the local filesystem path to the icon. + @type filename: string + @param format: an optional format for the icon. If not specified it + will be derived from the filesystem path. + @type format: string + @param subformat: by default the convention is that + CFG_WEBSUBMIT_DEFAULT_ICON_SUBFORMAT is used as a subformat indicator to + mean that a particular format is to be used as an icon. + Specify a different subformat if you need to use a different + convention. + @type subformat: string + @raise InvenioWebSubmitFileError: in case of errors. + """ #first check if an icon already exists - existing_icon = self.get_icon() - if existing_icon is not None: - existing_icon.delete() - #then add the new one - if basename is None: - basename = 'icon-%s' % self.docname - if format is None: + if not format: format = decompose_file(filename)[2] - newicon = BibDoc(doctype='Icon', docname=basename, human_readable=self.human_readable) - newicon.add_file_new_version(filename, format=format) - try: - try: - old_umask = os.umask(022) - recid_fd = open("%s/.docid" % newicon.get_base_dir(), "w") - recid_fd.write(str(self.id)) - recid_fd.close() - type_fd = open("%s/.type" % newicon.get_base_dir(), "w") - type_fd.write(str(self.doctype)) - type_fd.close() - os.umask(old_umask) - run_sql("INSERT INTO bibdoc_bibdoc (id_bibdoc1, id_bibdoc2, type) VALUES (%s,%s,'Icon')", (self.id, newicon.get_id(),)) - except Exception, e: - register_exception() - raise InvenioWebSubmitFileError, "Encountered an exception while writing .docid and .doctype for folder '%s': '%s'" % (newicon.get_base_dir(), e) - finally: - Md5Folder(newicon.basedir).update() - self.touch() - self._build_related_file_list() - return newicon + if subformat: + format += ";%s" % subformat - def delete_icon(self): - """Removes the current icon if it exists.""" - existing_icon = self.get_icon() - if existing_icon is not None: - existing_icon.delete() - self.touch() - self._build_related_file_list() + self.add_file_new_format(filename, format=format) + + def delete_icon(self, subformat_re=CFG_WEBSUBMIT_ICON_SUBFORMAT_RE): + """ + Removes the icon attached to the document if it exists. + + @param subformat_re: by default the convention is that + L{CFG_WEBSUBMIT_ICON_SUBFORMAT_RE} is used as a subformat indicator to + mean that a particular format is to be used as an icon. + Specify a different subformat if you need to use a different + convention. + @type subformat_re: compiled regular expression
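+
+ A minimal sketch of replacing the current icon (the path is
+ illustrative; see L{add_icon} for the subformat convention):
+
+ >>> bibdoc.delete_icon()
+ >>> bibdoc.add_icon('/tmp/icon.gif')
+ >>> bibdoc.get_icon() is not None
+ True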
+ """ + for docfile in self.list_latest_files(): + if subformat_re.match(docfile.get_subformat()): + self.delete_file(docfile.get_format(), docfile.get_version()) def display(self, version="", ln=CFG_SITE_LANG, display_hidden=True): - """Returns a formatted representation of the files linked with - the bibdoc. + """ + Returns an HTML representation of the this document. + + @param version: if not set, only the last version will be displayed. If + 'all', all versions will be displayed. + @type version: string (integer or 'all') + @param ln: the language code. + @type ln: string + @param display_hidden: whether to include hidden files as well. + @type display_hidden: bool + @return: the formatted representation. + @rtype: HTML string """ t = "" if version == "all": docfiles = self.list_all_files(list_hidden=display_hidden) elif version != "": version = int(version) docfiles = self.list_version_files(version, list_hidden=display_hidden) else: - docfiles = self.list_latest_files() - existing_icon = self.get_icon() - if existing_icon is not None: - existing_icon = existing_icon.list_all_files()[0] - imageurl = "%s/record/%s/files/%s" % \ - (CFG_SITE_URL, self.recid, urllib.quote(existing_icon.get_full_name())) + docfiles = self.list_latest_files(list_hidden=display_hidden) + icon = self.get_icon(display_hidden=display_hidden) + if icon: + imageurl = icon.get_url() else: imageurl = "%s/img/smallfiles.gif" % CFG_SITE_URL versions = [] for version in list_versions_from_array(docfiles): currversion = { 'version' : version, 'previous' : 0, 'content' : [] } if version == self.get_latest_version() and version != 1: currversion['previous'] = 1 for docfile in docfiles: if docfile.get_version() == version: currversion['content'].append(docfile.display(ln = ln)) versions.append(currversion) - t = websubmit_templates.tmpl_bibdoc_filelist( - ln = ln, - versions = versions, - imageurl = imageurl, - docname = self.docname, - recid = self.recid - ) - return t + if versions: + return websubmit_templates.tmpl_bibdoc_filelist( + ln = ln, + versions = versions, + imageurl = imageurl, + docname = self.docname, + recid = self.recid + ) + else: + return "" def change_name(self, newname): - """Rename the bibdoc name. New name must not be already used by the linked - bibrecs.""" + """ + Renames this document name. + + @param newname: the new name. + @type newname: string + @raise InvenioWebSubmitFileError: if the new name corresponds to + a document already attached to the record owning this document. 
+ """ try: newname = normalize_docname(newname) res = run_sql("SELECT b.id FROM bibrec_bibdoc bb JOIN bibdoc b on bb.id_bibdoc=b.id WHERE bb.id_bibrec=%s AND b.docname=%s", (self.recid, newname)) if res: raise InvenioWebSubmitFileError, "A bibdoc called %s already exists for recid %s" % (newname, self.recid) try: for f in os.listdir(self.basedir): if not f.startswith('.'): try: (dummy, base, extension, version) = decompose_file_with_version(f) except ValueError: register_exception(alert_admin=True, prefix="Strange file '%s' is stored in %s" % (f, self.basedir)) else: shutil.move(os.path.join(self.basedir, f), os.path.join(self.basedir, '%s%s;%i' % (newname, extension, version))) except Exception, e: register_exception() raise InvenioWebSubmitFileError("Error in renaming the bibdoc %s to %s for recid %s: %s" % (self.docname, newname, self.recid, e)) run_sql("update bibdoc set docname=%s where id=%s", (newname, self.id,)) self.docname = newname finally: Md5Folder(self.basedir).update() self.touch() self._build_file_list('rename') self._build_related_file_list() def set_comment(self, comment, format, version=None): - """Update the comment of a format/version.""" + """ + Updates the comment of a specific format/version of the document. + + @param comment: the new comment. + @type comment: string + @param format: the specific format for which the comment should be + updated. + @type format: string + @param version: the specific version for which the comment should be + updated. If not specified the last version will be used. + @type version: integer + """ if version is None: version = self.get_latest_version() + format = normalize_format(format) self.more_info.set_comment(comment, format, version) self.touch() self._build_file_list('init') def set_description(self, description, format, version=None): - """Update the description of a format/version.""" + """ + Updates the description of a specific format/version of the document. + + @param description: the new description. + @type description: string + @param format: the specific format for which the description should be + updated. + @type format: string + @param version: the specific version for which the description should be + updated. If not specified the last version will be used. + @type version: integer + """ if version is None: version = self.get_latest_version() + format = normalize_format(format) self.more_info.set_description(description, format, version) self.touch() self._build_file_list('init') - def set_hidden(self, hidden, format, version=None): - """Update the hidden flag for format/version.""" + def set_flag(self, flagname, format, version=None): + """ + Sets a flag for a specific format/version of the document. + + @param flagname: a flag from L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}. + @type flagname: string + @param format: the specific format for which the flag should be + set. + @type format: string + @param version: the specific version for which the flag should be + set. If not specified the last version will be used. + @type version: integer + """ + if version is None: + version = self.get_latest_version() + format = normalize_format(format) + self.more_info.set_flag(flagname, format, version) + self.touch() + self._build_file_list('init') + + def has_flag(self, flagname, format, version=None): + """ + Checks if a particular flag for a format/version is set. + + @param flagname: a flag from L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}. + @type flagname: string + @param format: the specific format for which the flag should be + set. 
+ @type format: string + @param version: the specific version for which the flag should be + set. If not specified the last version will be used. + @type version: integer + @return: True if the flag is set. + @rtype: bool + """ + if version is None: + version = self.get_latest_version() + format = normalize_format(format) + return self.more_info.has_flag(flagname, format, version) + + def unset_flag(self, flagname, format, version=None): + """ + Unsets a flag for a specific format/version of the document. + + @param flagname: a flag from L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}. + @type flagname: string + @param format: the specific format for which the flag should be + unset. + @type format: string + @param version: the specific version for which the flag should be + unset. If not specified the last version will be used. + @type version: integer + """ if version is None: version = self.get_latest_version() - self.more_info.set_hidden(hidden, format, version) + format = normalize_format(format) + self.more_info.unset_flag(flagname, format, version) self.touch() self._build_file_list('init') def get_comment(self, format, version=None): - """Get a comment for a given format/version.""" + """ + Retrieve the comment of a specific format/version of the document. + + @param format: the specific format for which the comment should be + retrieved. + @type format: string + @param version: the specific version for which the comment should be + retrieved. If not specified the last version will be used. + @type version: integer + @return: the comment. + @rtype: string + """ if version is None: version = self.get_latest_version() + format = normalize_format(format) return self.more_info.get_comment(format, version) def get_description(self, format, version=None): - """Get a description for a given format/version.""" + """ + Retrieve the description of a specific format/version of the document. + + @param format: the specific format for which the description should be + retrieved. + @type format: string + @param version: the specific version for which the description should + be retrieved. If not specified the last version will be used. + @type version: integer + @return: the description. + @rtype: string + """ if version is None: version = self.get_latest_version() + format = normalize_format(format) return self.more_info.get_description(format, version) def hidden_p(self, format, version=None): - """Is the format/version hidden?""" + """ + Returns True if the file specified by the given format/version is + hidden. + + @param format: the specific format whose hidden status should be + checked. + @type format: string + @param version: the specific version whose hidden status should + be checked. If not specified the last version will be used. + @type version: integer + @return: True if hidden. + @rtype: bool + """ if version is None: version = self.get_latest_version() - return self.more_info.hidden_p(format, version) - - def icon_p(self): - """Return True if this bibdoc correspond to an icon which is linked - to another bibdoc.""" - return run_sql("SELECT count(id_bibdoc2) FROM bibdoc_bibdoc WHERE id_bibdoc2=%s AND type='Icon'", (self.id, ))[0][0] > 0 + return self.more_info.has_flag('HIDDEN', format, version) def get_docname(self): - """retrieve bibdoc name""" + """ + @return: the name of this document. + @rtype: string + """ return self.docname def get_base_dir(self): - """retrieve bibdoc base directory, e.g.
/soft/cdsweb/var/data/files/123""" + """ + @return: the base directory on the local filesystem for this document + (e.g. C{/soft/cdsweb/var/data/files/g0/123}) + @rtype: string + """ return self.basedir def get_type(self): - """retrieve bibdoc doctype""" + """ + @return: the type of this document. + @rtype: string""" return self.doctype def get_recid(self): - """retrieve bibdoc recid""" + """ + @return: the record id of the record to which this document is + attached. + @rtype: integer + """ return self.recid def get_id(self): - """retrieve bibdoc id""" + """ + @return: the id of this document. + @rtype: integer + """ return self.id + def pdf_a_p(self): + """ + @return: True if this document contains a PDF in PDF/A format. + @rtype: bool""" + return self.has_flag('PDF/A', 'pdf') + + def has_text(self, require_up_to_date=False, version=None): + """ + Return True if the text of this document has already been extracted. + + @param require_up_to_date: if True check the text was actually + extracted after the most recent format of the given version. + @type require_up_to_date: bool + @param version: a version for which the text should have been + extracted. If not specified the latest version is considered. + @type version: integer + @return: True if the text has already been extracted. + @rtype: bool + """ + if version is None: + version = self.get_latest_version() + if os.path.exists(os.path.join(self.basedir, '.text;%i' % version)): + if not require_up_to_date: + return True + else: + docfiles = self.list_version_files(version) + text_md = datetime.fromtimestamp(os.path.getmtime(os.path.join(self.basedir, '.text;%i' % version))) + for docfile in docfiles: + if text_md <= docfile.md: + return False + return True + return False + def get_file(self, format, version=""): - """Return a DocFile with docname name, with format (the extension), and - with the given version. + """ + Returns a L{BibDocFile} instance of this document corresponding to the + specific format and version. + + @param format: the specific format. + @type format: string + @param version: the specific version for which the description should + be retrieved. If not specified the last version will be used. + @type version: integer + @return: the L{BibDocFile} instance. + @rtype: BibDocFile """ if version == "": docfiles = self.list_latest_files() else: version = int(version) docfiles = self.list_version_files(version) format = normalize_format(format) for docfile in docfiles: if (docfile.get_format()==format or not format): return docfile + + ## Let's skip the subformat specification and consider just the + ## superformat + superformat = get_superformat_from_format(format) + for docfile in docfiles: + if get_superformat_from_format(docfile.get_format()) == superformat: + return docfile raise InvenioWebSubmitFileError, "No file called '%s' of format '%s', version '%s'" % (self.docname, format, version) def list_versions(self): - """Returns the list of existing version numbers for a given bibdoc.""" + """ + @return: the list of existing version numbers for this document. + @rtype: list of integer + """ versions = [] for docfile in self.docfiles: if not docfile.get_version() in versions: versions.append(docfile.get_version()) + versions.sort() return versions def delete(self): - """delete the current bibdoc instance.""" + """ + Delete this document. + @see: L{undelete} for how to undelete the document. + @raise InvenioWebSubmitFileError: in case of errors. 
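To illustrate the superformat fallback in get_file() above, a small sketch (recid and docname hypothetical): a request for a plain format can now also be satisfied by a subformat variant of it.

    from invenio.bibdocfile import BibRecDocs

    bibdoc = BibRecDocs(123).get_bibdoc('figure')  # hypothetical
    # An exact match is tried first; failing that, any format sharing the
    # same superformat (e.g. a hypothetical '.gif;icon' for '.gif') is returned.
    docfile = bibdoc.get_file('.gif')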
+ """ try: today = datetime.today() self.change_name('DELETED-%s%s-%s' % (today.strftime('%Y%m%d%H%M%S'), today.microsecond, self.docname)) run_sql("UPDATE bibdoc SET status='DELETED' WHERE id=%s", (self.id,)) + self.status = 'DELETED' except Exception, e: register_exception() raise InvenioWebSubmitFileError, "It's impossible to delete bibdoc %s: %s" % (self.id, e) def deleted_p(self): - """Return True if the bibdoc has been deleted.""" + """ + @return: True if this document has been deleted. + @rtype: bool + """ return self.status == 'DELETED' def empty_p(self): - """Return True if the bibdoc is empty, i.e. it has no bibdocfile - connected.""" + """ + @return: True if this document is empty, i.e. it has no bibdocfile + connected. + @rtype: bool + """ return len(self.docfiles) == 0 def undelete(self, previous_status=''): - """undelete a deleted file (only if it was actually deleted). The - previous status, i.e. the restriction key can be provided. - Otherwise the bibdoc will pe public.""" + """ + Undelete a deleted file (only if it was actually deleted via L{delete}). + The previous C{status}, i.e. the restriction key can be provided. + Otherwise the undeleted document will be public. + @param previous_status: the previous status the should be restored. + @type previous_status: string + @raise InvenioWebSubmitFileError: in case of any error. + """ bibrecdocs = BibRecDocs(self.recid) try: run_sql("UPDATE bibdoc SET status=%s WHERE id=%s AND status='DELETED'", (previous_status, self.id)) except Exception, e: raise InvenioWebSubmitFileError, "It's impossible to undelete bibdoc %s: %s" % (self.id, e) if self.docname.startswith('DELETED-'): try: # Let's remove DELETED-20080214144322- in front of the docname original_name = '-'.join(self.docname.split('-')[2:]) original_name = bibrecdocs.propose_unique_docname(original_name) self.change_name(original_name) except Exception, e: raise InvenioWebSubmitFileError, "It's impossible to restore the previous docname %s. %s kept as docname because: %s" % (original_name, self.docname, e) else: raise InvenioWebSubmitFileError, "Strange just undeleted docname isn't called DELETED-somedate-docname but %s" % self.docname def delete_file(self, format, version): - """Delete on the filesystem the particular format version. - Note, this operation is not reversible!""" + """ + Delete a specific format/version of this document on the filesystem. + @param format: the particular format to be deleted. + @type format: string + @param version: the particular version to be deleted. + @type version: integer + @note: this operation is not reversible!""" try: afile = self.get_file(format, version) except InvenioWebSubmitFileError: return try: os.remove(afile.get_full_path()) except OSError: pass self.touch() self._build_file_list() def get_history(self): - """Return a string with a line for each row in the history for the - given docid.""" + """ + @return: a human readable and parsable string that represent the + history of this document. + @rtype: string + """ ret = [] hst = run_sql("""SELECT action, docname, docformat, docversion, docsize, docchecksum, doctimestamp FROM hstDOCUMENT WHERE id_bibdoc=%s ORDER BY doctimestamp ASC""", (self.id, )) for row in hst: ret.append("%s %s '%s', format: '%s', version: %i, size: %s, checksum: '%s'" % (row[6].strftime('%Y-%m-%d %H:%M:%S'), row[0], row[1], row[2], row[3], nice_size(row[4]), row[5])) return ret def _build_file_list(self, context=''): - """Lists all files attached to the bibdoc. 
def _build_file_list(self, context=''): - """Lists all files attached to the bibdoc. This function should be + """ + Lists all files attached to the bibdoc. This function should be called every time the bibdoc is modified. As a side effect it logs everything that has happened to the bibdocfiles in the log facility, according to the context: "init": means that the function has been called for the first time by a constructor, hence no logging is performed; "": by default means to log every deleted file as deleted and every added file as added; "rename": means that every apparently deleted file is logged as RENAMEDFROM and every new file as RENAMEDTO. """ def log_action(action, docid, docname, format, version, size, checksum, timestamp=''): """Log an action into the hstDOCUMENT table.""" try: if timestamp: run_sql('INSERT DELAYED INTO hstDOCUMENT(action, id_bibdoc, docname, docformat, docversion, docsize, docchecksum, doctimestamp) VALUES(%s, %s, %s, %s, %s, %s, %s, %s)', (action, docid, docname, format, version, size, checksum, timestamp)) else: run_sql('INSERT DELAYED INTO hstDOCUMENT(action, id_bibdoc, docname, docformat, docversion, docsize, docchecksum, doctimestamp) VALUES(%s, %s, %s, %s, %s, %s, %s, NOW())', (action, docid, docname, format, version, size, checksum)) except DatabaseError: register_exception() def make_removed_added_bibdocfiles(previous_file_list): """Internal function to build the log of changed files.""" # Let's rebuild the previous situation old_files = {} for bibdocfile in previous_file_list: old_files[(bibdocfile.name, bibdocfile.format, bibdocfile.version)] = (bibdocfile.size, bibdocfile.checksum, bibdocfile.md) # Let's rebuild the new situation new_files = {} for bibdocfile in self.docfiles: new_files[(bibdocfile.name, bibdocfile.format, bibdocfile.version)] = (bibdocfile.size, bibdocfile.checksum, bibdocfile.md) # Let's subtract from the added files all the files that are present # in the old list, and let's add to the deleted files all the old # files that are not present in the new list.
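The dictionary difference computed by make_removed_added_bibdocfiles() can be followed in isolation (a standalone sketch; keys are (docname, format, version) tuples, the values arbitrary metadata):

    old = {('a', '.pdf', 1): 'meta1', ('b', '.gif', 1): 'meta2'}
    new = {('a', '.pdf', 1): 'meta1', ('c', '.txt', 1): 'meta3'}
    added = dict(new)
    deleted = {}
    for key, value in old.iteritems():
        if key in added:
            del added[key]        # present in both snapshots: unchanged
        else:
            deleted[key] = value  # only in the old snapshot: deleted
    # added == {('c', '.txt', 1): 'meta3'}
    # deleted == {('b', '.gif', 1): 'meta2'}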
added_files = dict(new_files) deleted_files = {} for key, value in old_files.iteritems(): if added_files.has_key(key): del added_files[key] else: deleted_files[key] = value return (added_files, deleted_files) if context != 'init': previous_file_list = list(self.docfiles) + res = run_sql("SELECT status,docname,creation_date," + "modification_date,more_info FROM bibdoc WHERE id=%s", (self.id,)) + self.cd = res[0][2] + self.md = res[0][3] + self.docname = res[0][1] + self.status = res[0][0] + self.more_info = BibDocMoreInfo(self.id, blob_to_string(res[0][4])) self.docfiles = [] if os.path.exists(self.basedir): self.md5s = Md5Folder(self.basedir) files = os.listdir(self.basedir) files.sort() for afile in files: if not afile.startswith('.'): try: filepath = os.path.join(self.basedir, afile) fileversion = int(re.sub(".*;", "", afile)) fullname = afile.replace(";%s" % fileversion, "") checksum = self.md5s.get_checksum(afile) (dirname, basename, format) = decompose_file(fullname) - comment = self.more_info.get_comment(format, fileversion) - description = self.more_info.get_description(format, fileversion) - hidden = self.more_info.hidden_p(format, fileversion) # we can append file: self.docfiles.append(BibDocFile(filepath, self.doctype, fileversion, basename, format, - self.recid, self.id, self.status, checksum, description, comment, hidden, human_readable=self.human_readable)) + self.recid, self.id, self.status, checksum, + self.more_info, human_readable=self.human_readable)) except Exception, e: register_exception() if context == 'init': return else: added_files, deleted_files = make_removed_added_bibdocfiles(previous_file_list) deletedstr = "DELETED" addedstr = "ADDED" if context == 'rename': deletedstr = "RENAMEDFROM" addedstr = "RENAMEDTO" for (docname, format, version), (size, checksum, md) in added_files.iteritems(): if context == 'rename': md = '' # No modification time log_action(addedstr, self.id, docname, format, version, size, checksum, md) for (docname, format, version), (size, checksum, md) in deleted_files.iteritems(): if context == 'rename': md = '' # No modification time log_action(deletedstr, self.id, docname, format, version, size, checksum, md) def _build_related_file_list(self): """Lists all files attached to the bibdoc. This function should be called everytime the bibdoc is modified within e.g. its icon. + @deprecated: use subformats instead. 
""" self.related_files = {} res = run_sql("SELECT ln.id_bibdoc2,ln.type,bibdoc.status FROM " "bibdoc_bibdoc AS ln,bibdoc WHERE id=ln.id_bibdoc2 AND " "ln.id_bibdoc1=%s", (self.id,)) for row in res: docid = row[0] doctype = row[1] if row[2] != 'DELETED': if not self.related_files.has_key(doctype): self.related_files[doctype] = [] cur_doc = BibDoc(docid=docid, human_readable=self.human_readable) self.related_files[doctype].append(cur_doc) def get_total_size_latest_version(self): """Return the total size used on disk of all the files belonging to this bibdoc and corresponding to the latest version.""" ret = 0 for bibdocfile in self.list_latest_files(): ret += bibdocfile.get_size() return ret def get_total_size(self): """Return the total size used on disk of all the files belonging to this bibdoc.""" ret = 0 for bibdocfile in self.list_all_files(): ret += bibdocfile.get_size() return ret def list_all_files(self, list_hidden=True): """Returns all the docfiles linked with the given bibdoc.""" if list_hidden: return self.docfiles else: return [afile for afile in self.docfiles if not afile.hidden_p()] - def list_latest_files(self): + def list_latest_files(self, list_hidden=True): """Returns all the docfiles within the last version.""" - return self.list_version_files(self.get_latest_version()) + return self.list_version_files(self.get_latest_version(), list_hidden=list_hidden) def list_version_files(self, version, list_hidden=True): """Return all the docfiles of a particular version.""" version = int(version) - return [docfile for docfile in self.docfiles if docfile.get_version() == version and (list_hidden or not docfile.hidden_p)] + return [docfile for docfile in self.docfiles if docfile.get_version() == version and (list_hidden or not docfile.hidden_p())] def get_latest_version(self): """ Returns the latest existing version number for the given bibdoc. If no file is associated to this bibdoc, returns '0'. """ version = 0 for bibdocfile in self.docfiles: if bibdocfile.get_version() > version: version = bibdocfile.get_version() return version def get_file_number(self): """Return the total number of files.""" return len(self.docfiles) def register_download(self, ip_address, version, format, userid=0): """Register the information about a download of a particular file.""" format = normalize_format(format) if format[:1] == '.': format = format[1:] format = format.upper() return run_sql("INSERT INTO rnkDOWNLOADS " "(id_bibrec,id_bibdoc,file_version,file_format," "id_user,client_host,download_time) VALUES " "(%s,%s,%s,%s,%s,INET_ATON(%s),NOW())", (self.recid, self.id, version, format, userid, ip_address,)) class BibDocFile: """This class represents a physical file in the CDS Invenio filesystem. 
It should never be instantiated directly""" - def __init__(self, fullpath, doctype, version, name, format, recid, docid, status, checksum, description=None, comment=None, hidden=False, human_readable=False): - self.fullpath = fullpath + def __init__(self, fullpath, doctype, version, name, format, recid, docid, status, checksum, more_info, human_readable=False): + self.fullpath = os.path.abspath(fullpath) self.doctype = doctype self.docid = docid self.recid = recid self.version = version self.status = status self.checksum = checksum - self.description = description - self.comment = comment - self.hidden = hidden self.human_readable = human_readable - self.size = os.path.getsize(fullpath) - self.md = datetime.fromtimestamp(os.path.getmtime(fullpath)) - try: - self.cd = datetime.fromtimestamp(os.path.getctime(fullpath)) - except OSError: - self.cd = self.md - self.name = name + self.description = more_info.get_description(format, version) self.format = normalize_format(format) - self.dir = os.path.dirname(fullpath) - self.url = '%s/record/%s/files/%s%s' % (CFG_SITE_URL, self.recid, urllib.quote(self.name), urllib.quote(self.format)) - self.fullurl = '%s?version=%s' % (self.url, self.version) - self.etag = '"%i%s%i"' % (self.docid, self.format, self.version) + self.superformat = get_superformat_from_format(self.format) + self.subformat = get_subformat_from_format(self.format) if format == "": self.mime = "application/octet-stream" self.encoding = "" self.fullname = name else: - self.fullname = "%s%s" % (name, self.format) + self.fullname = "%s%s" % (name, self.superformat) (self.mime, self.encoding) = _mimes.guess_type(self.fullname) if self.mime is None: self.mime = "application/octet-stream" + self.more_info = more_info + self.comment = more_info.get_comment(format, version) + self.flags = more_info.get_flags(format, version) + self.hidden = 'HIDDEN' in self.flags + self.size = os.path.getsize(fullpath) + self.md = datetime.fromtimestamp(os.path.getmtime(fullpath)) + try: + self.cd = datetime.fromtimestamp(os.path.getctime(fullpath)) + except OSError: + self.cd = self.md + self.name = name + self.dir = os.path.dirname(fullpath) + if self.subformat: + self.url = create_url('%s/record/%s/files/%s%s' % (CFG_SITE_URL, self.recid, self.name, self.superformat), {'subformat' : self.subformat}) + self.fullurl = create_url('%s/record/%s/files/%s%s' % (CFG_SITE_URL, self.recid, self.name, self.superformat), {'subformat' : self.subformat, 'version' : self.version}) + else: + self.url = create_url('%s/record/%s/files/%s%s' % (CFG_SITE_URL, self.recid, self.name, self.superformat), {}) + self.fullurl = create_url('%s/record/%s/files/%s%s' % (CFG_SITE_URL, self.recid, self.name, self.superformat), {'version' : self.version}) + self.etag = '"%i%s%i"' % (self.docid, self.format, self.version) self.magic = None def __repr__(self): - return 'BibDocFile(%s, %s, %i, %s, %s, %i, %i, %s, %s, %s, %s, %s, %s)' % (repr(self.fullpath), repr(self.doctype), self.version, repr(self.name), repr(self.format), self.recid, self.docid, repr(self.status), repr(self.checksum), repr(self.description), repr(self.comment), repr(self.hidden), repr(self.human_readable)) + return ('BibDocFile(%s, %s, %i, %s, %s, %i, %i, %s, %s, %s, %s)' % (repr(self.fullpath), repr(self.doctype), self.version, repr(self.name), repr(self.format), self.recid, self.docid, repr(self.status), repr(self.checksum), repr(self.more_info), repr(self.human_readable))) def __str__(self): out = '%s:%s:%s:%s:fullpath=%s\n' % (self.recid, self.docid, self.version, 
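Assuming create_url is the helper from invenio.urlutils used in the constructor above, the subformat-aware URLs it builds would look roughly like this (site URL, recid and docname hypothetical):

    from invenio.urlutils import create_url  # assumption: same helper as above

    url = create_url('http://cds.example.org/record/123/files/thesis.gif',
                     {'subformat': 'icon'})
    # roughly: 'http://cds.example.org/record/123/files/thesis.gif?subformat=icon'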
self.format, self.fullpath) out += '%s:%s:%s:%s:fullname=%s\n' % (self.recid, self.docid, self.version, self.format, self.fullname) out += '%s:%s:%s:%s:name=%s\n' % (self.recid, self.docid, self.version, self.format, self.name) + out += '%s:%s:%s:%s:subformat=%s\n' % (self.recid, self.docid, self.version, self.format, get_subformat_from_format(self.format)) out += '%s:%s:%s:%s:status=%s\n' % (self.recid, self.docid, self.version, self.format, self.status) out += '%s:%s:%s:%s:checksum=%s\n' % (self.recid, self.docid, self.version, self.format, self.checksum) if self.human_readable: out += '%s:%s:%s:%s:size=%s\n' % (self.recid, self.docid, self.version, self.format, nice_size(self.size)) else: out += '%s:%s:%s:%s:size=%s\n' % (self.recid, self.docid, self.version, self.format, self.size) out += '%s:%s:%s:%s:creation time=%s\n' % (self.recid, self.docid, self.version, self.format, self.cd) out += '%s:%s:%s:%s:modification time=%s\n' % (self.recid, self.docid, self.version, self.format, self.md) out += '%s:%s:%s:%s:magic=%s\n' % (self.recid, self.docid, self.version, self.format, self.get_magic()) out += '%s:%s:%s:%s:mime=%s\n' % (self.recid, self.docid, self.version, self.format, self.mime) out += '%s:%s:%s:%s:encoding=%s\n' % (self.recid, self.docid, self.version, self.format, self.encoding) out += '%s:%s:%s:%s:url=%s\n' % (self.recid, self.docid, self.version, self.format, self.url) out += '%s:%s:%s:%s:fullurl=%s\n' % (self.recid, self.docid, self.version, self.format, self.fullurl) out += '%s:%s:%s:%s:description=%s\n' % (self.recid, self.docid, self.version, self.format, self.description) out += '%s:%s:%s:%s:comment=%s\n' % (self.recid, self.docid, self.version, self.format, self.comment) out += '%s:%s:%s:%s:hidden=%s\n' % (self.recid, self.docid, self.version, self.format, self.hidden) + out += '%s:%s:%s:%s:flags=%s\n' % (self.recid, self.docid, self.version, self.format, self.flags) out += '%s:%s:%s:%s:etag=%s\n' % (self.recid, self.docid, self.version, self.format, self.etag) return out def display(self, ln = CFG_SITE_LANG): """Returns a formatted representation of this docfile.""" return websubmit_templates.tmpl_bibdocfile_filelist( ln = ln, recid = self.recid, version = self.version, + md = self.md, name = self.name, - format = self.format, - size = self.size, + superformat = self.superformat, + subformat = self.subformat, + nice_size = nice_size(self.size), description = self.description or '' ) def is_restricted(self, req): """Returns restriction state. (see acc_authorize_action return values)""" if self.status not in ('', 'DELETED'): return acc_authorize_action(req, 'viewrestrdoc', status=self.status) elif self.status == 'DELETED': return (1, 'File has been deleted') else: return (0, '')
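A sketch of the subformat-as-icon convention used by is_icon() defined just below; the exact default pattern behind CFG_WEBSUBMIT_ICON_SUBFORMAT_RE is an assumption here:

    import re

    icon_subformat_re = re.compile(r'icon.*')       # assumed default convention
    assert icon_subformat_re.match('icon')          # '.gif;icon' would be an icon
    assert icon_subformat_re.match('icon-180')      # sized icon variants too
    assert not icon_subformat_re.match('fulltext')  # ordinary subformats are not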
+ def is_icon(self, subformat_re=CFG_WEBSUBMIT_ICON_SUBFORMAT_RE): + """ + @param subformat_re: by default the convention is that + L{CFG_WEBSUBMIT_ICON_SUBFORMAT_RE} is used as a subformat indicator to + mean that a particular format is to be used as an icon. + Specify a different subformat_re if you need to use a different + convention. + @type subformat_re: compiled regular expression + @return: True if this file is an icon. + @rtype: bool + """ + return bool(subformat_re.match(self.subformat)) + def hidden_p(self): return self.hidden def get_url(self): return self.url def get_type(self): return self.doctype def get_path(self): return self.fullpath def get_bibdocid(self): return self.docid def get_name(self): return self.name def get_full_name(self): return self.fullname def get_full_path(self): return self.fullpath def get_format(self): return self.format + def get_subformat(self): + return self.subformat + + def get_superformat(self): + return self.superformat + def get_size(self): return self.size def get_version(self): return self.version def get_checksum(self): return self.checksum def get_description(self): return self.description def get_comment(self): return self.comment def get_content(self): """Returns the binary content of the file.""" content_fd = open(self.fullpath, 'rb') content = content_fd.read() content_fd.close() return content def get_recid(self): """Returns the recid connected with the bibdoc of this file.""" return self.recid def get_status(self): """Returns the status of the file, i.e. either '', 'DELETED' or a restriction keyword.""" return self.status def get_magic(self): """Return all the possible guesses from the magic library about the content of the file.""" if self.magic is None and CFG_HAS_MAGIC: - magic_cookies = get_magic_cookies() + magic_cookies = _get_magic_cookies() magic_result = [] for key in magic_cookies.keys(): magic_result.append(magic_cookies[key].file(self.fullpath)) self.magic = tuple(magic_result) return self.magic def check(self): """Return True if the checksum corresponds to the file.""" return calculate_md5(self.fullpath) == self.checksum def stream(self, req): """Stream the file.""" if self.status: (auth_code, auth_message) = acc_authorize_action(req, 'viewrestrdoc', status=self.status) else: auth_code = 0 if auth_code == 0: if os.path.exists(self.fullpath): if random.random() < CFG_BIBDOCFILE_MD5_CHECK_PROBABILITY and calculate_md5(self.fullpath) != self.checksum: raise InvenioWebSubmitFileError, "File %s, version %i, for record %s is corrupted!" % (self.fullname, self.version, self.recid) - stream_file(req, self.fullpath, self.fullname, self.mime, self.encoding, self.etag, self.checksum, self.fullurl) + stream_file(req, self.fullpath, "%s%s" % (self.name, self.superformat), self.mime, self.encoding, self.etag, self.checksum, self.fullurl) raise apache.SERVER_RETURN, apache.DONE else: req.status = apache.HTTP_NOT_FOUND raise InvenioWebSubmitFileError, "%s does not exist!" % self.fullpath else: raise InvenioWebSubmitFileError, "You are not authorized to download %s: %s" % (self.fullname, auth_message) def stream_file(req, fullpath, fullname=None, mime=None, encoding=None, etag=None, md5=None, location=None): """This is a generic function to stream a file to the user. If fullname, mime, encoding, and location are not provided they will be guessed based on req and fullpath. md5 should be passed as a hexadecimal string.
""" def normal_streaming(size): req.set_content_length(size) req.send_http_header() if not req.header_only: req.sendfile(fullpath) return "" def single_range(size, the_range): req.set_content_length(the_range[1]) req.headers_out['Content-Range'] = 'bytes %d-%d/%d' % (the_range[0], the_range[0] + the_range[1] - 1, size) req.status = apache.HTTP_PARTIAL_CONTENT req.send_http_header() if not req.header_only: req.sendfile(fullpath, the_range[0], the_range[1]) return "" def multiple_ranges(size, ranges, mime): req.status = apache.HTTP_PARTIAL_CONTENT boundary = '%s%04d' % (time.strftime('THIS_STRING_SEPARATES_%Y%m%d%H%M%S'), random.randint(0, 9999)) req.content_type = 'multipart/byteranges; boundary=%s' % boundary content_length = 0 for arange in ranges: content_length += len('--%s\r\n' % boundary) content_length += len('Content-Type: %s\r\n' % mime) content_length += len('Content-Range: bytes %d-%d/%d\r\n' % (arange[0], arange[0] + arange[1] - 1, size)) content_length += len('\r\n') content_length += arange[1] content_length += len('\r\n') content_length += len('--%s--\r\n' % boundary) req.set_content_length(content_length) req.send_http_header() if not req.header_only: for arange in ranges: req.write('--%s\r\n' % boundary, 0) req.write('Content-Type: %s\r\n' % mime, 0) req.write('Content-Range: bytes %d-%d/%d\r\n' % (arange[0], arange[0] + arange[1] - 1, size), 0) req.write('\r\n', 0) req.sendfile(fullpath, arange[0], arange[1]) req.write('\r\n', 0) req.write('--%s--\r\n' % boundary) req.flush() return "" def parse_date(date): """According to a date can come in three formats (in order of preference): Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123 Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036 Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format Moreover IE is adding some trailing information after a ';'. Wrong dates should be simpled ignored. This function return the time in seconds since the epoch GMT or None in case of errors.""" if not date: return None try: date = date.split(';')[0].strip() # Because of IE ## Sun, 06 Nov 1994 08:49:37 GMT return time.mktime(time.strptime(date, '%a, %d %b %Y %X %Z')) except: try: ## Sun, 06 Nov 1994 08:49:37 GMT return time.mktime(time.strptime(date, '%A, %d-%b-%y %H:%M:%S %Z')) except: try: ## Sun, 06 Nov 1994 08:49:37 GMT return time.mktime(date) except: return None def parse_ranges(ranges): """According to a (multiple) range request comes in the form: bytes=20-30,40-60,70-,-80 with the meaning: from byte to 20 to 30 inclusive (11 bytes) from byte to 40 to 60 inclusive (21 bytes) from byte 70 to (size - 1) inclusive (size - 70 bytes) from byte size - 80 to (size - 1) inclusive (80 bytes) This function will return the list of ranges in the form: [[first_byte, last_byte], ...] 
If first_byte or last_byte aren't specified they'll be set to None If the list is not well formatted it will return None """ try: if ranges.startswith('bytes') and '=' in ranges: ranges = ranges.split('=')[1].strip() else: return None ret = [] for arange in ranges.split(','): arange = arange.strip() if arange.startswith('-'): ret.append([None, int(arange[1:])]) elif arange.endswith('-'): ret.append([int(arange[:-1]), None]) else: ret.append(map(int, arange.split('-'))) return ret except: return None def parse_tags(tags): """Return a list of tags starting from a comma separated list.""" return [tag.strip() for tag in tags.split(',')] def fix_ranges(ranges, size): """Complementary to parse_ranges it will transform all the ranges into (first_byte, length), adjusting all the value based on the actual size provided. """ ret = [] for arange in ranges: if (arange[0] is None and arange[1] > 0) or arange[0] < size: if arange[0] is None: arange[0] = size - arange[1] elif arange[1] is None: arange[1] = size - arange[0] else: arange[1] = arange[1] - arange[0] + 1 arange[0] = max(0, arange[0]) arange[1] = min(size - arange[0], arange[1]) if arange[1] > 0: ret.append(arange) return ret def get_normalized_headers(headers): """Strip and lowerize all the keys of the headers dictionary plus strip, lowerize and transform known headers value into their value.""" ret = { 'if-match' : None, 'unless-modified-since' : None, 'if-modified-since' : None, 'range' : None, 'if-range' : None, 'if-none-match' : None, } for key, value in req.headers_in.iteritems(): key = key.strip().lower() value = value.strip() if key in ('unless-modified-since', 'if-modified-since'): value = parse_date(value) elif key == 'range': value = parse_ranges(value) elif key == 'if-range': value = parse_date(value) or parse_tags(value) elif key in ('if-match', 'if-none-match'): value = parse_tags(value) if value: ret[key] = value return ret if CFG_BIBDOCFILE_USE_XSENDFILE: ## If XSendFile is supported by the server, let's use it. 
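When CFG_BIBDOCFILE_USE_XSENDFILE is enabled, stream_file() only emits headers and delegates the actual I/O to Apache's mod_xsendfile (listed among the optional prerequisites); a stripped-down sketch of this branch:

    def xsendfile_sketch(req, fullpath, fullname):
        # req is the Apache request object; once the X-Sendfile header is
        # set, Apache itself reads and streams the file, not Python.
        req.headers_out["Content-Disposition"] = \
            'inline; filename="%s"' % fullname.replace('"', '\\"')
        req.headers_out["X-Sendfile"] = fullpath
        return ""  # no body is written from Python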
if os.path.exists(fullpath): if fullname is None: fullname = os.path.basename(fullpath) req.headers_out["Content-Disposition"] = 'inline; filename="%s"' % fullname.replace('"', '\\"') req.headers_out["X-Sendfile"] = fullpath if mime is None: format = decompose_file(fullpath)[2] (mime, encoding) = _mimes.guess_type(fullpath) if mime is None: mime = "application/octet-stream" req.content_type = mime return "" else: raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND headers = get_normalized_headers(req.headers_in) if headers['if-match']: if etag is not None and etag not in headers['if-match']: raise apache.SERVER_RETURN, apache.HTTP_PRECONDITION_FAILED if os.path.exists(fullpath): mtime = os.path.getmtime(fullpath) if fullname is None: fullname = os.path.basename(fullpath) if mime is None: - format = decompose_file(fullpath)[2] (mime, encoding) = _mimes.guess_type(fullpath) if mime is None: mime = "application/octet-stream" if location is None: location = req.uri req.content_type = mime req.encoding = encoding req.filename = fullname req.headers_out["Last-Modified"] = time.strftime('%a, %d %b %Y %X GMT', time.gmtime(mtime)) req.headers_out["Accept-Ranges"] = "bytes" req.headers_out["Content-Location"] = location if etag is not None: req.headers_out["ETag"] = etag if md5 is not None: req.headers_out["Content-MD5"] = base64.encodestring(binascii.unhexlify(md5.upper()))[:-1] req.headers_out["Content-Disposition"] = 'inline; filename="%s"' % fullname.replace('"', '\\"') size = os.path.getsize(fullpath) if not size: try: raise Exception, '%s exists but is empty' % fullpath except Exception: register_exception(req=req, alert_admin=True) raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND if headers['if-modified-since'] and headers['if-modified-since'] >= mtime: raise apache.SERVER_RETURN, apache.HTTP_NOT_MODIFIED if headers['if-none-match']: if etag is not None and etag in headers['if-none-match']: raise apache.SERVER_RETURN, apache.HTTP_NOT_MODIFIED if headers['unless-modified-since'] and headers['unless-modified-since'] < mtime: return normal_streaming(size) if headers['range']: try: if headers['if-range']: if etag is None or etag not in headers['if-range']: return normal_streaming(size) ranges = fix_ranges(headers['range'], size) except: return normal_streaming(size) if len(ranges) > 1: return multiple_ranges(size, ranges, mime) elif ranges: return single_range(size, ranges[0]) else: raise apache.SERVER_RETURN, apache.HTTP_RANGE_NOT_SATISFIABLE else: return normal_streaming(size) else: raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND def stream_restricted_icon(req): """Return the content of the "Restricted Icon" file.""" stream_file(req, '%s/img/restricted.gif' % CFG_WEBDIR) raise apache.SERVER_RETURN, apache.DONE def list_types_from_array(bibdocs): """Retrieves the list of types from the given bibdoc list.""" types = [] for bibdoc in bibdocs: if not bibdoc.get_type() in types: types.append(bibdoc.get_type()) return types def list_versions_from_array(docfiles): """Retrieve the list of existing versions from the given docfiles list.""" versions = [] for docfile in docfiles: if not docfile.get_version() in versions: versions.append(docfile.get_version()) + versions.sort() + versions.reverse() return versions -def order_files_with_version(docfile1, docfile2): - """order docfile objects according to their version""" - version1 = docfile1.get_version() - version2 = docfile2.get_version() - return cmp(version2, version1) - def _make_base_dir(docid): """Given a docid it returns the complete path that 
should host its files.""" group = "g" + str(int(int(docid) / CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT)) return os.path.join(CFG_WEBSUBMIT_FILEDIR, group, str(docid)) - class Md5Folder: """Manage all the MD5 checksums of a folder""" def __init__(self, folder): """Initialize the class by loading the MD5 checksums of the given folder""" self.folder = folder try: self.load() except InvenioWebSubmitFileError: self.md5s = {} self.update() def update(self, only_new = True): """Update the .md5 file with the current files. If only_new is set, checksums are computed only for files that have not been hashed yet.""" if not only_new: self.md5s = {} if os.path.exists(self.folder): for filename in os.listdir(self.folder): if filename not in self.md5s and not filename.startswith('.'): self.md5s[filename] = calculate_md5(os.path.join(self.folder, filename)) self.store() def store(self): """Store the current md5 dictionary into .md5""" try: old_umask = os.umask(022) md5file = open(os.path.join(self.folder, ".md5"), "w") for key, value in self.md5s.items(): md5file.write('%s *%s\n' % (value, key)) md5file.close() os.umask(old_umask) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while storing .md5 for folder '%s': '%s'" % (self.folder, e) def load(self): """Load .md5 into the md5 dictionary""" self.md5s = {} try: md5file = open(os.path.join(self.folder, ".md5"), "r") for row in md5file: md5hash = row[:32] filename = row[34:].strip() self.md5s[filename] = md5hash md5file.close() except IOError: self.update() except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while loading .md5 for folder '%s': '%s'" % (self.folder, e) def check(self, filename = ''): """Check that the specified file (or, when no filename is given, every file for which a hash exists) is consistent with the stored hash.""" if filename and filename in self.md5s.keys(): try: return self.md5s[filename] == calculate_md5(os.path.join(self.folder, filename)) except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while loading '%s': '%s'" % (os.path.join(self.folder, filename), e) else: for filename, md5hash in self.md5s.items(): try: if calculate_md5(os.path.join(self.folder, filename)) != md5hash: return False except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while loading '%s': '%s'" % (os.path.join(self.folder, filename), e) return True def get_checksum(self, filename): """Return the checksum of a physical file.""" md5hash = self.md5s.get(filename, None) if md5hash is None: self.update() # Now it should not fail! md5hash = self.md5s[filename] return md5hash def calculate_md5_external(filename): """Calculate the md5 of a physical file through the md5sum command line tool. This is suitable for files larger than 256Kb.""" try: md5_result = os.popen(CFG_PATH_MD5SUM + ' -b %s' % escape_shell_arg(filename)) ret = md5_result.read()[:32] md5_result.close() if len(ret) != 32: # Error in running md5sum. Let's fallback to internal # algorithm. return calculate_md5(filename, force_internal=True) else: return ret except Exception, e: raise InvenioWebSubmitFileError, "Encountered an exception while calculating md5 for file '%s': '%s'" % (filename, e) def calculate_md5(filename, force_internal=False): """Calculate the md5 of a physical file.
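The .md5 files written by Md5Folder.store() above follow the `md5sum -b' layout (hexdigest, space, asterisk, filename), so they can be reproduced by hand; a sketch, assuming the md5 name imported by this module is hashlib's:

    import os
    from hashlib import md5

    def folder_md5_lines(folder, filenames):
        # Build '.md5'-style lines for the given files (sketch only).
        lines = []
        for filename in filenames:
            digest = md5(open(os.path.join(folder, filename), 'rb').read()).hexdigest()
            lines.append('%s *%s\n' % (digest, filename))
        return lines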
This is suitable for files smaller than 256Kb.""" if not CFG_PATH_MD5SUM or force_internal or os.path.getsize(filename) < CFG_BIBDOCFILE_MD5_THRESHOLD: try: to_be_read = open(filename, "rb") computed_md5 = md5() while True: buf = to_be_read.read(CFG_BIBDOCFILE_MD5_BUFFER) if buf: computed_md5.update(buf) else: break to_be_read.close() return computed_md5.hexdigest() except Exception, e: register_exception() raise InvenioWebSubmitFileError, "Encountered an exception while calculating md5 for file '%s': '%s'" % (filename, e) else: return calculate_md5_external(filename) def bibdocfile_url_to_bibrecdocs(url): """Given an URL in the form CFG_SITE_[SECURE_]URL/record/xxx/files/... it returns a BibRecDocs object for the corresponding recid.""" recid = decompose_bibdocfile_url(url)[0] return BibRecDocs(recid) def bibdocfile_url_to_bibdoc(url): """Given an URL in the form CFG_SITE_[SECURE_]URL/record/xxx/files/... it returns a BibDoc object for the corresponding recid/docname.""" docname = decompose_bibdocfile_url(url)[1] return bibdocfile_url_to_bibrecdocs(url).get_bibdoc(docname) def bibdocfile_url_to_bibdocfile(url): """Given an URL in the form CFG_SITE_[SECURE_]URL/record/xxx/files/... it returns a BibDocFile object for the corresponding recid/docname/format.""" dummy, dummy, format = decompose_bibdocfile_url(url) return bibdocfile_url_to_bibdoc(url).get_file(format) def bibdocfile_url_to_fullpath(url): """Given an URL in the form CFG_SITE_[SECURE_]URL/record/xxx/files/... it returns the fullpath for the corresponding recid/docname/format.""" return bibdocfile_url_to_bibdocfile(url).get_full_path() def bibdocfile_url_p(url): """Return True when the url is a potential valid url pointing to a fulltext owned by a system.""" if url.startswith('%s/getfile.py' % CFG_SITE_URL) or url.startswith('%s/getfile.py' % CFG_SITE_SECURE_URL): return True if not (url.startswith('%s/record/' % CFG_SITE_URL) or url.startswith('%s/record/' % CFG_SITE_SECURE_URL)): return False splitted_url = url.split('/files/') return len(splitted_url) == 2 and splitted_url[0] != '' and splitted_url[1] != '' +def get_docid_from_bibdocfile_fullpath(fullpath): + """Given a bibdocfile fullpath (e.g. "CFG_WEBSUBMIT_FILEDIR/g0/123/bar.pdf;1") + returns the docid (e.g. 123).""" + if not fullpath.startswith(os.path.join(CFG_WEBSUBMIT_FILEDIR, 'g')): + raise InvenioWebSubmitFileError, "Fullpath %s doesn't correspond to a valid bibdocfile fullpath" % fullpath + dirname, base, extension, version = decompose_file_with_version(fullpath) + try: + return int(dirname.split('/')[-1]) + except: + raise InvenioWebSubmitFileError, "Fullpath %s doesn't correspond to a valid bibdocfile fullpath" % fullpath + +def decompose_bibdocfile_fullpath(fullpath): + """Given a bibdocfile fullpath (e.g. 
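The bibdocfile_url_to_*() helpers above form a chain of successively finer decompositions of the same URL; for instance (recid and docname hypothetical):

    from invenio.config import CFG_SITE_URL

    url = CFG_SITE_URL + '/record/123/files/thesis.pdf'
    recdocs = bibdocfile_url_to_bibrecdocs(url)   # BibRecDocs for record 123
    bibdoc = bibdocfile_url_to_bibdoc(url)        # the 'thesis' BibDoc
    docfile = bibdocfile_url_to_bibdocfile(url)   # its '.pdf' BibDocFile
    fullpath = bibdocfile_url_to_fullpath(url)    # physical ';version' path on disk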
"CFG_WEBSUBMIT_FILEDIR/g0/123/bar.pdf;1") + returns a quadruple (recid, docname, format, version).""" + if not fullpath.startswith(os.path.join(CFG_WEBSUBMIT_FILEDIR, 'g')): + raise InvenioWebSubmitFileError, "Fullpath %s doesn't correspond to a valid bibdocfile fullpath" % fullpath + dirname, base, extension, version = decompose_file_with_version(fullpath) + try: + docid = int(dirname.split('/')[-1]) + bibdoc = BibDoc(docid) + recid = bibdoc.get_recid() + docname = bibdoc.get_docname() + return recid, docname, extension, version + except: + raise InvenioWebSubmitFileError, "Fullpath %s doesn't correspond to a valid bibdocfile fullpath" % fullpath + def decompose_bibdocfile_url(url): """Given a bibdocfile_url return a triple (recid, docname, format).""" if url.startswith('%s/getfile.py' % CFG_SITE_URL) or url.startswith('%s/getfile.py' % CFG_SITE_SECURE_URL): return decompose_bibdocfile_very_old_url(url) if url.startswith('%s/record/' % CFG_SITE_URL): recid_file = url[len('%s/record/' % CFG_SITE_URL):] elif url.startswith('%s/record/' % CFG_SITE_SECURE_URL): recid_file = url[len('%s/record/' % CFG_SITE_SECURE_URL):] else: raise InvenioWebSubmitFileError, "Url %s doesn't correspond to a valid record inside the system." % url recid_file = recid_file.replace('/files/', '/') recid, docname, format = decompose_file(urllib.unquote(recid_file)) if not recid and docname.isdigit(): ## If the URL was something similar to CFG_SITE_URL/record/123 return (int(docname), '', '') return (int(recid), docname, format) re_bibdocfile_old_url = re.compile(r'/record/(\d*)/files/') def decompose_bibdocfile_old_url(url): """Given a bibdocfile old url (e.g. CFG_SITE_URL/record/123/files) it returns the recid.""" g = re_bibdocfile_old_url.search(url) if g: return int(g.group(1)) raise InvenioWebSubmitFileError('%s is not a valid old bibdocfile url' % url) def decompose_bibdocfile_very_old_url(url): """Decompose an old /getfile.py? URL""" if url.startswith('%s/getfile.py' % CFG_SITE_URL) or url.startswith('%s/getfile.py' % CFG_SITE_SECURE_URL): params = urllib.splitquery(url)[1] if params: try: params = cgi.parse_qs(params) if 'docid' in params: docid = int(params['docid'][0]) bibdoc = BibDoc(docid) recid = bibdoc.get_recid() docname = bibdoc.get_docname() elif 'recid' in params: recid = int(params['recid'][0]) if 'name' in params: docname = params['name'][0] else: docname = '' else: raise InvenioWebSubmitFileError('%s has not enough params to correspond to a bibdocfile.' % url) format = normalize_format(params.get('format', [''])[0]) return (recid, docname, format) except Exception, e: raise InvenioWebSubmitFileError('Problem with %s: %s' % (url, e)) else: raise InvenioWebSubmitFileError('%s has no params to correspond to a bibdocfile.' 
% url) else: raise InvenioWebSubmitFileError('%s is not a valid very old bibdocfile url' % url) -def nice_size(size): - """Return a nicely printed size in kilo.""" - unit = 'B' - if size > 1024: - size /= 1024.0 - unit = 'KB' - if size > 1024: - size /= 1024.0 - unit = 'MB' - if size > 1024: - size /= 1024.0 - unit = 'GB' - return '%s %s' % (websearch_templates.tmpl_nice_number(size, max_ndigits_after_dot=2), unit) - def get_docname_from_url(url): """Return a potential docname given a url""" path = urllib2.urlparse.urlsplit(urllib.unquote(url))[2] filename = os.path.split(path)[-1] return file_strip_ext(filename) def get_format_from_url(url): """Return a potential format given a url""" path = urllib2.urlparse.urlsplit(urllib.unquote(url))[2] filename = os.path.split(path)[-1] return filename[len(file_strip_ext(filename)):] def clean_url(url): """Given a local URL (e.g. a local path), return it as a real, absolute path.""" protocol = urllib2.urlparse.urlsplit(url)[0] if protocol in ('', 'file'): path = urllib2.urlparse.urlsplit(urllib.unquote(url))[2] return os.path.abspath(path) else: return url +def is_url_a_local_file(url): + """Return True if the given URL is pointing to a local file.""" + protocol = urllib2.urlparse.urlsplit(url)[0] + return protocol in ('', 'file') + def check_valid_url(url): """Check for validity of a url or a file.""" try: - protocol = urllib2.urlparse.urlsplit(url)[0] - if protocol in ('', 'file'): + if is_url_a_local_file(url): path = urllib2.urlparse.urlsplit(urllib.unquote(url))[2] if os.path.abspath(path) != path: raise StandardError, "%s is not a normalized path (would be %s)." % (path, os.path.normpath(path)) for allowed_path in CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS + [CFG_TMPDIR, CFG_WEBSUBMIT_STORAGEDIR]: if path.startswith(allowed_path): dummy_fd = open(path) dummy_fd.close() return raise StandardError, "%s is not in one of the allowed paths." % path else: urllib2.urlopen(url) except Exception, e: raise StandardError, "%s is not a correct url: %s" % (url, e) def safe_mkstemp(suffix): """Create a temporary filename that doesn't contain any '.' apart from the suffix.""" tmpfd, tmppath = tempfile.mkstemp(suffix=suffix, dir=CFG_TMPDIR) if '.' not in suffix: # Just in case format is empty return tmpfd, tmppath while '.' in os.path.basename(tmppath)[:-len(suffix)]: os.close(tmpfd) os.remove(tmppath) tmpfd, tmppath = tempfile.mkstemp(suffix=suffix, dir=CFG_TMPDIR) return (tmpfd, tmppath) -def download_url(url, format, user=None, password=None, sleep=2): +def download_url(url, format=None, sleep=2): """Download a URL (if it corresponds to a remote file) and return a local URL to it.""" - class my_fancy_url_opener(urllib.FancyURLopener): - def __init__(self, user, password): - urllib.FancyURLopener.__init__(self) - self.fancy_user = user - self.fancy_password = password - - def prompt_user_passwd(self, host, realm): - return (self.fancy_user, self.fancy_password) - - format = normalize_format(format) + if format is None: + format = decompose_file(url)[2] + else: + format = normalize_format(format) protocol = urllib2.urlparse.urlsplit(url)[0] tmpfd, tmppath = safe_mkstemp(format) try: try: if protocol in ('', 'file'): path = urllib2.urlparse.urlsplit(urllib.unquote(url))[2] if os.path.abspath(path) != path: raise StandardError, "%s is not a normalized path (would be %s)."
% (path, os.path.normpath(path)) for allowed_path in CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS + [CFG_TMPDIR, CFG_WEBSUBMIT_STORAGEDIR]: if path.startswith(allowed_path): shutil.copy(path, tmppath) if os.path.getsize(tmppath) > 0: return tmppath else: raise StandardError, "%s seems to be empty" % url raise StandardError, "%s is not in one of the allowed paths." % path else: - if user is not None: - urlopener = my_fancy_url_opener(user, password) - urlopener.retrieve(url, tmppath) - else: - urllib.urlretrieve(url, tmppath) - #cmd_exit_code, cmd_out, cmd_err = run_shell_command(CFG_PATH_WGET + ' %s -O %s -t 2 -T 40', - #(url, tmppath)) - #if cmd_exit_code: - #raise StandardError, "It's impossible to download %s: %s" % (url, cmd_err) + try: + from_file = urllib2.urlopen(url) + to_file = open(tmppath, 'w') + while True: + block = from_file.read(CFG_BIBDOCFILE_BLOCK_SIZE) + if not block: + break + to_file.write(block) + to_file.close() + from_file.close() + except Exception, e: + raise StandardError, "Error when downloading %s into %s: %s" % (url, tmppath, e) if os.path.getsize(tmppath) > 0: return tmppath else: raise StandardError, "%s seems to be empty" % url except: os.remove(tmppath) raise finally: os.close(tmpfd) class BibDocMoreInfo: - """Class to wrap the serialized bibdoc more_info. At the moment - it stores descriptions and comments for each .""" + """ + This class wraps contextual information of the documents, such as the + - comments + - descriptions + - flags. + Such information is kept separately for every format/version instance of + the corresponding document and is serialized in the database, ready + to be retrieved (but not searched). + + @param docid: the document identifier. + @type docid: integer + @param more_info: a serialized version of an already existing more_info + object. If not specified this information will be read from the + database; if nothing is stored there, an empty dictionary will be + allocated. + @raise ValueError: if docid is not a positive integer. + @ivar docid: the document identifier as passed to the constructor. + @type docid: integer + @ivar more_info: the more_info dictionary that will hold all the + additional document information. + @type more_info: dict of dict of dict + @note: in general this class is never instantiated in client code and + never used outside the bibdocfile module. + @note: this class will be extended in the future to hold all the new auxiliary + information about a document. + """ def __init__(self, docid, more_info=None): - try: - assert(type(docid) in (long, int) and docid > 0) - self.docid = docid - try: - if more_info is None: - res = run_sql('SELECT more_info FROM bibdoc WHERE id=%s', (docid, )) - if res and res[0][0]: - self.more_info = cPickle.loads(blob_to_string(res[0][0])) - else: - self.more_info = {} - else: - self.more_info = cPickle.loads(more_info) - except: + if not (type(docid) in (long, int) and docid > 0): + raise ValueError("docid is not a positive integer, but %s."
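Since BibDocMoreInfo is a pickled nest of dictionaries, its layout is easy to picture; e.g. after set_description('my description', '.pdf', 1) and set_flag('HIDDEN', '.pdf', 2) one would expect (a sketch):

    more_info = {
        'descriptions': {1: {'.pdf': 'my description'}},
        'comments': {},
        'flags': {'HIDDEN': {2: {'.pdf': True}}},  # flag -> version -> format
    }
    assert more_info['flags']['HIDDEN'][2]['.pdf'] is True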
% docid) + self.docid = docid + if more_info is None: + res = run_sql('SELECT more_info FROM bibdoc WHERE id=%s', (docid, )) + if res and res[0][0]: + self.more_info = cPickle.loads(blob_to_string(res[0][0])) + else: self.more_info = {} - if 'descriptions' not in self.more_info: - self.more_info['descriptions'] = {} - if 'comments' not in self.more_info: - self.more_info['comments'] = {} - if 'hidden' not in self.more_info: - self.more_info['hidden'] = {} - except: - register_exception() - raise + else: + self.more_info = cPickle.loads(more_info) + if 'descriptions' not in self.more_info: + self.more_info['descriptions'] = {} + if 'comments' not in self.more_info: + self.more_info['comments'] = {} + if 'flags' not in self.more_info: + self.more_info['flags'] = {} + + def __repr__(self): + """ + @return: the canonical string representation of the C{BibDocMoreInfo}. + @rtype: string + """ + return 'BibDocMoreInfo(%i, %s)' % (self.docid, repr(cPickle.dumps(self.more_info))) def flush(self): - """if __dirty is True reserialize di DB.""" + """ + Flush this object to the database. + """ run_sql('UPDATE bibdoc SET more_info=%s WHERE id=%s', (cPickle.dumps(self.more_info), self.docid)) + def set_flag(self, flagname, format, version): + """ + Sets a flag. + + @param flagname: the flag to set (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}). + @type flagname: string + @param format: the format for which the flag should set. + @type format: string + @param version: the version for which the flag should set: + @type version: integer + @raise ValueError: if the flag is not in + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS} + """ + if flagname in CFG_BIBDOCFILE_AVAILABLE_FLAGS: + if not flagname in self.more_info['flags']: + self.more_info['flags'][flagname] = {} + if not version in self.more_info['flags'][flagname]: + self.more_info['flags'][flagname][version] = {} + if not format in self.more_info['flags'][flagname][version]: + self.more_info['flags'][flagname][version][format] = {} + self.more_info['flags'][flagname][version][format] = True + self.flush() + else: + raise ValueError, "%s is not in %s" % (flagname, CFG_BIBDOCFILE_AVAILABLE_FLAGS) + def get_comment(self, format, version): - """Return the comment corresponding to the given docid/format/version.""" + """ + Returns the specified comment. + + @param format: the format for which the comment should be + retrieved. + @type format: string + @param version: the version for which the comment should be + retrieved. + @type version: integer + @return: the specified comment. + @rtype: string + """ try: assert(type(version) is int) format = normalize_format(format) return self.more_info['comments'].get(version, {}).get(format) except: register_exception() raise def get_description(self, format, version): - """Return the description corresponding to the given docid/format/version.""" + """ + Returns the specified description. + + @param format: the format for which the description should be + retrieved. + @type format: string + @param version: the version for which the description should be + retrieved. + @type version: integer + @return: the specified description. 
+ @rtype: string + """ try: assert(type(version) is int) format = normalize_format(format) return self.more_info['descriptions'].get(version, {}).get(format) except: register_exception() raise - def hidden_p(self, format, version): - """Is the format/version hidden?""" - try: - assert(type(version) is int) - format = normalize_format(format) - return self.more_info['hidden'].get(version, {}).get(format, False) - except: - register_exception() - raise + def has_flag(self, flagname, format, version): + """ + Return True if the corresponding has been set. + + @param flagname: the name of the flag (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}). + @type flagname: string + @param format: the format for which the flag should be checked. + @type format: string + @param version: the version for which the flag should be checked. + @type version: integer + @return: True if the flag is set for the given format/version. + @rtype: bool + @raise ValueError: if the flagname is not in + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS} + """ + if flagname in CFG_BIBDOCFILE_AVAILABLE_FLAGS: + return self.more_info['flags'].get(flagname, {}).get(version, {}).get(format, False) + else: + raise ValueError, "%s is not in %s" % (flagname, CFG_BIBDOCFILE_AVAILABLE_FLAGS) + + def get_flags(self, format, version): + """ + Return the list of all the enabled flags. + + @param format: the format for which the list should be returned. + @type format: string + @param version: the version for which the list should be returned. + @type version: integer + @return: the list of enabled flags (from + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}). + @rtype: list of string + """ + return [flag for flag in self.more_info['flags'] if format in self.more_info['flags'][flag].get(version, {})] def set_comment(self, comment, format, version): - """Store a comment corresponding to the given docid/format/version.""" + """ + Set a comment. + + @param comment: the comment to be set. + @type comment: string + @param format: the format for which the comment should be set. + @type format: string + @param version: the version for which the comment should be set: + @type version: integer + """ try: assert(type(version) is int and version > 0) format = normalize_format(format) if comment == KEEP_OLD_VALUE: comment = self.get_comment(format, version) or self.get_comment(format, version - 1) if not comment: self.unset_comment(format, version) self.flush() return if not version in self.more_info['comments']: self.more_info['comments'][version] = {} self.more_info['comments'][version][format] = comment self.flush() except: register_exception() raise def set_description(self, description, format, version): - """Store a description corresponding to the given docid/format/version.""" + """ + Set a description. + + @param description: the description to be set. + @type description: string + @param format: the format for which the description should be set. 
+ @type format: string + @param version: the version for which the description should be set: + @type version: integer + """ try: assert(type(version) is int and version > 0) format = normalize_format(format) if description == KEEP_OLD_VALUE: description = self.get_description(format, version) or self.get_description(format, version - 1) if not description: self.unset_description(format, version) self.flush() return if not version in self.more_info['descriptions']: self.more_info['descriptions'][version] = {} self.more_info['descriptions'][version][format] = description self.flush() except: register_exception() raise - def set_hidden(self, hidden, format, version): - """Store wethever the docid/format/version is hidden.""" - try: - assert(type(version) is int and version > 0) - format = normalize_format(format) - if not hidden: - self.unset_hidden(format, version) - self.flush() - return - if not version in self.more_info['hidden']: - self.more_info['hidden'][version] = {} - self.more_info['hidden'][version][format] = hidden - self.flush() - except: - register_exception() - raise - def unset_comment(self, format, version): - """Remove a comment.""" + """ + Unset a comment. + + @param format: the format for which the comment should be unset. + @type format: string + @param version: the version for which the comment should be unset: + @type version: integer + """ try: assert(type(version) is int and version > 0) del self.more_info['comments'][version][format] self.flush() except KeyError: pass except: register_exception() raise def unset_description(self, format, version): - """Remove a description.""" + """ + Unset a description. + + @param format: the format for which the description should be unset. + @type format: string + @param version: the version for which the description should be unset: + @type version: integer + """ try: assert(type(version) is int and version > 0) del self.more_info['descriptions'][version][format] self.flush() except KeyError: pass except: register_exception() raise - def unset_hidden(self, format, version): - """Remove hidden flag.""" - try: - assert(type(version) is int and version > 0) - del self.more_info['hidden'][version][format] - self.flush() - except KeyError: - pass - except: - register_exception() - raise + def unset_flag(self, flagname, format, version): + """ + Unset a flag. + + @param flagname: the flag to be unset (see + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS}). + @type flagname: string + @param format: the format for which the flag should be unset. + @type format: string + @param version: the version for which the flag should be unset: + @type version: integer + @raise ValueError: if the flag is not in + L{CFG_BIBDOCFILE_AVAILABLE_FLAGS} + """ + if flagname in CFG_BIBDOCFILE_AVAILABLE_FLAGS: + try: + del self.more_info['flags'][flagname][version][format] + self.flush() + except KeyError: + pass + else: + raise ValueError, "%s is not in %s" % (flagname, CFG_BIBDOCFILE_AVAILABLE_FLAGS) def serialize(self): - """Return the serialized version of the more_info.""" + """ + @return: the serialized version of this object. + @rtype: string + """ return cPickle.dumps(self.more_info) def readfile(filename): - """Try to read a file. Return '' in case of any error. - This function is useful for quick implementation of websubmit functions. + """ + Read a file. + + @param filename: the name of the file to be read. + @type filename: string + @return: the text contained in the file. + @rtype: string + @note: Returns empty string in case of any error. 
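serialize() is plain cPickle output, so the database round trip can be emulated directly (the docid 123 is hypothetical):

    import cPickle

    blob = cPickle.dumps({'descriptions': {}, 'comments': {}, 'flags': {}})
    restored = cPickle.loads(blob)  # what BibDocMoreInfo(123, blob) would start from
    assert 'flags' in restored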
+ @note: this function is useful for quick implementation of websubmit + functions. """ try: - fd = open(filename) - content = fd.read() - fd.close() - return content - except: + return open(filename).read() + except Exception: return '' diff --git a/modules/websubmit/lib/bibdocfile_regression_tests.py b/modules/websubmit/lib/bibdocfile_regression_tests.py index aabb42ebf..50e96d83d 100644 --- a/modules/websubmit/lib/bibdocfile_regression_tests.py +++ b/modules/websubmit/lib/bibdocfile_regression_tests.py @@ -1,238 +1,243 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibDocFile Regression Test Suite.""" __revision__ = "$Id$" import unittest from invenio.testutils import make_test_suite, run_test_suite from invenio.bibdocfile import BibRecDocs#, BibDoc, BibDocFile from invenio.config import \ CFG_SITE_URL, \ - CFG_PREFIX, \ - CFG_WEBSUBMIT_FILEDIR + CFG_PREFIX, \ + CFG_WEBSUBMIT_FILEDIR class BibRecDocsTest(unittest.TestCase): """regression tests about BibRecDocs""" def test_BibRecDocs(self): """bibdocfile - BibRecDocs functions""" my_bibrecdoc = BibRecDocs(2) #add bibdoc my_bibrecdoc.add_new_file(CFG_PREFIX + '/lib/webtest/invenio/test.jpg', 'Main', 'img_test', False, 'test add new file', 'test', '.jpg') my_bibrecdoc.add_bibdoc(doctype='Main', docname='file', never_fail=False) self.assertEqual(len(my_bibrecdoc.list_bibdocs()), 3) my_added_bibdoc = my_bibrecdoc.get_bibdoc('file') #add bibdocfile in empty bibdoc my_added_bibdoc.add_file_new_version(CFG_PREFIX + '/lib/webtest/invenio/test.gif', \ - description= 'added in empty bibdoc', comment=None, format=None, hide_previous_versions=False) + description= 'added in empty bibdoc', comment=None, format=None, flags=['PERFORM_HIDE_PREVIOUS']) #propose unique docname self.assertEqual(my_bibrecdoc.propose_unique_docname('file'), 'file_2') #has docname self.assertEqual(my_bibrecdoc.has_docname_p('file'), True) #merge 2 bibdocs my_bibrecdoc.merge_bibdocs('img_test', 'file') self.assertEqual(len(my_bibrecdoc.get_bibdoc("img_test").list_all_files()), 2) #check file exists self.assertEqual(my_bibrecdoc.check_file_exists(CFG_PREFIX + '/lib/webtest/invenio/test.jpg'), True) #get bibdoc names self.assertEqual(my_bibrecdoc.get_bibdoc_names('Main')[0], '0104007_02') self.assertEqual(my_bibrecdoc.get_bibdoc_names('Main')[1],'img_test') #get total size - self.assertEqual(my_bibrecdoc.get_total_size(), 1628647) + self.assertEqual(my_bibrecdoc.get_total_size(), 1647591) #get total size latest version - self.assertEqual(my_bibrecdoc.get_total_size_latest_version(), 1628647) + self.assertEqual(my_bibrecdoc.get_total_size_latest_version(), 1647591) #display value = my_bibrecdoc.display(docname='img_test', version='', doctype='', ln='en', verbose=0, display_hidden=True) 
        self.assert_("Main" in value)
        #get xml 8564
        value = my_bibrecdoc.get_xml_8564()
        self.assert_('/record/2/files/img_test.jpg' in value)
        #check duplicate docnames
        self.assertEqual(my_bibrecdoc.check_duplicate_docnames(), True)
+
+    def tearDown(self):
+        my_bibrecdoc = BibRecDocs(2)
        #delete
        my_bibrecdoc.delete_bibdoc('img_test')
-
+        my_bibrecdoc.delete_bibdoc('file')

class BibDocsTest(unittest.TestCase):
    """regression tests about BibDocs"""

    def test_BibDocs(self):
-        """bibdocfile - Bibdocs functions"""
+        """bibdocfile - BibDocs functions"""
        #add file
        my_bibrecdoc = BibRecDocs(2)
        my_bibrecdoc.add_new_file(CFG_PREFIX + '/lib/webtest/invenio/test.jpg', 'Main', 'img_test', False, 'test add new file', 'test', '.jpg')
        my_new_bibdoc = my_bibrecdoc.get_bibdoc("img_test")
        value = my_bibrecdoc.list_bibdocs()
        self.assertEqual(len(value), 2)
        #get total size (bibdoc)
        self.assertEqual(my_new_bibdoc.get_total_size(), 91750)
        #get recid
        self.assertEqual(my_new_bibdoc.get_recid(), 2)
        #change name
        my_new_bibdoc.change_name('new_name')
        #get docname
        self.assertEqual(my_new_bibdoc.get_docname(), 'new_name')
        #get type
        self.assertEqual(my_new_bibdoc.get_type(), 'Main')
        #get id
        self.assert_(my_new_bibdoc.get_id() > 80)
        #set status
        my_new_bibdoc.set_status('new status')
        #get status
        self.assertEqual(my_new_bibdoc.get_status(), 'new status')
        #get base directory
        self.assert_(my_new_bibdoc.get_base_dir().startswith(CFG_WEBSUBMIT_FILEDIR))
        #get file number
        self.assertEqual(my_new_bibdoc.get_file_number(), 1)
        #add file new version
-        my_new_bibdoc.add_file_new_version(CFG_PREFIX + '/lib/webtest/invenio/test.jpg', description= 'the new version', comment=None, format=None, hide_previous_versions=False)
+        my_new_bibdoc.add_file_new_version(CFG_PREFIX + '/lib/webtest/invenio/test.jpg', description='the new version', comment=None, format=None, flags=["PERFORM_HIDE_PREVIOUS"])
        self.assertEqual(my_new_bibdoc.list_versions(), [1, 2])
        #revert
        my_new_bibdoc.revert(1)
        self.assertEqual(my_new_bibdoc.list_versions(), [1, 2, 3])
        self.assertEqual(my_new_bibdoc.get_description('.jpg', version=3), 'test add new file')
        #get total size latest version
        self.assertEqual(my_new_bibdoc.get_total_size_latest_version(), 91750)
        #get latest version
        self.assertEqual(my_new_bibdoc.get_latest_version(), 3)
        #list latest files
        self.assertEqual(len(my_new_bibdoc.list_latest_files()), 1)
        self.assertEqual(my_new_bibdoc.list_latest_files()[0].get_version(), 3)
        #list version files
        self.assertEqual(len(my_new_bibdoc.list_version_files(1, list_hidden=True)), 1)
        #display
        value = my_new_bibdoc.display(version='', ln='en', display_hidden=True)
        self.assert_('>test add new file<' in value)
        #format already exists
        self.assertEqual(my_new_bibdoc.format_already_exists_p('.jpg'), True)
        #get file
        self.assertEqual(my_new_bibdoc.get_file('.jpg', version='1').get_version(), 1)
        #set description
        my_new_bibdoc.set_description('new description', '.jpg', version=1)
        #get description
        self.assertEqual(my_new_bibdoc.get_description('.jpg', version=1), 'new description')
        #set comment
        my_new_bibdoc.set_comment('new comment', '.jpg', version=1)
        #get comment
        self.assertEqual(my_new_bibdoc.get_comment('.jpg', version=1), 'new comment')
        #get history
        assert len(my_new_bibdoc.get_history()) > 0
        #delete file
        my_new_bibdoc.delete_file('.jpg', 2)
        #list all files
        self.assertEqual(len(my_new_bibdoc.list_all_files()), 2)
        #delete file
        my_new_bibdoc.delete_file('.jpg', 3)
        #add new format
        my_new_bibdoc.add_file_new_format(CFG_PREFIX + '/lib/webtest/invenio/test.gif', version=None, description=None, comment=None, format=None)
        self.assertEqual(len(my_new_bibdoc.list_all_files()), 2)
        #delete file
        my_new_bibdoc.delete_file('.jpg', 1)
        #delete file
        my_new_bibdoc.delete_file('.gif', 1)
        #empty bibdoc
        self.assertEqual(my_new_bibdoc.empty_p(), True)
        #hidden?
        self.assertEqual(my_new_bibdoc.hidden_p('.jpg', version=1), False)
        #hide
-        my_new_bibdoc.set_hidden(True, '.jpg', version=1)
+        my_new_bibdoc.set_flag('HIDDEN', '.jpg', version=1)
        #hidden?
        self.assertEqual(my_new_bibdoc.hidden_p('.jpg', version=1), True)
        #add and get icon
        my_new_bibdoc.add_icon(CFG_PREFIX + '/lib/webtest/invenio/icon-test.gif', basename=None, format=None)
        value = my_bibrecdoc.list_bibdocs()[1]
        self.assertEqual(value.get_icon(), my_new_bibdoc.get_icon())
        #delete icon
        my_new_bibdoc.delete_icon()
        #get icon
        self.assertEqual(my_new_bibdoc.get_icon(), None)
-        #icon_p
-        self.assertEqual(my_new_bibdoc.icon_p(), False)
        #delete
        my_new_bibdoc.delete()
        self.assertEqual(my_new_bibdoc.deleted_p(), True)
        #undelete
        my_new_bibdoc.undelete(previous_status='')
+
+    def tearDown(self):
+        my_bibrecdoc = BibRecDocs(2)
        #delete
-        my_bibrecdoc.delete_bibdoc('new-name')
+        my_bibrecdoc.delete_bibdoc('img_test')
+        my_bibrecdoc.delete_bibdoc('new_name')

class BibDocFilesTest(unittest.TestCase):
    """regression tests about BibDocFiles"""

    def test_BibDocFiles(self):
-        """bibdocfile - BibdocFiles functions """
+        """bibdocfile - BibDocFile functions"""
        #add bibdoc
        my_bibrecdoc = BibRecDocs(2)
        my_bibrecdoc.add_new_file(CFG_PREFIX + '/lib/webtest/invenio/test.jpg', 'Main', 'img_test', False, 'test add new file', 'test', '.jpg')
        my_new_bibdoc = my_bibrecdoc.get_bibdoc("img_test")
        my_new_bibdocfile = my_new_bibdoc.list_all_files()[0]
        #get url
        self.assertEqual(my_new_bibdocfile.get_url(), CFG_SITE_URL + '/record/2/files/img_test.jpg')
        #get type
        self.assertEqual(my_new_bibdocfile.get_type(), 'Main')
        #get path
        self.assert_(my_new_bibdocfile.get_path().startswith(CFG_WEBSUBMIT_FILEDIR))
        self.assert_(my_new_bibdocfile.get_path().endswith('/img_test.jpg;1'))
        #get bibdocid
        self.assertEqual(my_new_bibdocfile.get_bibdocid(), my_new_bibdoc.get_id())
        #get name
        self.assertEqual(my_new_bibdocfile.get_name(), 'img_test')
        #get full name
        self.assertEqual(my_new_bibdocfile.get_full_name(), 'img_test.jpg')
        #get full path
        self.assert_(my_new_bibdocfile.get_full_path().startswith(CFG_WEBSUBMIT_FILEDIR))
        self.assert_(my_new_bibdocfile.get_full_path().endswith('/img_test.jpg;1'))
        #get format
        self.assertEqual(my_new_bibdocfile.get_format(), '.jpg')
        #get version
        self.assertEqual(my_new_bibdocfile.get_version(), 1)
        #get description
        self.assertEqual(my_new_bibdocfile.get_description(), my_new_bibdoc.get_description('.jpg', version=1))
        #get comment
        self.assertEqual(my_new_bibdocfile.get_comment(), my_new_bibdoc.get_comment('.jpg', version=1))
        #get recid
        self.assertEqual(my_new_bibdocfile.get_recid(), 2)
        #get status
        self.assertEqual(my_new_bibdocfile.get_status(), '')
        #get size
        self.assertEqual(my_new_bibdocfile.get_size(), 91750)
        #get checksum
        self.assertEqual(my_new_bibdocfile.get_checksum(), '28ec893f9da735ad65de544f71d4ad76')
        #check
        self.assertEqual(my_new_bibdocfile.check(), True)
        #display
        value = my_new_bibdocfile.display(ln='en')
        assert 'files/img_test.jpg?version=1">' in value
        #hidden?
        self.assertEqual(my_new_bibdocfile.hidden_p(), False)
        #delete
        my_new_bibdoc.delete()
        self.assertEqual(my_new_bibdoc.deleted_p(), True)

TEST_SUITE = make_test_suite(BibRecDocsTest, \
                             BibDocsTest, \
                             BibDocFilesTest)

if __name__ == "__main__":
    run_test_suite(TEST_SUITE, warn_user=True)
diff --git a/modules/websubmit/lib/bibdocfilecli.py b/modules/websubmit/lib/bibdocfilecli.py
index d0d264459..1376e44cc 100644
--- a/modules/websubmit/lib/bibdocfilecli.py
+++ b/modules/websubmit/lib/bibdocfilecli.py
@@ -1,689 +1,1086 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""
BibDocAdmin CLI administration tool
"""

__revision__ = "$Id$"

import sys
+import re
import os
import time
import fnmatch
-from optparse import OptionParser, OptionGroup
+import datetime
+from logging import getLogger, debug, DEBUG
+from optparse import OptionParser, OptionGroup, OptionValueError
from tempfile import mkstemp
-from invenio.config import CFG_TMPDIR
+from invenio.errorlib import register_exception
+from invenio.config import CFG_TMPDIR, CFG_SITE_URL, CFG_WEBSUBMIT_FILEDIR
from invenio.bibdocfile import BibRecDocs, BibDoc, InvenioWebSubmitFileError, \
    nice_size, check_valid_url, clean_url, get_docname_from_url, \
-    get_format_from_url, KEEP_OLD_VALUE
+    guess_format_from_url, KEEP_OLD_VALUE, decompose_bibdocfile_fullpath, \
+    bibdocfile_url_to_bibdoc, decompose_bibdocfile_url, \
+    CFG_BIBDOCFILE_AVAILABLE_FLAGS
from invenio.intbitset import intbitset
from invenio.search_engine import perform_request_search
from invenio.textutils import wrap_text_in_a_box, wait_for_user
from invenio.dbquery import run_sql
from invenio.bibtask import task_low_level_submission
from invenio.textutils import encode_for_xml
+from invenio.websubmit_file_converter import can_perform_ocr

def _xml_mksubfield(key, subfield, fft):
-    return fft.get(key, None) and '\t\t<subfield code="%s">%s</subfield>\n' % (subfield, encode_for_xml(fft[key])) or ''
+    return fft.get(key, None) is not None and '\t\t<subfield code="%s">%s</subfield>\n' % (subfield, encode_for_xml(str(fft[key]))) or ''
+
+def _xml_mksubfields(key, subfield, fft):
+    ret = ""
+    for value in fft.get(key, []):
+        ret += '\t\t<subfield code="%s">%s</subfield>\n' % (subfield, encode_for_xml(str(value)))
+    return ret

def _xml_fft_creator(fft):
    """Transform an fft dictionary (made by keys url, docname, format,
-    new_docname, icon, comment, description, restriction, doctype, into an xml
+    new_docname, comment, description, restriction, doctype) into an xml
    string."""
+    debug('Input FFT structure: %s' % fft)
    out = '\t<datafield tag="FFT" ind1=" " ind2=" ">\n'
    out += _xml_mksubfield('url', 'a', fft)
    out += _xml_mksubfield('docname', 'n', fft)
    out += _xml_mksubfield('format', 'f', fft)
-    out += _xml_mksubfield('newdocname', 'm', fft)
+    out += _xml_mksubfield('new_docname', 'm', fft)
    out += _xml_mksubfield('doctype', 't', fft)
    out += _xml_mksubfield('description', 'd', fft)
    out += _xml_mksubfield('comment', 'z', fft)
    out += _xml_mksubfield('restriction', 'r', fft)
-    out += _xml_mksubfield('icon', 'x', fft)
-    out += _xml_mksubfield('options', 'o', fft)
+    out += _xml_mksubfields('options', 'o', fft)
+    out += _xml_mksubfield('version', 'v', fft)
    out += '\t</datafield>\n'
+    debug('FFT created: %s' % out)
    return out

def ffts_to_xml(ffts_dict):
    """Transform a dictionary: recid -> ffts where ffts is a list of fft
    dictionaries into xml.
    """
+    debug('Input FFTs dictionary: %s' % ffts_dict)
    out = ''
    recids = ffts_dict.keys()
    recids.sort()
    for recid in recids:
        ffts = ffts_dict[recid]
        if ffts:
            out += '<record>\n'
            out += '\t<controlfield tag="001">%i</controlfield>\n' % recid
            for fft in ffts:
                out += _xml_fft_creator(fft)
            out += '</record>\n'
+    debug('MARC to Upload: %s' % out)
    return out

-_actions = [('get-info', 'print all the informations about the record/bibdoc/file structure'),
-            #'get-stats',
-            ('get-disk-usage', 'print statistics about usage disk usage'),
-            ('get-docnames', 'print the document docnames'),
-            #'get-docids',
-            #'get-recids',
-            #'get-doctypes',
-            #'get-revisions',
-            #'get-last-revisions',
-            #'get-formats',
-            #'get-comments',
-            #'get-descriptions',
-            #'get-restrictions',
-            #'get-icons',
-            ('get-history', 'print the document history'),
-            ('delete', 'delete the specified docname'),
-            ('undelete', 'undelete the specified docname'),
-            #'purge',
-            #'expunge',
-            ('check-md5', 'check md5 checksum validity of files'),
-            ('check-format', 'check if any format-related inconsistences exists'),
-            ('check-duplicate-docnames', 'check for duplicate docnames associated with the same record'),
-            ('update-md5', 'update md5 checksum of files'),
-            ('fix-all', 'fix inconsistences in filesystem vs database vs MARC'),
-            ('fix-marc', 'synchronize MARC after filesystem/database'),
-            ('fix-format', 'fix format related inconsistences'),
-            ('fix-duplicate-docnames', 'fix duplicate docnames associated with the same record')]
-
-_actions_with_parameter = {
-    #'set-doctype' : 'doctype',
-    #'set-docname' : 'docname',
-    #'set-comment' : 'comment',
-    #'set-description' : 'description',
-    #'set-restriction' : 'restriction',
-    'append' : ('append_path', 'specify the URL/path of the file that will appended to the bibdoc'),
-    'revise' : ('revise_path', 'specify the URL/path of the file that will revise the bibdoc'),
-    'revise_hide_previous' : ('revise_hide_path', 'specify the URL/path of the file that will revise the bibdoc, previous revisions will be hidden'),
-    'merge-into' : ('into_docname', 'merge the docname speficied --docname into_docname'),
-}
+_shift_re = re.compile("([-\+]{0,1})([\d]+)([dhms])")
+def _parse_datetime(var):
+    """Parse var and return the corresponding datetime object. It can handle
+    absolute date strings ("%Y-%m-%d %H:%M:%S") and shifts with respect
+    to now (e.g. "-5m", "2h")."""
+    if not var:
+        return None
+    date = time.time()
+    factors = {"d":24*3600, "h":3600, "m":60, "s":1}
+    m = _shift_re.match(var)
+    if m:
+        sign = m.groups()[0] == "-" and -1 or 1
+        factor = factors[m.groups()[2]]
+        value = float(m.groups()[1])
+        return datetime.datetime.fromtimestamp(date + sign * factor * value)
+    else:
+        return datetime.datetime.strptime(var, "%Y-%m-%d %H:%M:%S")
+
+def _parse_date_range(var):
+    """Returns the two dates contained as a low,high tuple"""
+    limits = var.split(",")
+    if len(limits)==1:
+        low = _parse_datetime(limits[0])
+        return low, None
+    if len(limits)==2:
+        low = _parse_datetime(limits[0])
+        high = _parse_datetime(limits[1])
+        return low, high
+    return None, None
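A quick illustration of the date syntax these two helpers accept (a sketch; the results of relative shifts naturally depend on the current time):

    # "-5m" means five minutes ago, "2h" two hours from now; absolute dates
    # use the "%Y-%m-%d %H:%M:%S" format.
    print _parse_datetime("-5m")                 # datetime five minutes in the past
    print _parse_datetime("2030-02-23 04:40:00") # absolute datetime
    print _parse_date_range("-1d,")              # (yesterday, None)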
+def cli_quick_match_all_recids(options):
+    """Quickly return an approximate but (by excess) list of good recids."""
+    url = getattr(options, 'url', None)
+    if url:
+        return intbitset([decompose_bibdocfile_url(url)[0]])
+    path = getattr(options, 'path', None)
+    if path:
+        return intbitset([decompose_bibdocfile_fullpath(path)[0]])
+    collection = getattr(options, 'collection', None)
+    pattern = getattr(options, 'pattern', None)
+    recids = getattr(options, 'recids', None)
+    md_rec = getattr(options, 'md_rec', None)
+    cd_rec = getattr(options, 'cd_rec', None)
+    tmp_date_query = []
+    tmp_date_params = []
+    if recids is None:
+        debug('Initially considering all the recids')
+        recids = intbitset(run_sql('SELECT id FROM bibrec'))
+        if not recids:
+            print >> sys.stderr, 'WARNING: No record in the database'
+    if md_rec[0] is not None:
+        tmp_date_query.append('modification_date>=%s')
+        tmp_date_params.append(md_rec[0])
+    if md_rec[1] is not None:
+        tmp_date_query.append('modification_date<=%s')
+        tmp_date_params.append(md_rec[1])
+    if cd_rec[0] is not None:
+        tmp_date_query.append('creation_date>=%s')
+        tmp_date_params.append(cd_rec[0])
+    if cd_rec[1] is not None:
+        tmp_date_query.append('creation_date<=%s')
+        tmp_date_params.append(cd_rec[1])
+    if tmp_date_query:
+        tmp_date_query = ' AND '.join(tmp_date_query)
+        tmp_date_params = tuple(tmp_date_params)
+        query = 'SELECT id FROM bibrec WHERE %s' % tmp_date_query
+        debug('Query: %s, param: %s' % (query, tmp_date_params))
+        recids &= intbitset(run_sql(query, tmp_date_params))
+        debug('After applying dates we obtain recids: %s' % recids)
+        if not recids:
+            print >> sys.stderr, 'WARNING: Time constraints for records are too strict'
+    if collection or pattern:
+        recids &= intbitset(perform_request_search(cc=collection or '', p=pattern or ''))
+        debug('After applying pattern and collection we obtain recids: %s' % recids)
+    debug('Quick recids: %s' % recids)
+    return recids
+
+def cli_quick_match_all_docids(options, recids=None):
+    """Quickly return an approximate but (by excess) list of good docids."""
+    url = getattr(options, 'url', None)
+    if url:
+        return intbitset([bibdocfile_url_to_bibdoc(url).get_id()])
+    path = getattr(options, 'path', None)
+    if path:
+        return intbitset([decompose_bibdocfile_fullpath(path)[0]])
+
+    deleted_docs = getattr(options, 'deleted_docs', None)
+    action_undelete = getattr(options, 'action', None) == 'undelete'
+    docids = getattr(options, 'docids', None)
+    md_doc = getattr(options, 'md_doc', None)
+    cd_doc = getattr(options, 'cd_doc', None)
+    if docids is None:
+        debug('Initially considering all the docids')
+        docids = intbitset(run_sql('SELECT id_bibdoc FROM bibrec_bibdoc'))
+    else:
+        debug('Initially considering these docids: %s' % docids)
+    tmp_query = []
+    tmp_params = []
+    if deleted_docs is None and action_undelete:
+        deleted_docs = 'only'
+    if deleted_docs == 'no':
+        tmp_query.append('status<>"DELETED"')
+    elif deleted_docs == 'only':
+        tmp_query.append('status="DELETED"')
+    if md_doc[0] is not None:
+        tmp_query.append('modification_date>=%s')
+        tmp_params.append(md_doc[0])
+    if md_doc[1] is not None:
+        tmp_query.append('modification_date<=%s')
+        tmp_params.append(md_doc[1])
+    if cd_doc[0] is not None:
+        tmp_query.append('creation_date>=%s')
+        tmp_params.append(cd_doc[0])
+    if cd_doc[1] is not None:
+        tmp_query.append('creation_date<=%s')
+        tmp_params.append(cd_doc[1])
+    if tmp_query:
+        tmp_query = ' AND '.join(tmp_query)
+        tmp_params = tuple(tmp_params)
+        query = 'SELECT id FROM bibdoc WHERE %s' % tmp_query
+        debug('Query: %s, param: %s' % (query, tmp_params))
+        docids &= intbitset(run_sql(query, tmp_params))
+        debug('After applying dates we obtain docids: %s' % docids)
+    return docids
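These two quick-match helpers deliberately over-select; the slow per-id predicates that follow then confirm each candidate. A minimal sketch of the underlying two-phase pattern (illustrative only; the function names here are hypothetical):

    from invenio.intbitset import intbitset

    def two_phase_filter(quick_candidates, slow_predicate):
        # Cheap SQL/intbitset pass first, expensive per-id confirmation second.
        return intbitset([one_id for one_id in quick_candidates if slow_predicate(one_id)])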
+def cli_slow_match_single_recid(options, recid, recids=None, docids=None):
+    """Apply all the given queries in order to assess whether a recid
+    matches or not.
+    If with_docids is True, the recid is matched if it has at least one docid that is matched"""
+    debug('cli_slow_match_single_recid checking: %s' % recid)
+    deleted_docs = getattr(options, 'deleted_docs', None)
+    deleted_recs = getattr(options, 'deleted_recs', None)
+    empty_recs = getattr(options, 'empty_recs', None)
+    docname = cli2docname(options)
+    bibrecdocs = BibRecDocs(recid, deleted_too=(deleted_docs != 'no'))
+    if bibrecdocs.deleted_p() and (deleted_recs == 'no'):
+        return False
+    elif not bibrecdocs.deleted_p() and (deleted_recs != 'only'):
+        if docids:
+            for bibdoc in bibrecdocs.list_bibdocs():
+                if bibdoc.get_id() in docids:
+                    break
+            else:
+                return False
+        if docname:
+            for other_docname in bibrecdocs.get_bibdoc_names():
+                if docname and fnmatch.fnmatchcase(other_docname, docname):
+                    break
+            else:
+                return False
+        if bibrecdocs.empty_p() and (empty_recs != 'no'):
+            return True
+        elif not bibrecdocs.empty_p() and (empty_recs != 'only'):
+            return True
+    return False
+
+def cli_slow_match_single_docid(options, docid, recids=None, docids=None):
+    """Apply all the given queries in order to assess whether a docid
+    matches or not."""
+    debug('cli_slow_match_single_docid checking: %s' % docid)
+    empty_docs = getattr(options, 'empty_docs', None)
+    docname = getattr(options, 'docname', None)
+    if recids is None:
+        recids = cli_quick_match_all_recids(options)
+    bibdoc = BibDoc(docid)
+    if docname and not fnmatch.fnmatchcase(bibdoc.get_docname(), docname):
+        debug('docname %s does not match the pattern %s' % (repr(bibdoc.get_docname()), repr(docname)))
+        return False
+    elif bibdoc.get_recid() and bibdoc.get_recid() not in recids:
+        debug('recid %s is not in pattern %s' % (repr(bibdoc.get_recid()), repr(recids)))
+        return False
+    elif empty_docs == 'no' and bibdoc.empty_p():
+        debug('bibdoc is empty')
+        return False
+    elif empty_docs == 'only' and not bibdoc.empty_p():
+        debug('bibdoc is not empty')
+        return False
+    else:
+        return True
+
+def cli2recid(options, recids=None, docids=None):
+    """Given the command line options return a recid."""
+    recids = list(cli_recids_iterator(options, recids=recids, docids=docids))
+    if len(recids) == 1:
+        return recids[0]
+    if recids:
+        raise StandardError, "More than one recid has been matched: %s" % recids
+    else:
+        raise StandardError, "No recids matched"
+
+def cli2docid(options, recids=None, docids=None):
+    """Given the command line options return a docid."""
+    docids = list(cli_docids_iterator(options, recids=recids, docids=docids))
+    if len(docids) == 1:
+        return docids[0]
+    if docids:
+        raise StandardError, "More than one docid has been matched: %s" % docids
+    else:
+        raise StandardError, "No docids matched"
+
+def cli2description(options):
+    """Return a good value for the description."""
+    description = getattr(options, 'set_description', None)
+    if description is None:
+        description = KEEP_OLD_VALUE
+    return description
+
+def cli2restriction(options):
+    """Return a good value for the restriction."""
+    restriction = getattr(options, 'set_restriction', None)
+    if restriction is None:
+        restriction = KEEP_OLD_VALUE
+    return restriction
+
+def cli2comment(options):
+    """Return a good value for the comment."""
+    comment = getattr(options, 'set_comment', None)
+    if comment is None:
+        comment = KEEP_OLD_VALUE
+    return comment
+
+def cli2doctype(options):
+    """Return a good value for the doctype."""
+    doctype = getattr(options, 'set_doctype', None)
+    if not doctype:
+        return 'Main'
+    return doctype
+
+def cli2docname(options, docid=None, url=None):
+    """Given the command line options and an optional precalculated docid
+    return the corresponding docname."""
+    if docid:
+        bibdoc = BibDoc(docid=docid)
+        return bibdoc.get_docname()
+    docname = getattr(options, 'docname', None)
+    if docname is not None:
+        return docname
+    if url is not None:
+        return get_docname_from_url(url)
+    else:
+        return None
+
+def cli2format(options, url=None):
+    """Given the command line options return the corresponding format."""
+    format = getattr(options, 'format', None)
+    if format is not None:
+        return format
+    elif url is not None:
+        ## FIXME: to deploy once conversion-tools branch is merged
+        #return guess_format_from_url(url)
+        return guess_format_from_url(url)
+    else:
+        raise OptionValueError("Not enough information to retrieve a valid format")
+
+def cli_recids_iterator(options, recids=None, docids=None):
+    """Slow iterator over all the matched recids.
+    If with_docids is True, the recid must be attached to at least one matched docid"""
+    debug('cli_recids_iterator')
+    if recids is None:
+        recids = cli_quick_match_all_recids(options)
+    debug('working on recids: %s, docids: %s' % (recids, docids))
+    for recid in recids:
+        if cli_slow_match_single_recid(options, recid, recids, docids):
+            yield recid
+    raise StopIteration
+
+def cli_docids_iterator(options, recids=None, docids=None):
+    """Slow iterator over all the matched docids."""
+    if recids is None:
+        recids = cli_quick_match_all_recids(options)
+    if docids is None:
+        docids = cli_quick_match_all_docids(options)
+    for docid in docids:
+        if cli_slow_match_single_docid(options, docid, recids, docids):
+            yield docid
+    raise StopIteration

class OptionParserSpecial(OptionParser):
    def format_help(self, *args, **kwargs):
        result = OptionParser.format_help(self, *args, **kwargs)
        if hasattr(self, 'trailing_text'):
            return "%s\n%s\n" % (result, self.trailing_text)
        else:
            return result

def prepare_option_parser():
    """Parse the command line options."""
-    parser = OptionParserSpecial(usage="usage: %prog [options]",
+
+    def _ids_ranges_callback(option, opt, value, parser):
+        """Callback for optparse to parse a set of ids ranges in the form
+        nnn1-nnn2,mmm1-mmm2... returning the corresponding intbitset.
+        """
+        try:
+            debug('option: %s, opt: %s, value: %s, parser: %s' % (option, opt, value, parser))
+            debug('Parsing range: %s' % value)
+            value = ranges2ids(value)
+            setattr(parser.values, option.dest, value)
+        except Exception, e:
+            raise OptionValueError("It's impossible to parse the range '%s' for option %s: %s" % (value, opt, e))
+
+    def _date_range_callback(option, opt, value, parser):
+        """Callback for optparse to parse a range of dates in the form
+        [date1],[date2]. Both date1 and date2 could be optional.
+        The dates can be expressed absolutely ("%Y-%m-%d %H:%M:%S")
+        or relatively (([-\+]{0,1})([\d]+)([dhms])) to the current time."""
+        try:
+            value = _parse_date_range(value)
+            setattr(parser.values, option.dest, value)
+        except Exception, e:
+            raise OptionValueError("It's impossible to parse the range '%s' for option %s: %s" % (value, opt, e))
+
+    parser = OptionParserSpecial(usage="usage: %prog [options]",
        #epilog="""With the query options you select the range of record/docnames/single files to work on. Note that some actions e.g. delete, append, revise etc. work at the docname level, while others like --set-comment, --set-description, at single file level and other can be applied in an iterative way to many records in a single run. Note that specifying docid(2) takes precedence over recid(2) which in turn takes precedence over pattern/collection search.""",
        version=__revision__)
    parser.trailing_text = """
Examples:
    $ bibdocfile --append foo.tar.gz --recid=1
-    $ bibdocfile --revise http://foo.com?search=123 --docname='sam'
-          --format=pdf --recid=3 --new-docname='pippo'
-    $ bibdocfile --delete *sam --all
-    $ bibdocfile --undelete -c "Test Collection"
+    $ bibdocfile --revise http://foo.com?search=123 --with-docname='sam'
+          --with-format=pdf --recid=3 --set-docname='pippo' # revise for record 3
+                    # the document sam, renaming it to pippo.
+    $ bibdocfile --delete --with-docname="*sam" --all # delete all documents
+                    # with docname ending
+                    # with "sam"
+    $ bibdocfile --undelete -c "Test Collection" # undelete documents for
+                    # the collection
+    $ bibdocfile --get-info --recids=1-4,6-8 # obtain information
+    $ bibdocfile -r 1 --with-docname=foo --set-docname=bar # Rename a document
"""
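The id-range syntax accepted by --recids/--docids is expanded by ranges2ids(), defined further down in this file; a small sketch of the expected behaviour:

    from invenio.intbitset import intbitset
    # Bounds are inclusive; reversed bounds such as "9-7" are swapped first.
    assert ranges2ids("1-3,5,7-9") == intbitset([1, 2, 3, 5, 7, 8, 9])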
-    parser.trailing_text += wrap_text_in_a_box("""
-The bibdocfile command line tool is in a state of high developement. Please
-do not rely on the command line parameters to remain compatible for the next
-release. You should in particular be aware that if you need to build scripts
-on top of the bibdocfile command line interfaces, you will probably need to
-revise them with the next release of CDS Invenio.""", 'WARNING')
-    query_options = OptionGroup(parser, 'Query parameters')
-    query_options.add_option('-a', '--all', action='store_true', dest='all', help='Select all the records')
-    query_options.add_option('--show-deleted', action='store_true', dest='show_deleted', help='Show deleted docname, too')
-    query_options.add_option('-p', '--pattern', dest='pattern', help='select by specifying the search pattern')
-    query_options.add_option('-c', '--collection', dest='collection', help='select by collection')
-    query_options.add_option('-r', '--recid', type='int', dest='recid', help='select the recid (or the first recid in a range)')
-    query_options.add_option('--recid2', type='int', dest='recid2', help='select the end of the range')
-    query_options.add_option('-d', '--docid', type='int', dest='docid', help='select by docid (or the first docid in a range)')
-    query_options.add_option('--docid2', type='int', dest='docid2', help='select the end of the range')
-    query_options.add_option('--docname', dest='docname', help='specify the docname to work on')
-    query_options.add_option('--new-docname', dest='newdocname', help='specify the desired new docname for revising')
-    query_options.add_option('--doctype', dest='doctype', help='specify the new doctype')
-    query_options.add_option('--format', dest='format', help='specify the format')
-    query_options.add_option('--icon', dest='icon', help='specify the URL/path for an icon')
-    query_options.add_option('--description', dest='description', help='specify a description')
-    query_options.add_option('--comment', dest='comment', help='specify a comment')
-    query_options.add_option('--restriction', dest='restriction', help='specify a restriction tag')
+    query_options = OptionGroup(parser, 'Query options')
+    query_options.add_option('-r', '--recids', action="callback", callback=_ids_ranges_callback, type='string', dest='recids', help='matches records by recids, e.g.: --recids=1-3,5-7')
+    query_options.add_option('-d', '--docids', action="callback", callback=_ids_ranges_callback, type='string', dest='docids', help='matches documents by docids, e.g.: --docids=1-3,5-7')
+    query_options.add_option('-a', '--all', action='store_true', dest='all', help='Select all the records')
+    query_options.add_option("--with-deleted-recs", choices=['yes', 'no', 'only'], type="choice", dest="deleted_recs", help="'Yes' to also match deleted records, 'no' to exclude them, 'only' to match only deleted ones", metavar="yes/no/only", default='no')
+    query_options.add_option("--with-deleted-docs", choices=['yes', 'no', 'only'], type="choice", dest="deleted_docs", help="'Yes' to also match deleted documents, 'no' to exclude them, 'only' to match only deleted ones (e.g. for undeletion)", metavar="yes/no/only", default='no')
+    query_options.add_option("--with-empty-recs", choices=['yes', 'no', 'only'], type="choice", dest="empty_recs", help="'Yes' to also match records without attached documents, 'no' to exclude them, 'only' to consider only such records (e.g. for statistics)", metavar="yes/no/only", default='no')
+    query_options.add_option("--with-empty-docs", choices=['yes', 'no', 'only'], type="choice", dest="empty_docs", help="'Yes' to also match documents without attached files, 'no' to exclude them, 'only' to consider only such documents (e.g. for sanity checking)", metavar="yes/no/only", default='no')
+    query_options.add_option("--with-record-modification-date", action="callback", callback=_date_range_callback, dest="md_rec", nargs=1, type="string", default=(None, None), help="matches records modified between date1 and date2; dates can be expressed relatively, e.g.: \"-5m,2030-2-23 04:40\" # matches records modified since 5 minutes ago until the 2030...", metavar="date1,date2")
+    query_options.add_option("--with-record-creation-date", action="callback", callback=_date_range_callback, dest="cd_rec", nargs=1, type="string", default=(None, None), help="matches records created between date1 and date2; dates can be expressed relatively", metavar="date1,date2")
+    query_options.add_option("--with-document-modification-date", action="callback", callback=_date_range_callback, dest="md_doc", nargs=1, type="string", default=(None, None), help="matches documents modified between date1 and date2; dates can be expressed relatively", metavar="date1,date2")
+    query_options.add_option("--with-document-creation-date", action="callback", callback=_date_range_callback, dest="cd_doc", nargs=1, type="string", default=(None, None), help="matches documents created between date1 and date2; dates can be expressed relatively", metavar="date1,date2")
+    query_options.add_option("--url", dest="url", help='matches the document referred to by the URL, e.g. "%s/record/1/files/foobar.pdf?version=2"' % CFG_SITE_URL)
+    query_options.add_option("--path", dest="path", help='matches the document referred to by the internal filesystem path, e.g. %s/g0/1/foobar.pdf\\;1' % CFG_WEBSUBMIT_FILEDIR)
+    query_options.add_option("--with-docname", dest="docname", help='matches documents with the given docname (accepts wildcards)')
+    query_options.add_option("--with-doctype", dest="doctype", help='matches documents with the given doctype')
+    query_options.add_option('-p', '--pattern', dest='pattern', help='matches records by pattern')
+    query_options.add_option('-c', '--collection', dest='collection', help='matches records by collection')
+    query_options.add_option('--force', dest='force', help='force an action even when it\'s not necessary e.g. textify on an already textified bibdoc.', action='store_true', default=False)
    parser.add_option_group(query_options)
-    action_options = OptionGroup(parser, 'Actions')
-    for (action, help) in _actions:
-        action_options.add_option('--%s' % action, action='store_const', const=action, dest='action', help=help)
-    parser.add_option_group(action_options)
-    action_with_parameters = OptionGroup(parser, 'Actions with parameter')
-    for action, (dest, help) in _actions_with_parameter.iteritems():
-        action_with_parameters.add_option('--%s' % action, dest=dest, help=help)
-    parser.add_option_group(action_with_parameters)
-    parser.add_option('-v', '--verbose', type='int', dest='verbose', default=1)
-    parser.add_option('--yes-i-know', action='store_true', dest='yes-i-know')
-    parser.add_option('-H', '--human-readable', dest='human_readable', action='store_true', default=False, help='print sizes in human readable format (e.g., 1KB 234MB 2GB)')
-    return parser

-def get_recids_from_query(pattern, collection, recid, recid2, docid, docid2):
-    """Return the proper set of recids corresponding to the given
-    parameters."""
-    if docid:
-        ret = intbitset()
-        if not docid2:
-            docid2 = docid
-        for adocid in xrange(docid, docid2 + 1):
-            try:
-                bibdoc = BibDoc(adocid)
-                if bibdoc and bibdoc.get_recid():
-                    ret.add(bibdoc.get_recid())
-            except (InvenioWebSubmitFileError, TypeError):
-                pass
-        return ret
-    elif recid:
-        if not recid2:
-            recid2 = recid
-        recid_range = intbitset(xrange(recid, recid2 + 1))
-        recid_set = intbitset(run_sql('select id from bibrec'))
-        recid_set &= recid_range
-        return recid_set
-    elif pattern or collection:
-        return intbitset(perform_request_search(cc=collection or "", p=pattern or ""))
-    else:
-        print >> sys.stderr, "ERROR: no record specified."
-        sys.exit(1)
+    getting_information_options = OptionGroup(parser, 'Actions for getting information')
+    getting_information_options.add_option('--get-info', dest='action', action='store_const', const='get-info', help='print all the information about the matched records/documents')
+    getting_information_options.add_option('--get-disk-usage', dest='action', action='store_const', const='get-disk-usage', help='print disk usage statistics of the matched documents')
+    getting_information_options.add_option('--get-history', dest='action', action='store_const', const='get-history', help="print the matched documents' history")
+    parser.add_option_group(getting_information_options)

-def get_docids_from_query(recid_set, docname, docid, docid2, show_deleted=False):
-    """Given a set of recid and an optional range of docids
-    return a corresponding docids set. The range of docids
-    takes precedence over the recid_set."""
-    if docname:
-        ret = intbitset()
-        for recid in recid_set:
-            bibrec = BibRecDocs(recid, deleted_too=show_deleted)
-            for bibdoc in bibrec.list_bibdocs():
-                if fnmatch.fnmatch(bibdoc.get_docname(), docname):
-                    ret.add(bibdoc.get_id())
-        return ret
-    elif docid:
-        ret = intbitset()
-        if not docid2:
-            docid2 = docid
-        for adocid in xrange(docid, docid2 + 1):
-            try:
-                bibdoc = BibDoc(adocid)
-                if bibdoc:
-                    ret.add(adocid)
-            except (InvenioWebSubmitFileError, TypeError):
-                pass
-        return ret
-    else:
-        ret = intbitset()
-        for recid in recid_set:
-            bibrec = BibRecDocs(recid, deleted_too=show_deleted)
-            for bibdoc in bibrec.list_bibdocs():
-                ret.add(bibdoc.get_id())
-                icon = bibdoc.get_icon()
-                if icon:
-                    ret.add(icon.get_id())
-        return ret
+    setting_information_options = OptionGroup(parser, 'Actions for setting information')
+    setting_information_options.add_option('--set-doctype', dest='set_doctype', help='specify the new doctype', metavar='doctype')
+    setting_information_options.add_option('--set-description', dest='set_description', help='specify a description', metavar='description')
+    setting_information_options.add_option('--set-comment', dest='set_comment', help='specify a comment', metavar='comment')
+    setting_information_options.add_option('--set-restriction', dest='set_restriction', help='specify a restriction tag', metavar='restriction')
+    setting_information_options.add_option('--set-docname', dest='new_docname', help='specifies a new docname for renaming', metavar='docname')
+    setting_information_options.add_option("--unset-comment", action="store_const", const='', dest="set_comment", help="remove any comment")
+    setting_information_options.add_option("--unset-descriptions", action="store_const", const='', dest="set_description", help="remove any description")
+    setting_information_options.add_option("--unset-restrictions", action="store_const", const='', dest="set_restriction", help="remove any restriction")
+    setting_information_options.add_option("--hide", dest="action", action='store_const', const='hide', help="hides matched documents and revisions")
+    setting_information_options.add_option("--unhide", dest="action", action='store_const', const='unhide', help="unhides matched documents and revisions")
+    parser.add_option_group(setting_information_options)
+
+    revising_options = OptionGroup(parser, 'Actions for revising content')
+    revising_options.add_option("--append", dest='append_path', help='specify the URL/path of the file that will be appended to the bibdoc', metavar='PATH/URL')
+    revising_options.add_option("--revise", dest='revise_path', help='specify the URL/path of the file that will revise the bibdoc', metavar='PATH/URL')
+    revising_options.add_option("--revert", dest='action', action='store_const', const='revert', help='reverts a document to the specified version')
+    revising_options.add_option("--delete", action='store_const', const='delete', dest='action', help='soft-delete the matched documents (applies to all revisions and formats)')
+    revising_options.add_option("--hard-delete", action='store_const', const='hard_delete', dest='action', help='hard-delete the matched documents (applies to matched revisions and formats)')
+    revising_options.add_option("--undelete", action='store_const', const='undelete', dest='action', help='undelete previously soft-deleted documents (applies to all revisions and formats)')
+    revising_options.add_option("--purge", action='store_const', const='purge', dest='action', help='purge (i.e. hard-delete previous versions) the matched documents')
+    revising_options.add_option("--expunge", action='store_const', const='expunge', dest='action', help='expunge (i.e. hard-delete any version and formats) the matched documents')
+    revising_options.add_option("--with-versions", dest="version", help="specifies the version(s) to be used with hard-delete, hide, revert, e.g.: 1-2,3 or all")
+    revising_options.add_option("--with-format", dest="format", help='to specify a format when appending/revising/deleting/reverting a document, e.g. "pdf"', metavar='FORMAT')
+    revising_options.add_option("--with-hide-previous", dest='hide_previous', action='store_true', help='when revising, hides previous versions', default=False)
+    parser.add_option_group(revising_options)
+
+    housekeeping_options = OptionGroup(parser, 'Actions for housekeeping')
+    housekeeping_options.add_option("--check-md5", action='store_const', const='check-md5', dest='action', help='check md5 checksum validity of files')
+    housekeeping_options.add_option("--check-format", action='store_const', const='check-format', dest='action', help='check if any format-related inconsistencies exist')
+    housekeeping_options.add_option("--check-duplicate-docnames", action='store_const', const='check-duplicate-docnames', dest='action', help='check for duplicate docnames associated with the same record')
+    housekeeping_options.add_option("--update-md5", action='store_const', const='update-md5', dest='action', help='update md5 checksum of files')
+    housekeeping_options.add_option("--fix-all", action='store_const', const='fix-all', dest='action', help='fix inconsistencies in filesystem vs database vs MARC')
+    housekeeping_options.add_option("--fix-marc", action='store_const', const='fix-marc', dest='action', help='synchronize MARC after filesystem/database')
+    housekeeping_options.add_option("--fix-format", action='store_const', const='fix-format', dest='action', help='fix format-related inconsistencies')
+    housekeeping_options.add_option("--fix-duplicate-docnames", action='store_const', const='fix-duplicate-docnames', dest='action', help='fix duplicate docnames associated with the same record')
+    parser.add_option_group(housekeeping_options)
+
+    experimental_options = OptionGroup(parser, 'Experimental options (do not expect to find them in the next release)')
+    experimental_options.add_option('--textify', dest='action', action='store_const', const='textify', help='extract text from matched documents and store it for later indexing')
+    experimental_options.add_option('--with-ocr', dest='perform_ocr', action='store_true', default=False, help='when used with --textify, whether to perform OCR')
+    parser.add_option_group(experimental_options)
+
+    parser.add_option('-D', '--debug', action='store_true', dest='debug', default=False)
+    parser.add_option('-H', '--human-readable', dest='human_readable', action='store_true', default=False, help='print sizes in human readable format (e.g., 1KB 234MB 2GB)')
+    parser.add_option('--yes-i-know', action='store_true', dest='yes-i-know', help='use with care!')
+    return parser

def print_info(recid, docid, info):
    """Nicely print info about a recid, docid pair."""
    print '%i:%i:%s' % (recid, docid, info)
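To make the FFT round-trip concrete, this is roughly the MARCXML that ffts_to_xml() emits for a single append (a sketch with a hypothetical recid and URL; the real output uses tab characters, as in _xml_fft_creator):

    ffts = {7: [{'url': 'http://example.org/test.pdf', 'docname': 'test', 'format': '.pdf', 'doctype': 'Main'}]}
    print ffts_to_xml(ffts)
    # <record>
    #   <controlfield tag="001">7</controlfield>
    #   <datafield tag="FFT" ind1=" " ind2=" ">
    #     <subfield code="a">http://example.org/test.pdf</subfield>
    #     <subfield code="n">test</subfield>
    #     <subfield code="f">.pdf</subfield>
    #     <subfield code="t">Main</subfield>
    #   </datafield>
    # </record>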
-def bibupload_ffts(ffts, append=False):
+def bibupload_ffts(ffts, append=False, debug=False):
    """Given an ffts dictionary it creates the xml and submits it."""
    xml = ffts_to_xml(ffts)
    if xml:
        print xml
        tmp_file_fd, tmp_file_name = mkstemp(suffix='.xml', prefix="bibdocfile_%s" % time.strftime("%Y-%m-%d_%H:%M:%S"), dir=CFG_TMPDIR)
        os.write(tmp_file_fd, xml)
        os.close(tmp_file_fd)
        os.chmod(tmp_file_name, 0644)
        if append:
            wait_for_user("This will be appended via BibUpload")
-            task = task_low_level_submission('bibupload', 'bibdocfile', '-a', tmp_file_name)
+            if debug:
+                task = task_low_level_submission('bibupload', 'bibdocfile', '-a', tmp_file_name, '-N', 'FFT', '-S2', '-v9')
+            else:
+                task = task_low_level_submission('bibupload', 'bibdocfile', '-a', tmp_file_name, '-N', 'FFT', '-S2')
            print "BibUpload append submitted with id %s" % task
        else:
            wait_for_user("This will be corrected via BibUpload")
-            task = task_low_level_submission('bibupload', 'bibdocfile', '-c', tmp_file_name)
+            if debug:
+                task = task_low_level_submission('bibupload', 'bibdocfile', '-c', tmp_file_name, '-N', 'FFT', '-S2', '-v9')
+            else:
+                task = task_low_level_submission('bibupload', 'bibdocfile', '-c', tmp_file_name, '-N', 'FFT', '-S2')
            print "BibUpload correct submitted with id %s" % task
    else:
-        print "WARNING: no MARC to upload."
+        print >> sys.stderr, "WARNING: no MARC to upload."
    return True

-def cli_append(recid=None, docid=None, docname=None, doctype=None, url=None, format=None, icon=None, description=None, comment=None, restriction=None):
+def ranges2ids(parse_string):
+    """Parse a string and return the intbitset of the corresponding ids."""
+    ids = intbitset()
+    ranges = parse_string.split(",")
+    for arange in ranges:
+        tmp_ids = arange.split("-")
+        if len(tmp_ids)==1:
+            ids.add(int(tmp_ids[0]))
+        else:
+            if int(tmp_ids[0]) > int(tmp_ids[1]): # sanity check
+                tmp = tmp_ids[0]
+                tmp_ids[0] = tmp_ids[1]
+                tmp_ids[1] = tmp
+            ids += xrange(int(tmp_ids[0]), int(tmp_ids[1]) + 1)
+    return ids
+
+def cli_append(options, append_path):
    """Create a bibupload FFT task submission for appending a format."""
-    if docid is not None:
-        bibdoc = BibDoc(docid)
-        if recid is not None and recid != bibdoc.get_recid():
-            print >> sys.stderr, "ERROR: Provided recid %i is not linked with provided docid %i" % (recid, docid)
-            return False
-        if docname is not None and docname != bibdoc.get_docname():
-            print >> sys.stderr, "ERROR: Provided docid %i is not named as the provided docname %s" % (docid, docname)
-            return False
-        recid = bibdoc.get_recid()
-        docname = bibdoc.get_docname()
-    elif recid is None:
-        print >> sys.stderr, "ERROR: Not enough information to identify the record and desired document"
-        return False
-    try:
-        url = clean_url(url)
-        check_valid_url(url)
-    except StandardError, e:
-        print >> sys.stderr, "ERROR: Not a valid url has been specified: %s" % e
-        return False
-    if docname is None:
-        docname = get_docname_from_url(url)
+    recid = cli2recid(options)
+    comment = getattr(options, 'comment', None)
+    description = getattr(options, 'description', None)
+    restriction = getattr(options, 'restriction', None)
+    doctype = getattr(options, 'doctype', None) or 'Main'
+    docname = cli2docname(options, url=append_path)
    if not docname:
-        print >> sys.stderr, "ERROR: Not enough information to decide a docname!"
-        return False
-    if format is None:
-        format = get_format_from_url(url)
-    if not format:
-        print >> sys.stderr, "ERROR: Not enough information to decide a format!"
-        return False
-    if icon is not None and icon != KEEP_OLD_VALUE:
-        try:
-            icon = clean_url(icon)
-            check_valid_url(url)
-        except StandardError, e:
-            print >> sys.stderr, "ERROR: Not a valid url has been specified for the icon: %s" % e
-            return False
-    if doctype is None:
-        doctype = 'Main'
-
-    fft = {
-        'url' : url,
+        raise OptionValueError, 'Not enough information to retrieve a valid docname'
+    format = cli2format(options, append_path)
+    url = clean_url(append_path)
+    check_valid_url(url)
+    ffts = {recid: [{
        'docname' : docname,
-        'format' :format,
-        'icon' : icon,
        'comment' : comment,
        'description' : description,
        'restriction' : restriction,
-        'doctype' : doctype
-    }
-    ffts = {recid : [fft]}
+        'doctype' : doctype,
+        'format' : format,
+        'url' : url
+    }]}
    return bibupload_ffts(ffts, append=True)

-def cli_revise(recid=None, docid=None, docname=None, new_docname=None, doctype=None, url=None, format=None, icon=None, description=None, comment=None, restriction=None, hide_previous=False):
+def cli_revise(options, revise_path):
    """Create a bibupload FFT task submission for revising a document."""
-    if docid is not None:
-        bibdoc = BibDoc(docid)
-        if recid is not None and recid != bibdoc.get_recid():
-            print >> sys.stderr, "ERROR: Provided recid %i is not linked with provided docid %i" % (recid, docid)
-            return False
-        if docname is not None and docname != bibdoc.get_docname():
-            print >> sys.stderr, "ERROR: Provided docid %i is not named as the provided docname %s" % (docid, docname)
-            return False
-        recid = bibdoc.get_recid()
-        docname = bibdoc.get_docname()
-    elif recid is None:
-        print >> sys.stderr, "ERROR: Not enough information to identify the record and desired document"
-        return False
-    if url is not None:
-        try:
-            url = clean_url(url)
-            check_valid_url(url)
-        except StandardError, e:
-            print >> sys.stderr, "ERROR: Not a valid url has been specified: %s" % e
-            return False
-    if docname is None and url is not None:
-        docname = get_docname_from_url(url)
+    recid = cli2recid(options)
+    comment = cli2comment(options)
+    description = cli2description(options)
+    restriction = cli2restriction(options)
+    docname = cli2docname(options, url=revise_path)
+    hide_previous = getattr(options, 'hide_previous', None)
    if not docname:
-        print >> sys.stderr, "ERROR: Not enough information to decide a docname!"
-        return False
-    if docname not in BibRecDocs(recid).get_bibdoc_names():
-        print >> sys.stderr, "ERROR: docname %s is not connected with recid %s!" % (docname, recid)
-        return False
-    if format is None and url is not None:
-        format = get_format_from_url(url)
-    if not format:
-        print >> sys.stderr, "ERROR: Not enough information to decide a format!"
-        return False
-    if icon is not None and icon != KEEP_OLD_VALUE:
-        try:
-            icon = clean_url(icon)
-            check_valid_url(url)
-        except StandardError, e:
-            print >> sys.stderr, "ERROR: Not a valid url has been specified for the icon: %s" % e
-            return False
-    if doctype is None:
-        doctype = 'Main'
-
-    fft = {
-        'url' : url,
+        raise OptionValueError, 'Not enough information to retrieve a valid docname'
+    format = cli2format(options, revise_path)
+    doctype = cli2doctype(options)
+    url = clean_url(revise_path)
+    new_docname = getattr(options, 'new_docname', None)
+    check_valid_url(url)
+    ffts = {recid : [{
        'docname' : docname,
-        'newdocname' : new_docname,
-        'format' :format,
-        'icon' : icon,
+        'new_docname' : new_docname,
        'comment' : comment,
        'description' : description,
        'restriction' : restriction,
-        'doctype' : doctype
-    }
-    if hide_previous:
-        fft['options'] = 'HIDE_PREVIOUS'
-    ffts = {recid : [fft]}
+        'doctype' : doctype,
+        'format' : format,
+        'url' : url,
+        'options' : hide_previous and ['PERFORM_HIDE_PREVIOUS'] or None
+    }]}
    return bibupload_ffts(ffts)

+def cli_set_batch(options):
+    """Change in batch the doctype, description, comment and restriction."""
+    ffts = {}
+    doctype = getattr(options, 'set_doctype', None)
+    description = getattr(options, 'set_description', None)
+    comment = getattr(options, 'set_comment', None)
+    restriction = getattr(options, 'set_restriction', None)
+    with_format = getattr(options, 'format', None)
+    for docid in cli_docids_iterator(options):
+        bibdoc = BibDoc(docid)
+        recid = bibdoc.get_recid()
+        docname = bibdoc.get_docname()
+        fft = []
+        if description is not None or comment is not None:
+            for bibdocfile in bibdoc.list_latest_files():
+                format = bibdocfile.get_format()
+                if not with_format or with_format == format:
+                    fft.append({
+                        'docname': docname,
+                        'restriction': restriction,
+                        'comment': comment,
+                        'description': description,
+                        'format': format,
+                        'doctype': doctype
+                    })
+        else:
+            fft.append({
+                'docname': docname,
+                'restriction': restriction,
+                'doctype': doctype,
+            })
+        ffts.setdefault(recid, []).extend(fft)
    return bibupload_ffts(ffts, append=False)

-def cli_get_history(docid_set):
-    """Print the history of a docid_set."""
-    for docid in docid_set:
+def cli_textify(options):
+    """Extract text to let indexing on fulltext be possible."""
+    force = getattr(options, 'force', None)
+    perform_ocr = getattr(options, 'perform_ocr', None)
+    if perform_ocr:
+        if not can_perform_ocr():
+            print >> sys.stderr, "WARNING: OCR requested but OCR is not possible"
+            perform_ocr = False
+    if perform_ocr:
+        additional = ' using OCR (this might take some time)'
+    else:
+        additional = ''
+    for docid in cli_docids_iterator(options):
        bibdoc = BibDoc(docid)
-        history = bibdoc.get_history()
-        for row in history:
-            print_info(bibdoc.get_recid(), docid, row)
+        print 'Extracting text for docid %s%s...' % (docid, additional),
+        sys.stdout.flush()
+        if force or not bibdoc.has_text(require_up_to_date=True):
+            try:
+                bibdoc.extract_text(perform_ocr=perform_ocr)
+                print "DONE"
+            except InvenioWebSubmitFileError, e:
+                print >> sys.stderr, "WARNING: %s" % e
+        else:
+            print "not needed"

-def cli_fix_all(recid_set):
+def cli_rename(options):
+    """Rename a docname within a recid."""
+    new_docname = getattr(options, 'new_docname', None)
+    docid = cli2docid(options)
+    bibdoc = BibDoc(docid)
+    docname = bibdoc.get_docname()
+    recid = bibdoc.get_recid()
+    ffts = {recid : [{'docname' : docname, 'new_docname' : new_docname}]}
+    return bibupload_ffts(ffts, append=False)
+
+def cli_fix_all(options):
    """Fix all the records of a recid_set."""
    ffts = {}
-    for recid in recid_set:
+    for recid in cli_recids_iterator(options):
        ffts[recid] = []
        for docname in BibRecDocs(recid).get_bibdoc_names():
            ffts[recid].append({'docname' : docname, 'doctype' : 'FIX-ALL'})
    return bibupload_ffts(ffts, append=False)

-def cli_fix_marc(recid_set):
+def cli_fix_marc(options, explicit_recid_set=None):
    """Fix the MARC for all the records of a recid_set."""
    ffts = {}
-    for recid in recid_set:
-        ffts[recid] = []
-        for docname in BibRecDocs(recid).get_bibdoc_names():
-            ffts[recid].append({'docname' : docname, 'doctype' : 'FIX-MARC'})
+    if explicit_recid_set is not None:
+        for recid in explicit_recid_set:
+            ffts[recid] = [{'doctype' : 'FIX-MARC'}]
+    else:
+        for recid in cli_recids_iterator(options):
+            ffts[recid] = [{'doctype' : 'FIX-MARC'}]
    return bibupload_ffts(ffts, append=False)

-def cli_check_format(recid_set):
+def cli_check_format(options):
    """Check if any format-related inconsistencies exist."""
    count = 0
+    tot = 0
    duplicate = False
-    for recid in recid_set:
+    for recid in cli_recids_iterator(options):
+        tot += 1
        bibrecdocs = BibRecDocs(recid)
        if not bibrecdocs.check_duplicate_docnames():
            print >> sys.stderr, "recid %s has duplicate docnames!" % recid
            broken = True
            duplicate = True
        else:
            broken = False
        for docname in bibrecdocs.get_bibdoc_names():
            if not bibrecdocs.check_format(docname):
                print >> sys.stderr, "recid %s with docname %s needs format fixing" % (recid, docname)
                broken = True
        if broken:
            count += 1
    if count:
-        result = "%d out of %d records need their formats to be fixed." % (count, len(recid_set))
+        result = "%d out of %d records need their formats to be fixed." % (count, tot)
    else:
        result = "All records appear to be correct with respect to formats."
    if duplicate:
        result += " Note however that at least one record appears to have duplicate docnames. You should fix this situation by using --fix-duplicate-docnames."
    print wrap_text_in_a_box(result, style="conclusion")
    return not(duplicate or count)
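The check-* and fix-* helpers in this file return a boolean that is True when everything is consistent (or nothing needed fixing); a minimal sketch of how a caller might propagate that as an exit status (an assumption about the surrounding main() logic, which is not shown here):

    import sys
    # Illustrative: exit non-zero when the consistency check reports problems.
    if not cli_check_format(options):
        sys.exit(1)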
        return True

-def cli_fix_format(recid_set):
+def cli_fix_format(options):
    """Fix format-related inconsistencies."""
    fixed = intbitset()
-    for recid in recid_set:
+    tot = 0
+    for recid in cli_recids_iterator(options):
+        tot += 1
        bibrecdocs = BibRecDocs(recid)
        for docname in bibrecdocs.get_bibdoc_names():
            if not bibrecdocs.check_format(docname):
                if bibrecdocs.fix_format(docname, skip_check=True):
                    print >> sys.stderr, "%i has been fixed for docname %s" % (recid, docname)
                else:
                    print >> sys.stderr, "%i has been fixed for docname %s. Note however that a new bibdoc might have been created." % (recid, docname)
                fixed.add(recid)
    if fixed:
        print "Now we need to synchronize MARC to reflect current changes."
-        cli_fix_marc(fixed)
-    print wrap_text_in_a_box("%i out of %i record needed to be fixed." % (len(recid_set), len(fixed)), style="conclusion")
+        cli_fix_marc(options, explicit_recid_set=fixed)
+    print wrap_text_in_a_box("%i out of %i records needed to be fixed." % (len(fixed), tot), style="conclusion")
    return not fixed

-def cli_fix_duplicate_docnames(recid_set):
+def cli_fix_duplicate_docnames(options):
    """Fix duplicate docnames."""
    fixed = intbitset()
-    for recid in recid_set:
+    tot = 0
+    for recid in cli_recids_iterator(options):
+        tot += 1
        bibrecdocs = BibRecDocs(recid)
        if not bibrecdocs.check_duplicate_docnames():
            bibrecdocs.fix_duplicate_docnames(skip_check=True)
            print >> sys.stderr, "%i has been fixed for duplicate docnames." % recid
            fixed.add(recid)
    if fixed:
        print "Now we need to synchronize MARC to reflect current changes."
-        cli_fix_marc(fixed)
-    print wrap_text_in_a_box("%i out of %i record needed to be fixed." % (len(recid_set), len(fixed)), style="conclusion")
+        cli_fix_marc(options, explicit_recid_set=fixed)
+    print wrap_text_in_a_box("%i out of %i records needed to be fixed." % (len(fixed), tot), style="conclusion")
    return not fixed

-def cli_delete(docid_set):
+def cli_delete(options):
    """Delete the given docid_set."""
    ffts = {}
-    for docid in docid_set:
+    for docid in cli_docids_iterator(options):
        bibdoc = BibDoc(docid)
-        if not bibdoc.icon_p():
-            ## Icons are indirectly deleted with the relative bibdoc.
-            docname = bibdoc.get_docname()
-            recid = bibdoc.get_recid()
+        docname = bibdoc.get_docname()
+        recid = bibdoc.get_recid()
+        if recid not in ffts:
            ffts[recid] = [{'docname' : docname, 'doctype' : 'DELETE'}]
-    if ffts:
-        return bibupload_ffts(ffts, append=False)
-    else:
-        print >> sys.stderr, 'ERROR: nothing to delete'
-        return False
+        else:
+            ffts[recid].append({'docname' : docname, 'doctype' : 'DELETE'})
+    return bibupload_ffts(ffts)
+
+def cli_delete_file(options):
+    """Delete the given file irreversibly."""
+    docid = cli2docid(options)
+    recid = cli2recid(options, docids=intbitset([docid]))
+    format = cli2format(options)
+    docname = BibDoc(docid).get_docname()
+    version = getattr(options, 'version', None)
+    ffts = {recid : [{'docname' : docname, 'version' : version, 'format' : format, 'doctype' : 'DELETE-FILE'}]}
+    return bibupload_ffts(ffts)
+
+def cli_revert(options):
+    """Revert a bibdoc to a given version."""
+    docid = cli2docid(options)
+    recid = cli2recid(options, docids=intbitset([docid]))
+    docname = BibDoc(docid).get_docname()
+    version = getattr(options, 'version', None)
+    try:
+        version = int(version)
+        if version <= 0:
+            raise ValueError
+    except ValueError:
+        raise OptionValueError, 'when reverting, the version should be a valid positive integer, not %s' % version
+    ffts = {recid : [{'docname' : docname, 'version' : version, 'doctype' : 'REVERT'}]}
+    return bibupload_ffts(ffts)

-def cli_undelete(recid_set, docname, status):
+def cli_undelete(options):
    """Undelete the bibdocs matching the given docname."""
-    fix_marc = intbitset()
+    docname = cli2docname(options)
+    restriction = getattr(options, 'restriction', None)
    count = 0
+    if not docname:
+        docname = 'DELETED-*-*'
    if not docname.startswith('DELETED-'):
        docname = 'DELETED-*-' + docname
-    for recid in recid_set:
-        bibrecdocs = BibRecDocs(recid, deleted_too=True)
-        for bibdoc in bibrecdocs.list_bibdocs():
-            if bibdoc.get_status() == 'DELETED' and fnmatch.fnmatch(bibdoc.get_docname(), docname):
-                bibdoc.undelete(status)
-                fix_marc.add(recid)
-                count += 1
-    cli_fix_marc(fix_marc)
-    print wrap_text_in_a_box("%s bibdoc successfuly undeleted with status '%s'" % (count, status), style="conclusion")
-
-def cli_merge_into(recid, docname, into_docname):
-    """Merge docname into_docname for the given recid."""
-    bibrecdocs = BibRecDocs(recid)
-    docnames = bibrecdocs.get_bibdoc_names()
-    if docname in docnames and into_docname in docnames:
-        try:
-            bibrecdocs.merge_bibdocs(into_docname, docname)
-        except InvenioWebSubmitFileError, e:
-            print >> sys.stderr, e
-        else:
-            cli_fix_marc(intbitset((recid)))
+    to_be_undeleted = intbitset()
+    fix_marc = intbitset()
+    setattr(options, 'deleted_docs', 'only')
+    for docid in cli_docids_iterator(options):
+        bibdoc = BibDoc(docid)
+        if bibdoc.get_status() == 'DELETED' and fnmatch.fnmatch(bibdoc.get_docname(), docname):
+            to_be_undeleted.add(docid)
+            fix_marc.add(bibdoc.get_recid())
+            count += 1
+            print '%s (docid %s from recid %s) will be undeleted to restriction: %s' % (bibdoc.get_docname(), docid, bibdoc.get_recid(), restriction)
+    wait_for_user("I'll proceed with the undeletion")
+    for docid in to_be_undeleted:
+        bibdoc = BibDoc(docid)
+        bibdoc.undelete(restriction)
+    cli_fix_marc(options, explicit_recid_set=fix_marc)
+    print wrap_text_in_a_box("%s bibdoc(s) successfully undeleted with restriction '%s'" % (count, restriction), style="conclusion")
+
+def cli_get_info(options):
+    """Print all the info of the matched docids or recids."""
+    debug('Getting info!')
+    human_readable = bool(getattr(options, 'human_readable', None))
+
debug('human_readable: %s' % human_readable) + deleted_docs = getattr(options, 'deleted_docs', None) in ('yes', 'only') + debug('deleted_docs: %s' % deleted_docs) + if getattr(options, 'docids', None): + for docid in cli_docids_iterator(options): + sys.stdout.write(str(BibDoc(docid, human_readable=human_readable))) else: - print >> sys.stderr, 'ERROR: Either %s or %s is not a valid docname for recid %s' % (docname, into_docname, recid) + for recid in cli_recids_iterator(options): + sys.stdout.write(str(BibRecDocs(recid, deleted_too=deleted_docs, human_readable=human_readable))) -def cli_get_info(recid_set, show_deleted=False, human_readable=False): - """Print all the info of a recid_set.""" - for recid in recid_set: - print BibRecDocs(recid, deleted_too=show_deleted, human_readable=human_readable) +def cli_purge(options): + """Purge the matched docids.""" + ffts = {} + for docid in cli_docids_iterator(options): + bibdoc = BibDoc(docid) + recid = bibdoc.get_recid() + docname = bibdoc.get_docname() + if recid: + if recid not in ffts: + ffts[recid] = [] + ffts[recid].append({ + 'docname' : docname, + 'doctype' : 'PURGE', + }) + return bibupload_ffts(ffts) -def cli_get_docnames(docid_set): - """Print all the docnames of a docid_set.""" - for docid in docid_set: +def cli_expunge(options): + """Expunge the matched docids.""" + ffts = {} + for docid in cli_docids_iterator(options): bibdoc = BibDoc(docid) - print_info(bibdoc.get_recid(), docid, bibdoc.get_docname()) + recid = bibdoc.get_recid() + docname = bibdoc.get_docname() + if recid: + if recid not in ffts: + ffts[recid] = [] + ffts[recid].append({ + 'docname' : docname, + 'doctype' : 'EXPUNGE', + }) + return bibupload_ffts(ffts) -def cli_get_disk_usage(docid_set, human_readable=False): +def cli_get_history(options): + """Print the history of a docid_set.""" + for docid in cli_docids_iterator(options): + bibdoc = BibDoc(docid) + history = bibdoc.get_history() + for row in history: + print_info(bibdoc.get_recid(), docid, row) + +def cli_get_disk_usage(options): """Print the space usage of a docid_set.""" + human_readable = getattr(options, 'human_readable', None) total_size = 0 total_latest_size = 0 - for docid in docid_set: + for docid in cli_docids_iterator(options): bibdoc = BibDoc(docid) size = bibdoc.get_total_size() total_size += size latest_size = bibdoc.get_total_size_latest_version() total_latest_size += latest_size if human_readable: print_info(bibdoc.get_recid(), docid, 'size=%s' % nice_size(size)) print_info(bibdoc.get_recid(), docid, 'latest version size=%s' % nice_size(latest_size)) else: print_info(bibdoc.get_recid(), docid, 'size=%s' % size) print_info(bibdoc.get_recid(), docid, 'latest version size=%s' % latest_size) if human_readable: print wrap_text_in_a_box('total size: %s\n\nlatest version total size: %s' % (nice_size(total_size), nice_size(total_latest_size)), style='conclusion') else: print wrap_text_in_a_box('total size: %s\n\nlatest version total size: %s' % (total_size, total_latest_size), style='conclusion') - -def cli_check_md5(docid_set): +def cli_check_md5(options): """Check the md5 sums of a docid_set.""" failures = 0 - for docid in docid_set: + for docid in cli_docids_iterator(options): bibdoc = BibDoc(docid) if bibdoc.md5s.check(): print_info(bibdoc.get_recid(), docid, 'checksum OK') else: for afile in bibdoc.list_all_files(): if not afile.check(): failures += 1 print_info(bibdoc.get_recid(), docid, '%s failing checksum!' 
% afile.get_full_path()) if failures: print wrap_text_in_a_box('%i files failing' % failures , style='conclusion') else: print wrap_text_in_a_box('All files are correct', style='conclusion') -def cli_update_md5(docid_set): +def cli_update_md5(options): """Update the md5 sums of a docid_set.""" - for docid in docid_set: + for docid in cli_docids_iterator(options): bibdoc = BibDoc(docid) if bibdoc.md5s.check(): print_info(bibdoc.get_recid(), docid, 'checksum OK') else: for afile in bibdoc.list_all_files(): if not afile.check(): print_info(bibdoc.get_recid(), docid, '%s failing checksum!' % afile.get_full_path()) wait_for_user('Updating the md5s of this document can hide real problems.') bibdoc.md5s.update(only_new=False) -def cli_assert_recid(options): - """Check for recid to be correctly set.""" - try: - assert(int(options.recid) > 0) - return True - except: - print >> sys.stderr, 'ERROR: recid not correctly set: "%s"' % options.recid - return False - -def cli_assert_docname(options): - """Check for recid to be correctly set.""" - try: - assert(options.docname) - return True - except: - print >> sys.stderr, 'ERROR: docname not correctly set: "%s"' % options.docname - return False +def cli_hide(options): + """Hide the matched versions of documents.""" + documents_to_be_hidden = {} + to_be_fixed = intbitset() + versions = getattr(options, 'versions', 'all') + if versions != 'all': + try: + versions = ranges2ids(versions) + except: + raise OptionValueError, 'You should specify correct versions. Not %s' % versions + else: + versions = intbitset(trailing_bits=True) + for docid in cli_docids_iterator(options): + bibdoc = BibDoc(docid) + recid = bibdoc.get_recid() + if recid: + for bibdocfile in bibdoc.list_all_files(): + this_version = bibdocfile.get_version() + this_format = bibdocfile.get_format() + if this_version in versions: + if docid not in documents_to_be_hidden: + documents_to_be_hidden[docid] = [] + documents_to_be_hidden[docid].append((this_version, this_format)) + to_be_fixed.add(recid) + print '%s (docid: %s, recid: %s) will be hidden' % (bibdocfile.get_full_name(), docid, recid) + wait_for_user('Proceeding to hide the matched documents...') + for docid, documents in documents_to_be_hidden.iteritems(): + bibdoc = BibDoc(docid) + for version, format in documents: + bibdoc.set_flag('HIDDEN', format, version) + return cli_fix_marc(options, to_be_fixed) -def get_all_recids(): - """Return all the existing recids.""" - return intbitset(run_sql('select id from bibrec')) +def cli_unhide(options): + """Unhide the matched versions of documents.""" + documents_to_be_unhidden = {} + to_be_fixed = intbitset() + versions = getattr(options, 'versions', 'all') + if versions != 'all': + try: + versions = ranges2ids(versions) + except: + raise OptionValueError, 'You should specify correct versions. 
Not %s' % versions + else: + versions = intbitset(trailing_bits=True) + for docid in cli_docids_iterator(options): + bibdoc = BibDoc(docid) + recid = bibdoc.get_recid() + if recid: + for bibdocfile in bibdoc.list_all_files(): + this_version = bibdocfile.get_version() + this_format = bibdocfile.get_format() + if this_version in versions: + if docid not in documents_to_be_unhidden: + documents_to_be_unhidden[docid] = [] + documents_to_be_unhidden[docid].append((this_version, this_format)) + to_be_fixed.add(recid) + print '%s (docid: %s, recid: %s) will be unhidden' % (bibdocfile.get_full_name(), docid, recid) + wait_for_user('Proceeding to unhide the matched documents...') + for docid, documents in documents_to_be_unhidden.iteritems(): + bibdoc = BibDoc(docid) + for version, format in documents: + bibdoc.unset_flag('HIDDEN', format, version) + return cli_fix_marc(options, to_be_fixed) def main(): parser = prepare_option_parser() (options, args) = parser.parse_args() - if options.all: - recid_set = get_all_recids() - else: - recid_set = get_recids_from_query(options.pattern, options.collection, options.recid, options.recid2, options.docid, options.docid2) - docid_set = get_docids_from_query(recid_set, options.docname, options.docid, options.docid2, options.show_deleted is True or options.action == 'undelete') + if getattr(options, 'debug', None): + getLogger().setLevel(DEBUG) + debug('test') + debug('options: %s, args: %s' % (options, args)) try: - if options.action == 'get-history': - cli_get_history(docid_set) - elif options.action == 'get-info': - cli_get_info(recid_set, options.show_deleted is True, options.human_readable) - elif options.action == 'get-docnames': - cli_get_docnames(docid_set) - elif options.action == 'get-disk-usage': - cli_get_disk_usage(docid_set, options.human_readable) - elif options.action == 'check-md5': - cli_check_md5(docid_set) - elif options.action == 'update-md5': - cli_update_md5(docid_set) - elif options.action == 'fix-all': - cli_fix_all(recid_set) - elif options.action == 'fix-marc': - cli_fix_marc(recid_set) - elif options.action == 'delete': - cli_delete(docid_set) - elif options.action == 'fix-duplicate-docnames': - cli_fix_duplicate_docnames(recid_set) - elif options.action == 'fix-format': - cli_fix_format(recid_set) - elif options.action == 'check-duplicate-docnames': - cli_check_duplicate_docnames(recid_set) - elif options.action == 'check-format': - cli_check_format(recid_set) - elif options.action == 'undelete': - cli_undelete(recid_set, options.docname or '*', options.restriction or '') - elif options.append_path: - if cli_assert_recid(options): - res = cli_append(options.recid, options.docid, options.docname, options.doctype, options.append_path, options.format, options.icon, options.description, options.comment, options.restriction) - if not res: - sys.exit(1) - elif options.revise_path: - if cli_assert_recid(options): - res = cli_revise(options.recid, options.docid, options.docname, - options.newdocname, options.doctype, options.revise_path, options.format, - options.icon, options.description, options.comment, options.restriction) - if not res: - sys.exit(1) - elif options.revise_hide_path: - if cli_assert_recid(options): - res = cli_revise(options.recid, options.docid, options.docname, - options.newdocname, options.doctype, options.revise_path, options.format, - options.icon, options.description, options.comment, options.restriction, True) - if not res: - sys.exit(1) - elif options.into_docname: - if options.recid and options.docname: - 
cli_merge_into(options.recid, options.docname, options.into_docname) + if not getattr(options, 'action', None) and \ + not getattr(options, 'append_path', None) and \ + not getattr(options, 'revise_path', None): + if getattr(options, 'set_doctype', None) is not None or \ + getattr(options, 'set_comment', None) is not None or \ + getattr(options, 'set_description', None) is not None or \ + getattr(options, 'set_restriction', None) is not None: + cli_set_batch(options) + elif getattr(options, 'new_docname', None): + cli_rename(options) else: - print >> sys.stderr, "ERROR: You have to specify both the recid and a docname for using --merge-into" + print >> sys.stderr, "ERROR: no action specified" + sys.exit(1) + elif getattr(options, 'append_path', None): + cli_append(options, getattr(options, 'append_path', None)) + elif getattr(options, 'revise_path', None): + cli_revise(options, getattr(options, 'revise_path', None)) + elif options.action == 'textify': + cli_textify(options) + elif getattr(options, 'action', None) == 'get-history': + cli_get_history(options) + elif getattr(options, 'action', None) == 'get-info': + cli_get_info(options) + elif getattr(options, 'action', None) == 'get-disk-usage': + cli_get_disk_usage(options) + elif getattr(options, 'action', None) == 'check-md5': + cli_check_md5(options) + elif getattr(options, 'action', None) == 'update-md5': + cli_update_md5(options) + elif getattr(options, 'action', None) == 'fix-all': + cli_fix_all(options) + elif getattr(options, 'action', None) == 'fix-marc': + cli_fix_marc(options) + elif getattr(options, 'action', None) == 'delete': + cli_delete(options) + elif getattr(options, 'action', None) == 'delete-file': + cli_delete_file(options) + elif getattr(options, 'action', None) == 'fix-duplicate-docnames': + cli_fix_duplicate_docnames(options) + elif getattr(options, 'action', None) == 'fix-format': + cli_fix_format(options) + elif getattr(options, 'action', None) == 'check-duplicate-docnames': + cli_check_duplicate_docnames(options) + elif getattr(options, 'action', None) == 'check-format': + cli_check_format(options) + elif getattr(options, 'action', None) == 'undelete': + cli_undelete(options) + elif getattr(options, 'action', None) == 'purge': + cli_purge(options) + elif getattr(options, 'action', None) == 'expunge': + cli_expunge(options) + elif getattr(options, 'action', None) == 'revert': + cli_revert(options) + elif getattr(options, 'action', None) == 'hide': + cli_hide(options) + elif getattr(options, 'action', None) == 'unhide': + cli_unhide(options) else: - print >> sys.stderr, "ERROR: Action %s is not valid" % options.action + print >> sys.stderr, "ERROR: Action %s is not valid" % getattr(options, 'action', None) sys.exit(1) - except InvenioWebSubmitFileError, e: - print >> sys.stderr, 'ERROR: Exception caught: %s' % e + except StandardError, e: + register_exception() + print >> sys.stderr, 'ERROR: %s' % e sys.exit(1) if __name__ == '__main__': main() diff --git a/modules/websubmit/lib/fulltext_files_migration_kit.py b/modules/websubmit/lib/fulltext_files_migration_kit.py index db5484f37..e0645da74 100644 --- a/modules/websubmit/lib/fulltext_files_migration_kit.py +++ b/modules/websubmit/lib/fulltext_files_migration_kit.py @@ -1,142 +1,142 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. 
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

__revision__ = "$Id$"

"""This script updates the filesystem structure of fulltext files in order
to make it coherent with the bibdocfile implementation (the bibdocfile.py
structure is backward compatible with the file.py structure, but not vice
versa).
"""

import sys
from invenio.intbitset import intbitset
from invenio.textutils import wrap_text_in_a_box
from invenio.config import CFG_LOGDIR, CFG_SITE_SUPPORT_EMAIL
from invenio.dbquery import run_sql, OperationalError
from invenio.bibdocfile import BibRecDocs, InvenioWebSubmitFileError
from datetime import datetime

def retrieve_fulltext_recids():
    """Returns the list of all the recids linked with at least one fulltext
    file."""
    res = run_sql('SELECT DISTINCT id_bibrec FROM bibrec_bibdoc')
    return intbitset(res)

def fix_recid(recid, logfile):
    """Fix a given recid."""
    print "Upgrading record %s ->" % recid,
    print >> logfile, "Upgrading record %s:" % recid
    bibrec = BibRecDocs(recid)
    print >> logfile, bibrec
    docnames = bibrec.get_bibdoc_names()
    try:
        for docname in docnames:
            print docname,
            new_bibdocs = bibrec.fix(docname)
            new_bibdocnames = [bibdoc.get_docname() for bibdoc in new_bibdocs]
            if new_bibdocnames:
                print "(created bibdocs: '%s')" % "', '".join(new_bibdocnames),
                print >> logfile, "(created bibdocs: '%s')" % "', '".join(new_bibdocnames)
    except InvenioWebSubmitFileError, e:
        print >> logfile, BibRecDocs(recid)
        print "-> ERROR:", e
        return False
    else:
        print >> logfile, BibRecDocs(recid)
        print "-> OK"
        return True

def backup_tables(drop=False):
    """This function creates a backup of the bibrec_bibdoc, bibdoc and
    bibdoc_bibdoc tables. Returns False in case dropping of the previous
    tables is needed."""
    if drop:
        run_sql('DROP TABLE bibrec_bibdoc_backup')
        run_sql('DROP TABLE bibdoc_backup')
        run_sql('DROP TABLE bibdoc_bibdoc_backup')
    try:
        run_sql("""CREATE TABLE bibrec_bibdoc_backup (KEY id_bibrec(id_bibrec),
                KEY id_bibdoc(id_bibdoc)) SELECT * FROM bibrec_bibdoc""")
        run_sql("""CREATE TABLE bibdoc_backup (PRIMARY KEY id(id))
                SELECT * FROM bibdoc""")
        run_sql("""CREATE TABLE bibdoc_bibdoc_backup (KEY id_bibdoc1(id_bibdoc1),
                KEY id_bibdoc2(id_bibdoc2)) SELECT * FROM bibdoc_bibdoc""")
    except OperationalError, e:
        if not drop:
            return False
-        raise e
+        raise
    return True

def check_yes():
    """Return True if the user types 'yes'."""
    try:
        return raw_input().strip() == 'yes'
    except KeyboardInterrupt:
        return False

def main():
    """Core loop."""
    logfilename = '%s/fulltext_files_migration_kit-%s.log' % (CFG_LOGDIR, datetime.today().strftime('%Y%m%d%H%M%S'))
    try:
        logfile = open(logfilename, 'w')
    except IOError, e:
        print wrap_text_in_a_box('NOTE: it\'s impossible to create the log:\n\n  %s\n\nbecause of:\n\n  %s\n\nPlease run this migration kit as the same user who runs Invenio (e.g.
Apache)' % (logfilename, e), style='conclusion', break_long=False)
        sys.exit(1)
    recids = retrieve_fulltext_recids()
    print wrap_text_in_a_box("""This script migrates the filesystem structure used to store
fulltext files to the new, stricter structure.
This script must not be run during normal Invenio operations.
It is safe to run this script. No file will be deleted.
Nevertheless, it is recommended to make a backup of the filesystem structure just in case.
A backup of the database tables involved will be automatically performed.""", style='important')
    print "%s records will be migrated/fixed." % len(recids)
    print "Please type yes if you want to go further:",
    if not check_yes():
        print "INTERRUPTED"
        sys.exit(1)
    print "Backing up database tables"
    try:
        if not backup_tables():
            print wrap_text_in_a_box("""It appears that this is not the first time that this script has been run.
Backup tables have already been created by a previous run.
In order for the script to go further they need to be removed.""", style='important')
            print "Please, type yes if you agree to remove them and go further:",
            if not check_yes():
                print wrap_text_in_a_box("INTERRUPTED", style='conclusion')
                sys.exit(1)
            print "Backing up database tables (after dropping previous backup)",
-            backup_tables()
+            backup_tables(drop=True)
            print "-> OK"
        else:
            print "-> OK"
    except Exception, e:
        print wrap_text_in_a_box("Unexpected error while backing up tables. Please, do your checks: %s" % e, style='conclusion')
        sys.exit(1)
    print "Created a complete log file in %s" % logfilename
    for recid in recids:
        if not fix_recid(recid, logfile):
            logfile.close()
            print wrap_text_in_a_box(title="INTERRUPTED BECAUSE OF ERROR!", body="""Please see the log file %s for what was the status of record %s prior to the error. Contact %s in case of problems, attaching the log.""" % (logfilename, recid, CFG_SITE_SUPPORT_EMAIL), style='conclusion')
            sys.exit(1)
    print wrap_text_in_a_box("DONE", style='conclusion')

if __name__ == '__main__':
    main()
diff --git a/modules/websubmit/lib/functions/Create_Upload_Files_Interface.py b/modules/websubmit/lib/functions/Create_Upload_Files_Interface.py
index d93073bad..4d0ab244b 100644
--- a/modules/websubmit/lib/functions/Create_Upload_Files_Interface.py
+++ b/modules/websubmit/lib/functions/Create_Upload_Files_Interface.py
@@ -1,1892 +1,1889 @@
## $Id: Revise_Files.py,v 1.37 2009/03/26 15:11:05 jerome Exp $
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""WebSubmit function - Displays a generic interface to upload, delete
                        and revise files.

To be used together with the Move_Uploaded_Files_to_Storage function:
 - Create_Upload_Files_Interface records the actions performed by the user.
 - Move_Uploaded_Files_to_Storage executes the recorded actions.
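For illustration (hypothetical values): when a user revises the 'Main'
file at step 1, this function only appends a line such as

  2009-03-26 15:11:05 --> revise---Main---/curdir/files/updated/Main/Main/file.pdf---None---some description---some comment---Main---1---

to the action log in the submission directory; the record itself is
only modified when Move_Uploaded_Files_to_Storage replays that log at
step 2.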
NOTE:
 - Comments are kept until they are changed: it is impossible to
   remove a comment...
 - Due to the way WebSubmit works, this function can only work when
   positioned at step 1 in WebSubmit admin, and
   Move_Uploaded_Files_to_Storage is at step 2

FIXME:
 - better differentiate between a revised file and an added format
   (currently when adding a format, the whole bibdoc is marked as
   updated, and all links are removed)
 - After a file has been revised or added, add a 'check' icon
 - One issue: if we allow deletion or renaming, we might lose track of
   a bibdoc: someone adds X, renames X->Y, and adds again another file
   with name X: when executing actions, we will add the second X, and
   rename it to Y -> need to go back in previous action when
   renaming... or check that the name has never been used...
"""

__revision__ = "$Id$"

from invenio.config import \
     CFG_SITE_SUPPORT_EMAIL, \
     CFG_SITE_URL, \
     CFG_SITE_LANG
import os
import time
from invenio.bibdocfile import \
     decompose_file, \
     calculate_md5, \
     BibRecDocs, \
     BibDocFile
from invenio.websubmit_functions.Shared_Functions import \
     createRelatedFormats
from invenio.messages import gettext_set_language, wash_language

allowed_actions = ['revise', 'delete', 'add', 'addFormat']

def Create_Upload_Files_Interface(parameters, curdir, form, user_info=None):
    """
    List files for revisions.

    You should use the Move_Uploaded_Files_to_Storage.py function in
    your submission to apply the changes performed by users with this
    interface.

    @param parameters:(dictionary) - must contain:

      + maxsize: the max size allowed for uploaded files

      + minsize: the min size allowed for uploaded files

      + doctypes: the list of doctypes (like 'Main' or 'Additional')
                  and their description that users can choose from
                  when adding new files.
                  - When no value is provided, users cannot add new
                    files (they can only revise/delete/add formats)
                  - When a single value is given, it is used as the
                    default doctype for all new documents
                  Eg:
                    main=Main document|additional=Figure, schema. etc
                  ('=' separates doctype and description
                   '|' separates each doctype/description group)

      + restrictions: the list of restrictions (like 'Restricted' or
                      'No Restriction') and their description that
                      users can choose from when adding/revising files.
                      Restrictions can then be configured at the level
                      of WebAccess.
                      - When no value is provided, no restriction is
                        applied
                      - When a single value is given, it is used as the
                        default restriction for all documents.
                      - The first value of the list is used as the
                        default restriction if the user is not given
                        the choice of the restriction. CHOOSE THE ORDER!
                      Eg:
                        =No restriction|restr=Restricted
                      ('=' separates restriction and description
                       '|' separates each restriction/description group)

      + canDeleteDoctypes: the list of doctypes that users are
                           allowed to delete.
                           Eg: Main|Additional ('|' separated values)
                           Use '*' for all doctypes

      + canReviseDoctypes: the list of doctypes that users are
                           allowed to revise
                           Eg: Main|Additional ('|' separated values)
                           Use '*' for all doctypes

      + canDescribeDoctypes: the list of doctypes that users are
                             allowed to describe
                             Eg: Main|Additional ('|' separated values)
                             Use '*' for all doctypes

      + canCommentDoctypes: the list of doctypes that users are
                            allowed to comment
                            Eg: Main|Additional ('|' separated values)
                            Use '*' for all doctypes

      + canKeepDoctypes: the list of doctypes for which users can
                         choose to keep previous versions visible when
                         revising a file (i.e. 'Keep previous version'
                         checkbox). See also parameter 'keepDefault'.
                         Note that this parameter is ~ignored when
                         revising the attributes of a file (comment,
                         description) without uploading a new file.
                         See also parameter
                         Move_Uploaded_Files_to_Storage.forceFileRevision
                         Eg: Main|Additional ('|' separated values)
                         Use '*' for all doctypes

      + canAddFormatDoctypes: the list of doctypes for which users can
                              add new formats. If there is no value,
                              then no 'add format' link nor warning
                              about losing old formats are displayed.
                              Eg: Main|Additional ('|' separated values)
                              Use '*' for all doctypes

      + canRestrictDoctypes: the list of doctypes for which users can
                             choose the access restrictions when adding
                             or revising a file. If no value is given:
                             - no restriction is applied if none is
                               defined in the 'restrictions' parameter.
                             - else the *first* value of the
                               'restrictions' parameter is used as the
                               default restriction.
                             Eg: Main|Additional ('|' separated values)
                             Use '*' for all doctypes

      + canRenameDoctypes: the list of doctypes that users are allowed
                           to rename (when revising)
                           Eg: Main|Additional ('|' separated values)
                           Use '*' for all doctypes

      + canNameNewFiles: if users can choose the name of the files they
                         upload (1) or not (0)

      + defaultFilenameDoctypes: Rename uploaded files to admin-chosen
                                 values. List here the files in the
                                 current submission directory that
                                 contain the names to use for each
                                 doctype.
                                 Eg:
                                 Main=RN|Additional=additional_filename
                                 ('=' separates doctype and file in
                                 curdir
                                 '|' separates each doctype/file group).
                                 If the same doctype is submitted
                                 several times, a "-%i" suffix is added
                                 to the name defined in the file.
                                 The default filenames are overridden
                                 by user-chosen names if you allow
                                 'canNameNewFiles' or
                                 'canRenameDoctypes'.

      + maxFilesDoctypes: the maximum number of files that users can
                          upload for each doctype.
                          Eg: Main=1|Additional=2 ('|' separated values)
                          Do not specify the doctype here to have an
                          unlimited number of files for a given
                          doctype.

      + createRelatedFormats: if uploaded files get converted to
                              whatever other formats we can produce (1)
                              or not (0)

      + keepDefault: the default behaviour for keeping or not previous
                     versions of files when users cannot choose (no
                     value in canKeepDoctypes): keep (1) or not (0)
                     Note that this parameter is ignored when revising
                     the attributes of a file (comment, description)
                     without uploading a new file.
See also parameter Move_Uploaded_Files_to_Storage.forceFileRevision + showLinks: if we display links to files (1) when possible or not (0) + fileLabel: the label for the file field + filenameLabel: the label for the file name field + descriptionLabel: the label for the description field + commentLabel: the label for the comments field + restrictionLabel: the label in front of the restrictions list + startDoc: the name of a file in curdir that contains some text/markup to be printed *before* the file revision box + endDoc: the name of a file in curdir that contains some text/markup to be printed *after* the file revision box """ global sysno ln = wash_language(form['ln']) _ = gettext_set_language(ln) out = '' ## Fetch parameters defined for this function (minsize, maxsize, doctypes_and_desc, doctypes, can_delete_doctypes, can_revise_doctypes, can_describe_doctypes, can_comment_doctypes, can_keep_doctypes, can_rename_doctypes, can_add_format_to_doctypes, createRelatedFormats_p, can_name_new_files, keep_default, show_links, file_label, filename_label, description_label, comment_label, startDoc, endDoc, restrictions_and_desc, can_restrict_doctypes, restriction_label, doctypes_to_default_filename, max_files_for_doctype) = \ wash_function_parameters(parameters, curdir, ln) # Get the existing bibdocs as well as the actions performed during # the former revise sessions of the user, to build an updated list # of documents. We will use it to check if last action performed # by user is allowed. bibrecdocs = [] if sysno: bibrecdocs = BibRecDocs(sysno) bibdocs = bibrecdocs.list_bibdocs() performed_actions = read_actions_log(curdir) # "merge": abstract_bibdocs = build_updated_files_list(bibdocs, performed_actions, sysno or -1) ## Get and clean parameters received from user (file_action, file_target, file_target_doctype, keep_previous_files, file_description, file_comment, file_rename, file_doctype, file_restriction) = \ wash_form_parameters(form, abstract_bibdocs, can_keep_doctypes, keep_default, can_describe_doctypes, can_comment_doctypes, can_rename_doctypes, can_name_new_files, can_restrict_doctypes, doctypes_to_default_filename) ## Check the last action performed by user, and log it if ## everything is ok if os.path.exists("%s/myfile" % curdir) and \ ((file_action == 'add' and (file_doctype in doctypes)) or \ (file_action == 'revise' and \ ((file_target_doctype in can_revise_doctypes) or \ '*' in can_revise_doctypes)) or (file_action == 'addFormat' and \ ((file_target_doctype in can_add_format_to_doctypes) or \ '*' in can_add_format_to_doctypes))): # A file has been uploaded (user has revised or added a file, # or a format) file_desc = open("%s/myfile" % curdir, "r") myfile = file_desc.read() file_desc.close() dirname, filename, extension = decompose_file(myfile) fullpath = os.path.join(curdir, 'files', 'myfile', myfile) os.unlink("%s/myfile" % curdir) if minsize.isdigit() and os.path.getsize(fullpath) < int(minsize): os.unlink(fullpath) out += '' % \ (_("The uploaded file is too small (<%i o) and has therefore not been considered") % \ int(minsize)).replace('"', '\\"') elif maxsize.isdigit() and os.path.getsize(fullpath) > int(maxsize): os.unlink(fullpath) out += '' % \ (_("The uploaded file is too big (>%i o) and has therefore not been considered") % \ int(maxsize)).replace('"', '\\"') elif len(filename) + len(extension) + 4 > 255: # Max filename = 256, including extension and version that # will be appended later by BibDoc os.unlink(fullpath) out += '' % \ _("The uploaded file name is too long 
and has therefore not been considered").replace('"', '\\"') elif file_action == 'add' and \ max_files_for_doctype.has_key(file_doctype) and \ max_files_for_doctype[file_doctype] < \ (len([bibdoc for bibdoc in abstract_bibdocs \ if bibdoc['get_type'] == file_doctype]) + 1): # User has tried to upload more than allowed for this # doctype. Should never happen, unless the user did some # nasty things os.unlink(fullpath) out += '' % \ _("You have already reached the maximum number of files for this type of document").replace('"', '\\"') else: # Prepare to move file to # curdir/files/updated/doctype/bibdocname/ folder_doctype = file_doctype or \ bibrecdocs.get_bibdoc(file_target).get_type() folder_bibdocname = file_rename or file_target or filename new_fullpath = os.path.join(curdir, 'files', 'updated', folder_doctype, folder_bibdocname, myfile) # First check that we do not conflict with an already # existing bibdoc name if file_action == "add" and \ ((filename in [bibdoc['get_docname'] for bibdoc \ in abstract_bibdocs] and not file_rename) or \ file_rename in [bibdoc['get_docname'] for bibdoc \ in abstract_bibdocs]): # A file with that name already exist. Cancel action # and tell user. os.unlink(fullpath) out += '' % \ (_("A file named %s already exists. Please choose another name.") % \ (file_rename or filename)).replace('"', '\\"') elif file_action == "revise" and \ file_rename != file_target and \ file_rename in [bibdoc['get_docname'] for bibdoc \ in abstract_bibdocs]: # A file different from the one to revise already has # the same bibdocname os.unlink(fullpath) out += '' % \ (_("A file named %s already exists. Please choose another name.") % \ file_rename).replace('"', '\\"') elif file_action == "addFormat" and \ (extension in \ get_extensions_for_docname(file_target, abstract_bibdocs)): # A file with that extension already exists. Cancel # action and tell user. os.unlink(fullpath) out += '' % \ (_("A file with format '%s' already exists. Please upload another format.") % \ extension).replace('"', '\\"') elif '.' in file_rename or '/' in file_rename or "\\" in file_rename or \ not os.path.abspath(new_fullpath).startswith(os.path.join(curdir, 'files', 'updated')): # We forbid usage of a few characters, for the good of # everybody... os.unlink(fullpath) out += '' % \ _("You are not allowed to use dot '.', slash '/', or backslash '\\\\' in file names. Choose a different name and upload your file again. In particular, note that you should not include the extension in the renaming field.").replace('"', '\\"') else: # No conflict with file name # When revising, delete previously uploaded files for # this entry, so that we do not execute the # corresponding action if file_action == "revise": for path_to_delete in \ get_uploaded_files_for_docname(curdir, file_target): delete(curdir, path_to_delete) # Move uploaded file to curdir/files/updated/doctype/bibdocname/ os.renames(fullpath, new_fullpath) if file_action == "add": # if not bibrecdocs.check_file_exists(new_fullpath): # No need to check: done before... # Log if file_rename != '': # at this point, bibdocname is specified # name, no need to 'rename' filename = file_rename log_action(curdir, file_action, filename, new_fullpath, file_rename, file_description, file_comment, file_doctype, keep_previous_files, file_restriction) # Automatically create additional formats when # possible. 
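                # (Assumption, noted for illustration: createRelatedFormats()
                # from Shared_Functions returns the paths of the newly
                # converted files, hypothetically e.g. a '.pdf' derived from
                # an uploaded '.ps' when conversion tools are available, and
                # with overwrite=False it leaves already-existing formats
                # untouched.)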
additional_formats = [] if createRelatedFormats_p: additional_formats = createRelatedFormats(new_fullpath, overwrite=False) for additional_format in additional_formats: # Log log_action(curdir, 'addFormat', filename, additional_format, file_rename, file_description, file_comment, file_doctype, True, file_restriction) if file_action == "revise" and file_target != "": # Log log_action(curdir, file_action, file_target, new_fullpath, file_rename, file_description, file_comment, file_target_doctype, keep_previous_files, file_restriction) # Automatically create additional formats when # possible. additional_formats = [] if createRelatedFormats_p: additional_formats = createRelatedFormats(new_fullpath, overwrite=False) for additional_format in additional_formats: # Log log_action(curdir, 'addFormat', (file_rename or file_target), additional_format, file_rename, file_description, file_comment, file_target_doctype, True, file_restriction) if file_action == "addFormat" and file_target != "": # We have already checked above that this format does # not already exist. # Log log_action(curdir, file_action, file_target, new_fullpath, file_rename, file_description, file_comment, file_target_doctype, keep_previous_files, file_restriction) elif file_action in ["add", "addFormat"]: # No file found, but action involved adding file: ask user to # select a file out += """""" elif file_action == "revise" and file_target != "": # User has chosen to revise attributes of a file (comment, # name, etc.) without revising the file itself. if file_rename != file_target and \ file_rename in [bibdoc['get_docname'] for bibdoc \ in abstract_bibdocs]: # A file different from the one to revise already has # the same bibdocname out += '' % \ (_("A file named %s already exists. Please choose another name.") % \ file_rename).replace('"', '\\"') else: # Log log_action(curdir, file_action, file_target, "", file_rename, file_description, file_comment, file_target_doctype, keep_previous_files, file_restriction) elif file_action == "delete" and file_target != "" and \ ((file_target_doctype in can_delete_doctypes) or \ '*' in can_delete_doctypes): # Delete previously uploaded files for this entry for path_to_delete in get_uploaded_files_for_docname(curdir, file_target): delete(curdir, path_to_delete) # Log log_action(curdir, file_action, file_target, "", file_rename, file_description, file_comment, "", keep_previous_files, file_restriction) ## Display # Create the list of files based on current files and performed # actions performed_actions = read_actions_log(curdir) bibdocs = bibrecdocs.list_bibdocs() abstract_bibdocs = build_updated_files_list(bibdocs, performed_actions, sysno or -1) abstract_bibdocs.sort(lambda x, y: x['order'] - y['order']) # Display form and necessary CSS + Javscript out += '

    '
    out += css
    out += javascript % {'can_describe_doctypes': repr({}.fromkeys(can_describe_doctypes, '')),
                         'can_comment_doctypes': repr({}.fromkeys(can_comment_doctypes, '')),
                         'can_restrict_doctypes': repr({}.fromkeys(can_restrict_doctypes, ''))}

    # Prepare to display file revise panel "balloon". Check if we
    # should display the list of doctypes or if it is not necessary (0
    # or 1 doctype). Also make sure that we do not exceed the maximum
    # number of files specified per doctype.
    cleaned_doctypes = [doctype for doctype in doctypes if
                        not max_files_for_doctype.has_key(doctype) or
                        (max_files_for_doctype[doctype] > \
                        len([bibdoc for bibdoc in abstract_bibdocs \
                             if bibdoc['get_type'] == doctype]))]
    doctypes_list = ""
    if len(cleaned_doctypes) > 1:
        doctypes_list = ''
    elif len(cleaned_doctypes) == 1:
        doctypes_list = '' % cleaned_doctypes[0]

    # Check if we should display the list of access restrictions or if
    # it is not necessary
    restrictions_list = ""
    if len(restrictions_and_desc) > 1:
        restrictions_list = ''
        restrictions_list = '''
        %(restrictions_list)s
        [?]''' % \
        {'restrictions_list': restrictions_list,
         'restriction_label': restriction_label,
         'restriction_help': _('Choose how you want to restrict access to this file.').replace("'", "\\'")}
    elif len(restrictions_and_desc) == 1:
        restrictions_list = '' % {'restriction': restrictions_and_desc[0][0]}
    else:
        restrictions_list = ''

    out += revise_balloon % \
           {'CFG_SITE_URL': CFG_SITE_URL,
            'doctypes': '',
            'file_label': file_label,
            'filename_label': filename_label,
            'description_label': description_label,
            'comment_label': comment_label,
            'restrictions': restrictions_list,
            'previous_versions_help': _('You can decide whether to hide previous version(s) of this file.').replace("'", "\\'"),
            'revise_format_help': _('When you revise a file, the additional formats that you might have previously uploaded are removed, since they are no longer up-to-date with the new file.').replace("'", "\\'"),
            'revise_format_warning': _('Alternative formats uploaded for the current version of this file will be removed'),
            'previous_versions_label': _('Keep previous versions'),
            'cancel': _('Cancel'),
            'upload': _('Upload')}

    # List the files
    out += '''
    ''' i = 0 for bibdoc in abstract_bibdocs: if bibdoc['list_latest_files']: i += 1 out += create_file_row(bibdoc, can_delete_doctypes, can_rename_doctypes, can_revise_doctypes, can_describe_doctypes, can_comment_doctypes, can_keep_doctypes, can_add_format_to_doctypes, show_links, can_restrict_doctypes, even=not (i % 2), ln=ln) out += '
    ' if len(cleaned_doctypes) > 0: out += '''%(add_new_file)s)''' % \ {'display_revise_panel':javascript_display_revise_panel(action='add', target='', show_doctypes=True, show_keep_previous_versions=False, show_rename=can_name_new_files, show_description=True, show_comment=True, bibdocname='', description='', comment='', show_restrictions=True, restriction=len(restrictions_and_desc) > 0 and restrictions_and_desc[0][0] or ''), 'defaultSelectedDoctype': doctypes[0], 'add_new_file': _("Add new file")} out += '
    ' # End submission button out += '''

    ''' % \ {'apply_changes': _("Apply changes")} if startDoc: # Add a prefix prefix = read_file(curdir, startDoc) if prefix: out = prefix + out if endDoc: # Add a suffix suffix = read_file(curdir, endDoc) if suffix: out += '
    ' + suffix + '
    '
    # Close form
    out += ''

    # Display a link to the support email in case users have problems
    # revising/adding files
    mailto_link = '%(CFG_SITE_SUPPORT_EMAIL)s' % \
                  {'CFG_SITE_SUPPORT_EMAIL': CFG_SITE_SUPPORT_EMAIL,
                   'email_subject': "Need%%20help%%20revising%%20or%%20adding%%20record%%20%(sysno)s" % {'sysno': sysno or '(new)'},
                   'email_body': "Dear%%20CDS%%20Support,%%0D%%0A%%0D%%0AI%%20need%%20help%%20to%%20revise%%20or%%20add%%20a%%20file%%20in%%20record%%20%(sysno)s.%%20I%%20have%%20attached%%20the%%20new%%20version%%20to%%20this%%20mail.%%0D%%0A%%0D%%0ABest%%20regards" % {'sysno': sysno or '(new)'}}
    problem_revising = _('Having a problem revising a file? Send the revised version to %(mailto_link)s.') % {'mailto_link': mailto_link}
    if len(cleaned_doctypes) > 0:
        # We can add files, so change the note
        problem_revising = _('Having a problem adding or revising a file? Send the new/revised version to %(mailto_link)s.') % {'mailto_link': mailto_link}

    out += '
    ' out += problem_revising out += '
    '
    return out

def create_file_row(abstract_bibdoc, can_delete_doctypes,
                    can_rename_doctypes, can_revise_doctypes,
                    can_describe_doctypes, can_comment_doctypes,
                    can_keep_doctypes, can_add_format_to_doctypes,
                    show_links, can_restrict_doctypes, even=False,
                    ln=CFG_SITE_LANG):
    """
    Creates a row in the files list.

    Parameters:

       abstract_bibdoc - a 'fake' BibDoc: a dictionary whose keys
                         ('list_latest_files', 'get_docname', etc.)
                         carry the values you would expect to receive
                         when calling their counterpart function on a
                         real BibDoc object.

       can_delete_doctypes - list of doctypes for which we allow users
                             to delete documents

       can_revise_doctypes - the list of doctypes that users are
                             allowed to revise.

       can_describe_doctypes - the list of doctypes that users are
                               allowed to describe.

       can_comment_doctypes - the list of doctypes that users are
                              allowed to comment.

       can_keep_doctypes - the list of doctypes for which users can
                           choose to keep previous versions visible
                           when revising a file (i.e. 'Keep previous
                           version' checkbox).

       can_rename_doctypes - the list of doctypes that users are
                             allowed to rename (when revising)

       can_add_format_to_doctypes - the list of doctypes for which
                                    users can add new formats

       show_links - if we display links to files

       even - if the row is even or odd in the list
    """
    _ = gettext_set_language(ln)

    # Try to retrieve the "main format", to display as link for the
    # file. There is no such concept in BibDoc, but let's just try to
    # get the pdf file if it exists
    main_bibdocfile = [bibdocfile for bibdocfile in
                       abstract_bibdoc['list_latest_files'] \
                       if bibdocfile.get_format().strip('.').lower() == 'pdf']
    if len(main_bibdocfile) > 0:
        main_bibdocfile = main_bibdocfile[0]
    else:
        main_bibdocfile = abstract_bibdoc['list_latest_files'][0]

    main_bibdocfile_description = main_bibdocfile.get_description()
    if main_bibdocfile_description is None:
        main_bibdocfile_description = ''

    updated = abstract_bibdoc['updated'] # Has BibDoc been updated?
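    # A sketch of the abstract_bibdoc dictionaries handled here
    # (hypothetical values; the real entries are built by
    # build_updated_files_list() further below):
    #   {'get_docname': 'Main', 'get_type': 'Main', 'get_status': '',
    #    'updated': False, 'order': 1,
    #    'list_latest_files': [<one BibDocFile per format, e.g. .pdf, .ps>]}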
# Main file row out = '' % (even and ' class="even"' or '') out += '' if not updated and show_links: out += '' out += abstract_bibdoc['get_docname'] if not updated and show_links: out += '' if main_bibdocfile_description: out += ' (' + main_bibdocfile_description + ')' out += '' (description, comment) = get_description_and_comment(abstract_bibdoc['list_latest_files']) restriction = abstract_bibdoc['get_status'] # Revise link out += '' if main_bibdocfile.get_type() in can_revise_doctypes or \ '*' in can_revise_doctypes: out += '[%(revise)s]' % \ {'display_revise_panel': javascript_display_revise_panel( action='revise', target=abstract_bibdoc['get_docname'], show_doctypes=False, show_keep_previous_versions=(main_bibdocfile.get_type() in can_keep_doctypes) or '*' in can_keep_doctypes, show_rename=(main_bibdocfile.get_type() in can_rename_doctypes) or '*' in can_rename_doctypes, show_description=(main_bibdocfile.get_type() in can_describe_doctypes) or '*' in can_describe_doctypes, show_comment=(main_bibdocfile.get_type() in can_comment_doctypes) or '*' in can_comment_doctypes, bibdocname=abstract_bibdoc['get_docname'], description=description, comment=comment, show_restrictions=(main_bibdocfile.get_type() in can_restrict_doctypes) or '*' in can_restrict_doctypes, restriction=restriction), 'revise': _("revise") } # Delete link if main_bibdocfile.get_type() in can_delete_doctypes or \ '*' in can_delete_doctypes: out += '''[%(delete)s] ''' % {'bibdocname': abstract_bibdoc['get_docname'].replace("'", "\\'").replace('"', '"'), 'delete': _("delete")} out += '''''' # Format row out += ''' ''' % (even and ' class="even"' or '', CFG_SITE_URL) for bibdocfile in abstract_bibdoc['list_latest_files']: if not updated and show_links: out += '' out += bibdocfile.get_format().strip('.') if not updated and show_links: out += '' out += ' ' # Add format link out += '' if main_bibdocfile.get_type() in can_add_format_to_doctypes or \ '*' in can_add_format_to_doctypes: out += '[%(add_format)s]' % \ {'display_revise_panel':javascript_display_revise_panel( action='addFormat', target=abstract_bibdoc['get_docname'], show_doctypes=False, show_keep_previous_versions=False, show_rename=False, show_description=False, show_comment=False, bibdocname='', description='', comment='', show_restrictions=False, restriction=restriction), 'add_format':_("add format")} out += '' return out def log_action(log_dir, action, bibdoc_name, file_path, rename, description, comment, doctype, keep_previous_versions, file_restriction): """ Logs a new action performed by user on a BibDoc file. Parameters: log_dir - directory where to save the log (ie. curdir) action - the performed action (one of 'revise', 'delete', 'add', 'addFormat') bibdoc_name - the name of the bibdoc on which the change is applied file_path - the path to the file that is going to be integrated as bibdoc, if any (should be "" in case of action="delete", or action="revise" when revising only attributes of a file) rename - the name used to display the bibdoc, instead of the filename (can be None for no renaming) description - a description associated with the file comment - a comment associated with the file doctype - the category in which the file is going to be integrated keep_previous_versions - if the previous versions of this file are to be hidden (0) or not (1) file_restriction - the restriction applied to the file. 
                       An empty string means no restriction.

    There is one action per line in the file, each column being split
    by '---' ('---' is escaped from the values 'rename', 'description',
    'comment' and 'bibdoc_name'). Newlines are also reserved, and are
    escaped from the input values (necessary for the 'comment' field,
    which is the only one allowing newlines from the browser).

    Each line starts with the time of the action in the following
    format: '2008-06-20 08:02:04 --> '
    """
    log_file = os.path.join(log_dir, 'bibdocactions.log')
-    try:
-        file_desc = open(log_file, "a+")
-        # We must escape new lines from comments in some way:
-        comment = str(comment).replace('\\', '\\\\').replace('\r\n', '\\n\\r')
-        msg = action + '---' + \
-              bibdoc_name.replace('---', '___') + '---' + \
-              file_path + '---' + \
-              str(rename).replace('---', '___') + '---' + \
-              str(description).replace('---', '___') + '---' + \
-              comment.replace('---', '___') + '---' + \
-              doctype + '---' + \
-              str(int(keep_previous_versions)) + '---' + \
-              file_restriction + '\n'
-        file_desc.write("%s --> %s" %(time.strftime("%Y-%m-%d %H:%M:%S"), msg))
-        file_desc.close()
-    except Exception ,e:
-        raise e
+    file_desc = open(log_file, "a+")
+    # We must escape new lines from comments in some way:
+    comment = str(comment).replace('\\', '\\\\').replace('\r\n', '\\n\\r')
+    msg = action + '---' + \
+          bibdoc_name.replace('---', '___') + '---' + \
+          file_path + '---' + \
+          str(rename).replace('---', '___') + '---' + \
+          str(description).replace('---', '___') + '---' + \
+          comment.replace('---', '___') + '---' + \
+          doctype + '---' + \
+          str(int(keep_previous_versions)) + '---' + \
+          file_restriction + '\n'
+    file_desc.write("%s --> %s" %(time.strftime("%Y-%m-%d %H:%M:%S"), msg))
+    file_desc.close()

def read_actions_log(log_dir):
    """
    Reads the log of actions to be performed on files.

    See log_action(..) for more information about the structure of the
    log file.
    """
    actions = []
    log_file = os.path.join(log_dir, 'bibdocactions.log')
    try:
        file_desc = open(log_file, "r")
        for line in file_desc.readlines():
            (timestamp, action) = line.split(' --> ', 1)
            try:
                (action, bibdoc_name, file_path, rename, description,
                 comment, doctype, keep_previous_versions,
                 file_restriction) = action.rstrip('\n').split('---')
            except ValueError:
                # Malformed action log: skip the line
                continue

            # Clean newline-escaped comment:
            comment = comment.replace('\\n\\r', '\r\n').replace('\\\\', '\\')
            # Perform some checking
            if action not in allowed_actions:
                # Malformed action log: skip the line
                continue

            try:
                keep_previous_versions = int(keep_previous_versions)
            except:
                # Malformed action log
                keep_previous_versions = 1

            actions.append((action, bibdoc_name, file_path, rename, \
                            description, comment, doctype,
                            keep_previous_versions, file_restriction))
        file_desc.close()
    except:
        pass

    return actions

def build_updated_files_list(bibdocs, actions, recid):
    """
    Parses the list of BibDocs and builds an updated version to
    reflect the changes performed by the user on the files.

    It is necessary to abstract the BibDocs since the user's actions
    are applied to files that are committed only at the end of the
    session.
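
    For illustration, each element of 'actions' is a tuple as returned
    by read_actions_log(), e.g. (hypothetical values):

      ('add', 'figure1',
       '/curdir/files/updated/Additional/figure1/figure1.png',
       'None', 'a description', 'a comment', 'Additional', 1, '')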
""" abstract_bibdocs = {} i = 0 for bibdoc in bibdocs: i += 1 status = bibdoc.get_status() if status == "DELETED": status = '' abstract_bibdocs[bibdoc.get_docname()] = \ {'list_latest_files': bibdoc.list_latest_files(), 'get_docname': bibdoc.get_docname(), 'updated': False, 'get_type': bibdoc.get_type(), 'get_status': status, 'order': i} for action, bibdoc_name, file_path, rename, description, \ comment, doctype, keep_previous_versions, \ file_restriction in actions: dirname, filename, format = decompose_file(file_path) i += 1 if action in ["add", "revise"] and \ os.path.exists(file_path): checksum = calculate_md5(file_path) order = i if action == "revise" and \ abstract_bibdocs.has_key(bibdoc_name): # Keep previous values order = abstract_bibdocs[bibdoc_name]['order'] doctype = abstract_bibdocs[bibdoc_name]['get_type'] if bibdoc_name.strip() == '' and rename.strip() == '': bibdoc_name = os.path.extsep.join(filename.split(os.path.extsep)[:-1]) elif rename.strip() != '' and \ abstract_bibdocs.has_key(bibdoc_name): # Keep previous position del abstract_bibdocs[bibdoc_name] abstract_bibdocs[(rename or bibdoc_name)] = \ {'list_latest_files': [BibDocFile(file_path, doctype, version=1, name=(rename or bibdoc_name), format=format, recid=int(recid), docid=-1, status=file_restriction, checksum=checksum, description=description, comment=comment)], 'get_docname': rename or bibdoc_name, 'get_type': doctype, 'updated': True, 'get_status': file_restriction, 'order': order} abstract_bibdocs[(rename or bibdoc_name)]['updated'] = True elif action == "revise" and not file_path: # revision of attributes of a file (description, name, # comment or restriction) but no new file. abstract_bibdocs[bibdoc_name]['get_docname'] = rename or bibdoc_name abstract_bibdocs[bibdoc_name]['get_status'] = file_restriction set_description_and_comment(abstract_bibdocs[bibdoc_name]['list_latest_files'], description, comment) abstract_bibdocs[bibdoc_name]['updated'] = True elif action == "delete": if abstract_bibdocs.has_key(bibdoc_name): del abstract_bibdocs[bibdoc_name] elif action == "addFormat" and \ os.path.exists(file_path): checksum = calculate_md5(file_path) # Preserve type and status doctype = abstract_bibdocs[bibdoc_name]['get_type'] file_restriction = abstract_bibdocs[bibdoc_name]['get_status'] abstract_bibdocs[bibdoc_name]['list_latest_files'].append(\ BibDocFile(file_path, doctype, version=1, name=(rename or bibdoc_name), format=format, recid=int(recid), docid=-1, status='', checksum=checksum, description=description, comment=comment)) abstract_bibdocs[bibdoc_name]['updated'] = True return abstract_bibdocs.values() def get_uploaded_files_for_docname(log_dir, docname): """ Given a docname, returns the paths to the files uploaded for this revision session. """ return [file_path for action, bibdoc_name, file_path, rename, \ description, comment, doctype, keep_previous_versions , \ file_restriction in read_actions_log(log_dir) \ if bibdoc_name == docname and os.path.exists(file_path)] def get_bibdoc_for_docname(docname, abstract_bibdocs): """ Given a docname, returns the corresponding bibdoc from the 'abstract' bibdocs. 
    Returns None if not found.
    """
    bibdocs = [bibdoc for bibdoc in abstract_bibdocs \
               if bibdoc['get_docname'] == docname]
    if len(bibdocs) > 0:
        return bibdocs[0]
    else:
        return None

def get_extensions_for_docname(docname, abstract_bibdocs):
    """Returns the list of extensions that exist for the given bibdoc
    name in the given 'abstract' bibdocs."""

    bibdocfiles = [bibdoc['list_latest_files'] for bibdoc \
                   in abstract_bibdocs \
                   if bibdoc['get_docname'] == docname]
    if len(bibdocfiles) > 0:
        # There should always be at most 1 matching docname, or 0 if
        # it is a new file
        return [bibdocfile.get_format() for bibdocfile \
                in bibdocfiles[0]]
    return []

def delete(curdir, file_path):
    """
    Deletes the file at the given path. In fact, we just move it to
    curdir/files/trash
    """
    if os.path.exists(file_path):
        filename = os.path.split(file_path)[1]
        move_to = os.path.join(curdir, 'files', 'trash',
                               filename + '_' + str(time.time()))
        os.renames(file_path, move_to)

def wash_function_parameters(parameters, curdir, ln=CFG_SITE_LANG):
    """
    Returns the function's (admin-defined) parameters, washed and
    initialized properly, as a tuple:

    Parameters: check Create_Upload_Files_Interface(..) docstring

    Returns:
    tuple (minsize, maxsize, doctypes_and_desc, doctypes,
           can_delete_doctypes, can_revise_doctypes,
           can_describe_doctypes, can_comment_doctypes,
           can_keep_doctypes, can_rename_doctypes,
           can_add_format_to_doctypes, createRelatedFormats_p,
           can_name_new_files, keep_default, show_links, file_label,
           filename_label, description_label, comment_label, startDoc,
           endDoc, access_restrictions_and_desc, can_restrict_doctypes,
           restriction_label, doctypes_to_default_filename,
           max_files_for_doctype)
    """
    _ = gettext_set_language(ln)

    # The min and max file sizes that users can upload
    minsize = parameters['minsize']
    maxsize = parameters['maxsize']

    # The list of doctypes + descriptions that users can select when
    # adding new files. If there are no values, then the user cannot
    # add new files. '|' is used to separate doctype groups, and '='
    # to separate doctype and description. Eg:
    # main=Main document|additional=Figure, schema.
etc doctypes_and_desc = [doctype.strip().split("=") for doctype \ in parameters['doctypes'].split('|') \ if doctype.strip() != ''] doctypes = [doctype for (doctype, desc) in doctypes_and_desc] doctypes_and_desc = [[doctype, _(desc)] for \ (doctype, desc) in doctypes_and_desc] # The list of doctypes users are allowed to delete # (list of values separated by "|") can_delete_doctypes = [doctype.strip() for doctype \ in parameters['canDeleteDoctypes'].split('|') \ if doctype.strip() != ''] # The list of doctypes users are allowed to revise # (list of values separated by "|") can_revise_doctypes = [doctype.strip() for doctype \ in parameters['canReviseDoctypes'].split('|') \ if doctype.strip() != ''] # The list of doctypes users are allowed to describe # (list of values separated by "|") can_describe_doctypes = [doctype.strip() for doctype \ in parameters['canDescribeDoctypes'].split('|') \ if doctype.strip() != ''] # The list of doctypes users are allowed to comment # (list of values separated by "|") can_comment_doctypes = [doctype.strip() for doctype \ in parameters['canCommentDoctypes'].split('|') \ if doctype.strip() != ''] # The list of doctypes for which users are allowed to decide # if they want to keep old files or not when revising # (list of values separated by "|") can_keep_doctypes = [doctype.strip() for doctype \ in parameters['canKeepDoctypes'].split('|') \ if doctype.strip() != ''] # The list of doctypes users are allowed to rename # (list of values separated by "|") can_rename_doctypes = [doctype.strip() for doctype \ in parameters['canRenameDoctypes'].split('|') \ if doctype.strip() != ''] # The mapping from doctype to default filename. # '|' is used to separate doctypes groups, and '=' to # separate doctype and file in curdir where the default name is. Eg: # main=main_filename|additional=additional_filename. etc default_doctypes_and_curdir_files = [doctype.strip().split("=") for doctype \ in parameters['defaultFilenameDoctypes'].split('|') \ if doctype.strip() != ''] doctypes_to_default_filename = {} for doctype, curdir_file in default_doctypes_and_curdir_files: default_filename = read_file(curdir, curdir_file) if default_filename: doctypes_to_default_filename[doctype] = os.path.basename(default_filename) # The maximum number of files that can be uploaded for each doctype # Eg: # main=1|additional=3 doctypes_and_max_files = [doctype.strip().split("=") for doctype \ in parameters['maxFilesDoctypes'].split('|') \ if doctype.strip() != ''] max_files_for_doctype = {} for doctype, max_files in doctypes_and_max_files: if max_files.isdigit(): max_files_for_doctype[doctype] = int(max_files) # The list of doctypes for which users are allowed to add new formats # (list of values separated by "|") can_add_format_to_doctypes = [doctype.strip() for doctype \ in parameters['canAddFormatDoctypes'].split('|') \ if doctype.strip() != ''] # The list of access restrictions + description that users can # select when adding new files. If there are no values, no # restriction is applied . '|' is used to separate access # restrictions groups, and '=' to separate access restriction and # description. Eg: main=Main document|additional=Figure, # schema. 
etc access_restrictions_and_desc = [access.strip().split("=") for access \ in parameters['restrictions'].split('|') \ if access.strip() != ''] access_restrictions_and_desc = [[access, _(desc)] for \ (access, desc) in access_restrictions_and_desc] # The list of doctypes users are allowed to restrict # (list of values separated by "|") can_restrict_doctypes = [restriction.strip() for restriction \ in parameters['canRestrictDoctypes'].split('|') \ if restriction.strip() != ''] # If we should create additional formats when applicable (1) or # not (0) try: createRelatedFormats_p = int(parameters['createRelatedFormats']) except ValueError, e: createRelatedFormats_p = False # If users can name the files they add # Value should be 0 (Cannot rename) or 1 (Can rename) try: can_name_new_files = int(parameters['canNameNewFiles']) except ValueError, e: can_name_new_files = False # The default behaviour wrt keeping previous files or not. # 0 = do not keep, 1 = keep try: keep_default = int(parameters['keepDefault']) except ValueError, e: keep_default = False # If we display links to files (1) or not (0) try: show_links = int(parameters['showLinks']) except ValueError, e: show_links = True file_label = parameters['fileLabel'] if file_label == "": file_label = _('Choose a file') filename_label = parameters['filenameLabel'] if filename_label == "": filename_label = _('Name') description_label = parameters['descriptionLabel'] if description_label == "": description_label = _('Description') comment_label = parameters['commentLabel'] if comment_label == "": comment_label = _('Comment') restriction_label = parameters['restrictionLabel'] if restriction_label == "": restriction_label = _('Access') startDoc = parameters['startDoc'] endDoc = parameters['endDoc'] return (minsize, maxsize, doctypes_and_desc, doctypes, can_delete_doctypes, can_revise_doctypes, can_describe_doctypes, can_comment_doctypes, can_keep_doctypes, can_rename_doctypes, can_add_format_to_doctypes, createRelatedFormats_p, can_name_new_files, keep_default, show_links, file_label, filename_label, description_label, comment_label, startDoc, endDoc, access_restrictions_and_desc, can_restrict_doctypes, restriction_label, doctypes_to_default_filename, max_files_for_doctype) def wash_form_parameters(form, abstract_bibdocs, can_keep_doctypes, keep_default, can_describe_doctypes, can_comment_doctypes, can_rename_doctypes, can_name_new_files, can_restrict_doctypes, doctypes_to_default_filename): """ Washes the (user-defined) form parameters, taking into account the current state of the files and the admin defaults. Parameters: -form: the form of the function -abstract_bibdocs: a representation of the current state of the files, as returned by build_updated_file_list(..) -can_keep_doctypes *list* the list of doctypes for which we allow users to choose to keep or not the previous versions when revising. -keep_default *bool* the admin-defined default for when users cannot choose to keep or not previous version of a revised file -can_describe_doctypes: *list* the list of doctypes for which we let users define descriptions. -can_comment_doctypes: *list* the list of doctypes for which we let users define comments. -can_rename_doctypes: *list* the list of doctypes for which we let users rename bibdoc when revising. -can_name_new_files: *bool* if we let users choose a name when adding new files. -can_restrict_doctypes: *list* the list of doctypes for which we let users define access restrictions. 
    -doctypes_to_default_filename: *dict* mapping from doctype to
     admin-chosen name for the uploaded file.

    Returns:
    tuple (file_action, file_target, file_target_doctype,
           keep_previous_files, file_description, file_comment,
           file_rename, file_doctype, file_restriction) where:

    file_action: *str* the performed action ('add', 'revise',
                 'addFormat' or 'delete')
    file_target: *str* the bibdocname of the file on which the action
                 is performed (empty string when file_action=='add')
    file_target_doctype: *str* the doctype of the file we will work
                 on. Eg: ('Main', 'Additional'). Empty string with
                 file_action=='add'.
    keep_previous_files: *bool* if we keep the previous version of the
                 file or not. Only useful when revising files.
    file_description: *str* the user-defined description to apply to
                 the file. Empty string when no description is defined
                 or when not applicable.
    file_comment: *str* the user-defined comment to apply to the file.
                 Empty string when no comment is defined or when not
                 applicable.
    file_rename: *str* the new name chosen by the user for the bibdoc.
                 Empty string when not defined or when not applicable.
    file_doctype: *str* the user-chosen doctype for the bibdoc when
                 file_action=='add', or the current doctype of the
                 file_target in other cases (doctype must be
                 preserved).
    file_restriction: *str* the user-selected restriction for the
                 file. Empty string if not defined or when not
                 applicable.
    """
    # Action performed ...
    if form.has_key("fileAction") and \
           form['fileAction'] in allowed_actions:
        file_action = form['fileAction'] # "add", "revise",
                                         # "addFormat" or "delete"
    else:
        file_action = ""

    # ... on file ...
    if form.has_key("fileTarget"):
        file_target = form['fileTarget'] # contains bibdocname
        # Also remember its doctype to make sure we do valid actions
        # on it
        corresponding_bibdoc = get_bibdoc_for_docname(file_target,
                                                      abstract_bibdocs)
        if corresponding_bibdoc is not None:
            file_target_doctype = corresponding_bibdoc['get_type']
        else:
            file_target_doctype = ""
    else:
        file_target = ""
        file_target_doctype = ""

    # ... with doctype?
    # Only useful when adding a file: otherwise the fileTarget doctype
    # is preserved
    file_doctype = file_target_doctype
    if form.has_key("fileDoctype") and \
           file_action == 'add':
        file_doctype = form['fileDoctype']

    # ... keeping previous version? ...
    if file_target_doctype != '' and \
           not form.has_key("keepPreviousFiles"):
        # no corresponding key. Two possibilities:
        if file_target_doctype in can_keep_doctypes or \
               '*' in can_keep_doctypes:
            # User decided not to keep
            keep_previous_files = 0
        else:
            # No choice for the user. Use the default the admin has chosen
            keep_previous_files = keep_default
    else:
        # Checkbox seems to be checked ...
        if file_target_doctype in can_keep_doctypes or \
               '*' in can_keep_doctypes:
            # ...and this is allowed
            keep_previous_files = 1
        else:
            # ...but this is not allowed
            keep_previous_files = keep_default

    # ... and description? ...
    #if file_action == 'add':
        #raise repr((file_target_doctype, can_describe_doctypes))
        #raise repr((form.has_key("description"), (file_target_doctype in can_describe_doctypes), '*' in can_describe_doctypes))
    if form.has_key("description") and \
           (((file_action == 'revise' and \
              (file_target_doctype in can_describe_doctypes)) or \
             (file_action == 'add' and \
              (file_doctype in can_describe_doctypes))) \
            or '*' in can_describe_doctypes):
        file_description = form['description']
    else:
        file_description = ''

    # ... and comment? ...
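    # (Same permission pattern as for the description above: the value
    # coming from the form is accepted when the doctype is explicitly
    # listed by the admin, or when the wildcard '*' is configured for
    # that attribute.)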
if form.has_key("comment") and \ (((file_action == 'revise' and \ (file_target_doctype in can_comment_doctypes)) or \ (file_action == 'add' and \ (file_doctype in can_comment_doctypes))) \ or '*' in can_comment_doctypes): file_comment = form['comment'] else: file_comment = '' # ... and rename to ? ... if form.has_key("rename") and \ ((file_action == "revise" and \ ((file_target_doctype in can_rename_doctypes) or \ '*' in can_rename_doctypes)) or \ (file_action == "add" and \ can_name_new_files)): file_rename = form['rename'] # contains new bibdocname if applicable elif file_action == "add" and \ doctypes_to_default_filename.has_key(file_doctype): # Admin-chosen name. Ensure it is unique by appending a suffix file_rename = doctypes_to_default_filename[file_doctype] file_counter = 2 while get_bibdoc_for_docname(file_rename, abstract_bibdocs): if file_counter == 2: file_rename += '-2' else: file_rename = file_rename[:-len(str(file_counter))] + \ str(file_counter) file_counter += 1 else: file_rename = '' # ... and file restriction ? ... file_restriction = '' if form.has_key("fileRestriction"): # We cannot clean that value as it could be a restriction # declared in another submission. We keep this value. file_restriction = form['fileRestriction'] return (file_action, file_target, file_target_doctype, keep_previous_files, file_description, file_comment, file_rename, file_doctype, file_restriction) def get_description_and_comment(bibdocfiles): """ Returns the first description and comment as tuple (description, comment) found in the given list of bibdocfile description and/or comment can be None. This function is needed since we do consider that there is one comment/description per bibdoc, and not per bibdocfile as APIs state. @see: set_description_and_comment """ description = None comment = None all_descriptions = [bibdocfile.get_description() for bibdocfile \ in bibdocfiles if bibdocfile.get_description() not in ['', None]] if len(all_descriptions) > 0: description = all_descriptions[0] all_comments = [bibdocfile.get_comment() for bibdocfile \ in bibdocfiles if bibdocfile.get_comment() not in ['', None]] if len(all_comments) > 0: comment = all_comments[0] return (description, comment) def set_description_and_comment(abstract_bibdocfiles, description, comment): """ Set the description and comment to the given (abstract) bibdocfiles. description and/or comment can be None. This function is needed since we do consider that there is one comment/description per bibdoc, and not per bibdocfile as APIs state. @see: get_description_and_comment """ for bibdocfile in abstract_bibdocfiles: bibdocfile.description = description bibdocfile.comment = comment def read_file(curdir, filename): """ Reads a file in curdir. Returns None if does not exist, cannot be read, or if file is not really in curdir """ try: file_path = os.path.abspath(os.path.join(curdir, filename)) if not file_path.startswith(curdir): return None file_desc = file(file_path, 'r') content = file_desc.read() file_desc.close() except: content = None return content def javascript_display_revise_panel(action, target, show_doctypes, show_keep_previous_versions, show_rename, show_description, show_comment, bibdocname, description, comment, show_restrictions, restriction): """ Returns a correctly encoded call to the javascript function to display the revision panel. 
""" def escape_js_string_param(input): "Escape string parameter to be used in Javascript function" return input.replace('\\', '\\\\').replace('\r', '\\r').replace('\n', '\\n').replace("'", "\\'").replace('"', '"') return '''display_revise_panel(this, '%(action)s', '%(target)s', %(showDoctypes)s, %(showKeepPreviousVersions)s, %(showRename)s, %(showDescription)s, %(showComment)s, '%(bibdocname)s', '%(description)s', '%(comment)s', %(showRestrictions)s, '%(restriction)s')''' % \ {'action': action, 'showDoctypes': show_doctypes and 'true' or 'false', 'target': escape_js_string_param(target), 'bibdocname': escape_js_string_param(bibdocname), 'showRename': show_rename and 'true' or 'false', 'showKeepPreviousVersions': show_keep_previous_versions and 'true' or 'false', 'showComment': show_comment and 'true' or 'false', 'showDescription': show_description and 'true' or 'false', 'description': description and escape_js_string_param(description) or '', 'comment': comment and escape_js_string_param(comment) or '', 'showRestrictions': show_restrictions and 'true' or 'false', 'restriction': escape_js_string_param(restriction)} ## Javascript + HTML + CSS for the web interface # The Javascript function embedded in the page to provide interaction # with the revise panel javascript = ''' ''' # The CSS embedded in the page for the revise panel css = ''' ''' % {'CFG_SITE_URL': CFG_SITE_URL} # The HTML markup of the revise panel revise_balloon = ''' ''' diff --git a/modules/websubmit/lib/functions/Makefile.am b/modules/websubmit/lib/functions/Makefile.am index 432c84aee..772fe5edc 100644 --- a/modules/websubmit/lib/functions/Makefile.am +++ b/modules/websubmit/lib/functions/Makefile.am @@ -1,90 +1,89 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
pylibdir=$(libdir)/python/invenio/websubmit_functions pylib_DATA = __init__.py \ Add_Files.py \ Allocate_ALEPH_SYS.py \ Ask_For_Record_Details_Confirmation.py \ CaseEDS.py \ Check_Group.py \ Convert_RecXML_to_RecALEPH.py \ Convert_RecXML_to_RecALEPH_DELETE.py \ Create_Cplx_Approval.py \ Create_Modify_Interface.py \ Create_Recid.py \ Finish_Submission.py \ Format_Record.py \ Generate_Group_File.py \ Get_Info.py \ Get_Recid.py \ Get_Report_Number.py \ Get_Sysno.py \ Insert_Modify_Record.py \ Insert_Record.py \ Is_Original_Submitter.py \ Is_Referee.py \ Mail_Approval_Request_to_Committee_Chair.py \ Mail_Approval_Request_to_Referee.py \ Mail_Approval_Withdrawn_to_Referee.py \ Mail_Submitter.py \ Mail_New_Record_Notification.py \ Make_Dummy_MARC_XML_Record.py \ Make_Modify_Record.py \ Make_Record.py \ Move_Files_Archive.py \ Move_From_Pending.py \ Move_to_Done.py \ Move_to_Pending.py \ Print_Success.py \ Print_Success_Approval_Request.py \ Print_Success_APP.py \ Print_Success_CPLX.py \ Print_Success_DEL.py \ Print_Success_MBI.py \ Print_Success_SRV.py \ Register_Approval_Request.py \ Register_Referee_Decision.py \ Withdraw_Approval_Request.py \ Report_Number_Generation.py \ Retrieve_Data.py \ Second_Report_Number_Generation.py \ Send_APP_Mail.py \ Send_Approval_Request.py \ Send_Delete_Mail.py \ Send_Modify_Mail.py \ Send_Request_For_Direct_Approval.py \ Send_Request_For_Publication.py \ Send_Request_For_Refereeing_Process.py \ Send_SRV_Mail.py \ Stamp_Replace_Single_File_Approval.py \ Stamp_Uploaded_Files.py \ Test_Status.py \ Update_Approval_DB.py \ - Upload_Files.py \ User_is_Record_Owner_or_Curator.py \ Shared_Functions.py \ Move_Files_to_Storage.py \ Move_FCKeditor_Files_to_Storage.py \ Create_Upload_Files_Interface.py \ Move_Uploaded_Files_to_Storage.py \ Move_Revised_Files_to_Storage.py EXTRA_DIST = $(pylib_DATA) CLEANFILES = *~ *.tmp *.pyc diff --git a/modules/websubmit/lib/functions/Shared_Functions.py b/modules/websubmit/lib/functions/Shared_Functions.py index 23e0adb7d..721b9b6c6 100644 --- a/modules/websubmit/lib/functions/Shared_Functions.py +++ b/modules/websubmit/lib/functions/Shared_Functions.py @@ -1,173 +1,179 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Functions shared by websubmit_functions""" __revision__ = "$Id$" from invenio.config import \ - CFG_PATH_ACROREAD, \ CFG_PATH_CONVERT, \ - CFG_PATH_DISTILLER, \ CFG_PATH_GUNZIP, \ CFG_PATH_GZIP from invenio.bibdocfile import decompose_file +from invenio.websubmit_file_converter import convert_file, InvenioWebSubmitFileConverterError +from invenio.websubmit_config import InvenioWebSubmitFunctionError import re import os def createRelatedFormats(fullpath, overwrite=True): """Given a fullpath, this function extracts the file's extension and finds in which additional format the file can be converted and converts it. @param fullpath: (string) complete path to file @param overwrite: (bool) overwrite already existing formats Return a list of the paths to the converted files """ createdpaths = [] basedir, filename, extension = decompose_file(fullpath) extension = extension.lower() if extension == ".pdf": - if overwrite == True or \ + if overwrite or \ not os.path.exists("%s/%s.ps" % (basedir, filename)): # Create PostScript - os.system("%s -toPostScript %s" % (CFG_PATH_ACROREAD, fullpath)) - if overwrite == True or \ + try: + convert_file(fullpath, "%s/%s.ps" % (basedir, filename)) + createdpaths.append("%s/%s.ps" % (basedir, filename)) + except InvenioWebSubmitFileConverterError: + pass + if overwrite or \ not os.path.exists("%s/%s.ps.gz" % (basedir, filename)): if os.path.exists("%s/%s.ps" % (basedir, filename)): os.system("%s %s/%s.ps" % (CFG_PATH_GZIP, basedir, filename)) createdpaths.append("%s/%s.ps.gz" % (basedir, filename)) if extension == ".ps": - if overwrite == True or \ + if overwrite or \ not os.path.exists("%s/%s.pdf" % (basedir, filename)): # Create PDF - os.system("%s %s %s/%s.pdf" % (CFG_PATH_DISTILLER, fullpath, \ - basedir, filename)) - if os.path.exists("%s/%s.pdf" % (basedir, filename)): + try: + convert_file(fullpath, "%s/%s.pdf" % (basedir, filename)) createdpaths.append("%s/%s.pdf" % (basedir, filename)) + except InvenioWebSubmitFileConverterError: + pass if extension == ".ps.gz": - if overwrite == True or \ + if overwrite or \ not os.path.exists("%s/%s.ps" % (basedir, filename)): #gunzip file os.system("%s %s" % (CFG_PATH_GUNZIP, fullpath)) - if overwrite == True or \ + if overwrite or \ not os.path.exists("%s/%s.pdf" % (basedir, filename)): # Create PDF - os.system("%s %s/%s.ps %s/%s.pdf" % (CFG_PATH_DISTILLER, basedir, \ - filename, basedir, filename)) - if os.path.exists("%s/%s.pdf" % (basedir, filename)): + try: + convert_file("%s/%s.ps" % (basedir, filename), "%s/%s.pdf" % (basedir, filename)) createdpaths.append("%s/%s.pdf" % (basedir, filename)) + except InvenioWebSubmitFileConverterError: + pass #gzip file if not os.path.exists("%s/%s.ps.gz" % (basedir, filename)): os.system("%s %s/%s.ps" % (CFG_PATH_GZIP, basedir, filename)) return createdpaths def createIcon(fullpath, iconsize): """Given a fullpath, this function extracts the file's extension and if the format is compatible it converts it to icon. 
    @param fullpath: (string) complete path to file

    Return the iconpath if successful otherwise None
    """
    basedir = os.path.dirname(fullpath)
    filename = os.path.basename(fullpath)
    filename, extension = os.path.splitext(filename)
    if extension == filename:
        extension = ""  # the file has no extension
    iconpath = "%s/icon-%s.gif" % (basedir, filename)
    if os.path.exists(fullpath) and extension.lower() in ['.pdf', '.gif', '.jpg', '.jpeg', '.ps']:
        os.system("%s -scale %s %s %s" % (CFG_PATH_CONVERT, iconsize, fullpath, iconpath))
    if os.path.exists(iconpath):
        return iconpath
    else:
        return None

def get_dictionary_from_string(dict_string):
    """Given a string version of a "dictionary", split the string into a
    python dictionary.
    For example, given the following string:
    {'TITLE' : 'EX_TITLE', 'AUTHOR' : 'EX_AUTHOR', 'REPORTNUMBER' : 'EX_RN'}
    A dictionary in the following format will be returned:
    {
       'TITLE'        : 'EX_TITLE',
       'AUTHOR'       : 'EX_AUTHOR',
       'REPORTNUMBER' : 'EX_RN',
    }
    @param dict_string: (string) - the string version of the dictionary.
    @return: (dictionary) - the dictionary built from the string.
    """
    ## First, strip off the leading and trailing spaces and braces:
    dict_string = dict_string.strip(" {}")

    ## Next, split the string on commas (,) that have not been escaped
    ## So, the following string: """'hello' : 'world', 'click' : 'here'""" will be split
    ## into the following list: ["'hello' : 'world'", " 'click' : 'here'"]
    ##
    ## However, the following string: """'hello\, world' : '!', 'click' : 'here'"""
    ## will be split into: ["'hello\, world' : '!'", " 'click' : 'here'"]
    ## I.e. the comma that was escaped in the string has been kept.
    ##
    ## So basically, split on unescaped parameters at first:
    key_vals = re.split(r'(?<!\\),', dict_string)

[The remainder of Shared_Functions.py, together with the header and first part
of the diff for modules/websubmit/lib/functions/Stamp_Replace_Single_File_Approval.py,
was lost in extraction; the text resumes in the middle of that function's
parameter checking:]

                  ...%s]. The " \
                  "filename, however, was not considered valid. " \
                  "Please report this to the administrator." \
                  % (varname, varvalue)
        raise InvenioWebSubmitFunctionError(err_msg)

    ## Put the 'fixed' values into the file_stamper_options dictionary:
    file_stamper_options['latex-template'] = latex_template
    file_stamper_options['latex-template-var'] = latex_template_vars
    file_stamper_options['stamp'] = stamp

    ## Put the input file and output file into the file_stamper_options
    ## dictionary:
    file_stamper_options['input-file'] = bibdocfile_file_to_stamp.fullpath
    file_stamper_options['output-file'] = bibdocfile_file_to_stamp.fullname

    ##
    ## Before attempting to stamp the file, log the dictionary of arguments
    ## that will be passed to websubmit_file_stamper:
    try:
        fh_log = open("%s/websubmit_file_stamper-calls-options.log" \
                      % curdir, "a+")
        fh_log.write("%s\n" % file_stamper_options)
        fh_log.flush()
        fh_log.close()
    except IOError:
        ## Unable to log the file stamper options.
        exception_prefix = "Unable to write websubmit_file_stamper " \
                           "options to log file " \
                           "%s/websubmit_file_stamper-calls-options.log" \
                           % curdir
        register_exception(prefix=exception_prefix)

    try:
        ## Try to stamp the file:
        (stamped_file_path_only, stamped_file_name) = \
                websubmit_file_stamper.stamp_file(file_stamper_options)
    except InvenioWebSubmitFileStamperError:
        ## It wasn't possible to stamp this file.
        ## Register the exception along with an informational message:
        wrn_msg = "Warning in Stamp_Replace_Single_File_Approval: " \
                  "There was a problem stamping the file with the name [%s] " \
                  "and the fullpath [%s]. The file has not been stamped. " \
                  "The submission ID is [%s] and the record ID is [%s]."
\ % (name_file_to_stamp, \ file_stamper_options['input-file'], \ access, \ recid) register_exception(prefix=wrn_msg) raise InvenioWebSubmitFunctionWarning(wrn_msg) else: ## Stamping was successful. The BibDocFile must now be revised with ## the latest (stamped) version of the file: file_comment = "Stamped by WebSubmit: %s" \ % time.strftime("%d/%m/%Y", time.localtime()) try: dummy = \ bibrecdocs.add_new_version("%s/%s" \ % (stamped_file_path_only, \ stamped_file_name), \ name_file_to_stamp, \ - comment=file_comment) + comment=file_comment, \ + flags=('STAMPED', )) except InvenioWebSubmitFileError: ## Unable to revise the file with the newly stamped version. wrn_msg = "Warning in Stamp_Replace_Single_File_Approval: " \ "After having stamped the file with the name [%s] " \ "and the fullpath [%s], it wasn't possible to revise " \ "that file with the newly stamped version. Stamping " \ "was unsuccessful. The submission ID is [%s] and the " \ "record ID is [%s]." \ % (name_file_to_stamp, \ file_stamper_options['input-file'], \ access, \ recid) register_exception(prefix=wrn_msg) raise InvenioWebSubmitFunctionWarning(wrn_msg) else: ## File revised. If the file should be renamed after stamping, ## do so. if new_file_name != "": try: bibdoc_file_to_stamp.change_name(new_file_name) except (IOError, InvenioWebSubmitFileError): ## Unable to change the name wrn_msg = "Warning in Stamp_Replace_Single_File_Approval" \ ": After having stamped and revised the file " \ "with the name [%s] and the fullpath [%s], it " \ "wasn't possible to rename it to [%s]. The " \ "submission ID is [%s] and the record ID is " \ "[%s]." \ % (name_file_to_stamp, \ file_stamper_options['input-file'], \ new_file_name, \ access, \ recid) ## Finished. return "" def get_dictionary_from_string(dict_string): """Given a string version of a "dictionary", split the string into a python dictionary. For example, given the following string: {'TITLE' : 'EX_TITLE', 'AUTHOR' : 'EX_AUTHOR', 'REPORTNUMBER' : 'EX_RN'} A dictionary in the following format will be returned: { 'TITLE' : 'EX_TITLE', 'AUTHOR' : 'EX_AUTHOR', 'REPORTNUMBER' : 'EX_RN', } @param dict_string: (string) - the string version of the dictionary. @return: (dictionary) - the dictionary build from the string. """ ## First, strip off the leading and trailing spaces and braces: dict_string = dict_string.strip(" {}") ## Next, split the string on commas (,) that have not been escaped ## So, the following string: """'hello' : 'world', 'click' : 'here'""" ## will be split into the following list: ## ["'hello' : 'world'", " 'click' : 'here'"] ## ## However, the string """'hello\, world' : '!', 'click' : 'here'""" ## will be split into: ["'hello\, world' : '!'", " 'click' : 'here'"] ## I.e. the comma that was escaped in the string has been kept. 
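    ## (Hence the negative look-behind on the backslash in the split
    ## pattern below.)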
    ##
    ## So basically, split on unescaped parameters at first:
    key_vals = re.split(r'(?<!\\),', dict_string)

[The remainder of this second copy of get_dictionary_from_string(), the end of
the Stamp_Replace_Single_File_Approval.py diff, and the header of the diff
deleting the legacy modules/websubmit/lib/functions/Upload_Files.py (dropped
from Makefile.am above, superseded by Create_Upload_Files_Interface.py) were
lost in extraction. The text resumes inside the removed Upload_Files
implementation; the HTML/JavaScript markup embedded in the removed file was
also stripped and is marked below:]

-        t+="""[script markup lost]alert("your file was too small (<%s o) and was deleted");[markup lost]""" % minsize
-    elif os.path.getsize(fullpath) > int(maxsize):
-        os.unlink("%s/myfile" % curdir)
-        os.unlink(fullpath)
-        t+= """[script markup lost]""" % maxsize
-    else:
-        bibdoc = None
-        if fileAction == "AddMain":
-            if not bibrecdocs.check_file_exists(fullpath):
-                bibdoc = bibrecdocs.add_new_file(fullpath, "Main", never_fail=True)
-        if fileAction == "AddAdditional":
-            if not bibrecdocs.check_file_exists(fullpath):
-                bibdoc = bibrecdocs.add_new_file(fullpath, "Additional", never_fail=True)
-        if fileAction == "ReviseAdditional" and mybibdocname != "":
-            if not bibrecdocs.check_file_exists(fullpath):
-                bibdoc = bibrecdocs.add_new_version(fullpath, mybibdocname)
-        if fileAction == "AddAdditionalFormat" and mybibdocname != "":
-            bibdoc = bibrecdocs.add_new_format(fullpath, mybibdocname)
-        if type == "fulltext" and fileAction != "AddMainFormat" and fileAction != "AddAdditionalFormat":
-            additionalformats = createRelatedFormats(fullpath)
-            if len(additionalformats) > 0 and bibdoc is not None:
-                for additionalformat in additionalformats:
-                    bibdoc.add_file_new_format(additionalformat)
-        if type == "picture" and fileAction != "AddMainFormat" and fileAction != "AddAdditionalFormat":
-            iconpath = createIcon(fullpath, iconsize)
-            if iconpath is not None and bibdoc is not None:
-                bibdoc.add_icon(iconpath)
-                os.unlink(iconpath)
-            elif bibdoc is not None:
-                bibdoc.delete_icon()
-        bibrecdocs.build_bibdoc_list()
-        os.unlink(fullpath)
-        os.unlink("%s/myfile" % curdir)
-    t+="[markup lost]"
-    t=t+Display_Form(bibrecdocs)
-    t=t+Display_File_List(bibrecdocs)
-    t=t+ "[markup lost]"
-    t+="[markup lost]"
-    return t
-
-def Display_File_List(bibrecdocs):
-    t="""[markup lost]"""
-    bibdocs = bibrecdocs.list_bibdocs()
-    if len(bibdocs) > 0:
-        types = list_types_from_array(bibdocs)
-        for mytype in types:
-            if len(bibrecdocs.list_bibdocs(mytype)) > 1:
-                plural = "s"
-            else:
-                plural = ""
-            t+="%s document%s:" % (mytype, plural)
-            for bibdoc in bibdocs:
-                if mytype == bibdoc.get_type():
-                    t+="[markup lost]"
-                    t+="[markup lost]" % (bibdoc.get_docname(), bibdoc.get_docname(), CFG_SITE_URL)
-                    t+="[markup lost]"
-                    t+=bibdoc.display()
-                    t+="[markup lost]"
-    t+="""[markup lost]"""
-    return t
-
-def Display_Form(bibrecdocs):
-    #output the upload files form.
-    t=""
-    t=t+"""
-Don't forget to click on the \"End Submission\" button when you have finished managing the files.
-[form markup lost]
-Please complete the form below to upload a new file:
-[table markup lost; the form had three numbered upload rows]
    - """ - return t - diff --git a/modules/websubmit/lib/hocrlib.py b/modules/websubmit/lib/hocrlib.py new file mode 100644 index 000000000..6592340a7 --- /dev/null +++ b/modules/websubmit/lib/hocrlib.py @@ -0,0 +1,206 @@ +## This file is part of CDS Invenio. +## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. +## +## CDS Invenio is free software; you can redistribute it and/or +## modify it under the terms of the GNU General Public License as +## published by the Free Software Foundation; either version 2 of the +## License, or (at your option) any later version. +## +## CDS Invenio is distributed in the hope that it will be useful, but +## WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +## General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., +## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. + +"""hOCR parser and tools""" + +from htmlentitydefs import entitydefs +import HTMLParser +import re +import os.path +from logging import info +from reportlab.pdfgen.canvas import Canvas +from reportlab.lib.pagesizes import A4 +from reportlab.lib.colors import green, red + +_RE_PARSE_HOCR_BBOX = re.compile(r'\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)') +_RE_CLEAN_SPACES = re.compile(r'\s+') +def extract_hocr(hocr_text): + """ + Parse hocr_text and return a structure suitable to be used by create_pdf. + """ + class HOCRReader(HTMLParser.HTMLParser): + def __init__(self): + HTMLParser.HTMLParser.__init__(self) + self.lines = [] + self.bbox = None + self.text = "" + self.image = '' + self.page_bbox = None + self.pages = [] + self.started = False + + def store_current_page(self): + if self.image: + self.store_current_line() + self.sort_current_lines() + self.pages.append((self.page_bbox, self.image, self.lines)) + self.page_bbox = None + self.image = '' + self.lines = [] + + def sort_current_lines(self): + def line_cmp(a, b): + y0_a = a[0][1] + y0_b = b[0][1] + return cmp(y0_b, y0_a) + + self.lines.sort(line_cmp) + + def store_current_line(self): + if self.bbox: + self.lines.append((self.bbox, _RE_CLEAN_SPACES.sub(' ', self.text).strip())) + self.bbox = None + self.text = "" + + def extract_hocr_properties(self, title): + properties = title.split(';') + ret = {} + for prop in properties: + prop = prop.strip() + key, value = prop.split(' ', 1) + key = key.strip().lower() + value = value.strip() + ret[key] = value + return ret + + def handle_starttag(self, tag, attrs): + attrs = dict(attrs) + if attrs.get('class') == 'ocr_line': + self.started = True + self.store_current_line() + properties = self.extract_hocr_properties(attrs.get('title', '')) + try: + self.bbox = tuple(map(lambda x: int(x), properties['bbox'].split(' ', 4))) + except: + ## If no bbox is retrievable, let's skip this line + pass + elif attrs.get('class') == 'ocr_page': + self.store_current_page() + properties = self.extract_hocr_properties(attrs.get('title', '')) + try: + self.page_bbox = tuple(map(lambda x: int(x), properties['bbox'].split(' ', 4))) + except: + ## If no bbox is retrievable, let's skip this line + pass + try: + self.image = os.path.abspath(properties['image']) + except: + pass + + def handle_entityref(self, name): + if self.started and name in entitydefs: + self.text += entitydefs[name].decode('latin1').encode('utf8') + + def handle_data(self, data): + if self.started and 
data.strip(): + self.text += data + + def handle_charref(self, data): + if self.started: + try: + self.text += unichr(int(data)).encode('utf8') + except: + pass + + def close(self): + HTMLParser.HTMLParser.close(self) + self.store_current_page() + + hocr_reader = HOCRReader() + hocr_reader.feed(hocr_text) + hocr_reader.close() + return hocr_reader.pages + +def create_pdf(hocr, filename, font="Courier", author=None, keywords=None, subject=None, title=None, image_path=None, draft=False): + """ transform hOCR information into a searchable PDF. + @param hocr the hocr structure as coming from extract_hocr. + @param filename the name of the PDF generated in output. + @param font the default font (e.g. Courier, Times-Roman). + @param author the author name. + @param subject the subject of the document. + @param title the title of the document. + @param image_path the default path where images are stored. If not specified + relative image paths will be resolved to the current directory. + @param draft whether to enable debug information in the output. + + """ + def adjust_image_size(width, height): + return max(width / A4[0], height / A4[1]) + + canvas = Canvas(filename) + + if author: + canvas.setAuthor(author) + + if keywords: + canvas.setKeywords(keywords) + + if title: + canvas.setTitle(title) + + if subject: + canvas.setSubject(subject) + + for bbox, image, lines in hocr: + if not image.startswith('/') and image_path: + image = os.path.abspath(os.path.join(image_path, image)) + img_width, img_height = bbox[2:] + ratio = adjust_image_size(img_width, img_height) + if draft: + canvas.drawImage(image, 0, A4[1] - img_height / ratio , img_width / ratio, img_height / ratio) + canvas.setFont(font, 12) + for bbox, line in lines: + if draft: + canvas.setFillColor(red) + x0, y0, x1, y1 = bbox + width = (x1 - x0) / ratio + height = ((y1 - y0) / ratio) + x0 = x0 / ratio + #for ch in 'gjpqy,(){}[];$@': + #if ch in line: + #y0 = A4[1] - (y0 / ratio) - height + #break + #else: + y0 = A4[1] - (y0 / ratio) - height / 1.3 + #canvas.setFontSize(height * 1.5) + canvas.setFontSize(height) + text_width = canvas.stringWidth(line) + if text_width: + ## If text_width != 0 + text_object = canvas.beginText(x0, y0) + text_object.setHorizScale(1.0 * width / text_width * 100) + text_object.textOut(line) + canvas.drawText(text_object) + else: + info('%s, %s has width 0' % (bbox, line)) + if draft: + canvas.setStrokeColor(green) + canvas.rect(x0, y0, width, height) + if draft: + canvas.circle(0, 0, 10, fill=1) + canvas.circle(0, A4[1], 10, fill=1) + canvas.circle(A4[0], 0, 10, fill=1) + canvas.circle(A4[0], A4[1], 10, fill=1) + canvas.setFillColor(green) + canvas.setStrokeColor(green) + canvas.circle(0, A4[1] - img_height / ratio, 5, fill=1) + canvas.circle(img_width / ratio, img_height /ratio, 5, fill=1) + else: + canvas.drawImage(image, 0, A4[1] - img_height / ratio , img_width / ratio, img_height / ratio) + + canvas.save() + diff --git a/modules/websubmit/lib/fulltext_files_migration_kit.py b/modules/websubmit/lib/icon_migration_kit.py similarity index 52% copy from modules/websubmit/lib/fulltext_files_migration_kit.py copy to modules/websubmit/lib/icon_migration_kit.py index db5484f37..3b0b97c8d 100644 --- a/modules/websubmit/lib/fulltext_files_migration_kit.py +++ b/modules/websubmit/lib/icon_migration_kit.py @@ -1,142 +1,148 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. 
## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -__revision__ = "$Id$" +""" +This script updates the filesystem and database structure WRT icons. -"""This script updates the filesystem structure of fulltext files in order -to make it coherent with bibdocfile implementation (bibdocfile.py structure is backward -compatible with file.py structure, but the viceversa is not true). +In particular it will move all the icons information out of bibdoc_bibdoc +tables and into the normal bibdoc + subformat infrastructure. """ import sys -from invenio.intbitset import intbitset -from invenio.textutils import wrap_text_in_a_box -from invenio.config import CFG_LOGDIR, CFG_SITE_SUPPORT_EMAIL -from invenio.dbquery import run_sql, OperationalError -from invenio.bibdocfile import BibRecDocs, InvenioWebSubmitFileError from datetime import datetime -def retrieve_fulltext_recids(): - """Returns the list of all the recid number linked with at least a fulltext - file.""" - res = run_sql('SELECT DISTINCT id_bibrec FROM bibrec_bibdoc') - return intbitset(res) +from invenio.textutils import wrap_text_in_a_box, wait_for_user +from invenio.bibtask import check_running_process_user +from invenio.dbquery import run_sql, OperationalError +from invenio.bibdocfile import BibDoc, BibRecDocs +from invenio.config import CFG_LOGDIR, CFG_SITE_SUPPORT_EMAIL +from invenio.bibdocfilecli import cli_fix_marc +from invenio.errorlib import register_exception +from invenio.intbitset import intbitset -def fix_recid(recid, logfile): - """Fix a given recid.""" - print "Upgrading record %s ->" % recid, - print >> logfile, "Upgrading record %s:" % recid +def retrieve_bibdoc_bibdoc(): + return run_sql('SELECT id_bibdoc1, id_bibdoc2 from bibdoc_bibdoc') - bibrec = BibRecDocs(recid) - print >> logfile, bibrec - docnames = bibrec.get_bibdoc_names() - try: - for docname in docnames: - print docname, - new_bibdocs = bibrec.fix(docname) - new_bibdocnames = [bibdoc.get_docname() for bibdoc in new_bibdocs] - if new_bibdocnames: - print "(created bibdocs: '%s')" % "', '".join(new_bibdocnames), - print >> logfile, "(created bibdocs: '%s')" % "', '".join(new_bibdocnames) - except InvenioWebSubmitFileError, e: - print >> logfile, BibRecDocs(recid) - print "%s -> ERROR", e - return False - else: - print >> logfile, BibRecDocs(recid) - print "-> OK" - return True +def get_recid_from_docid(docid): + return run_sql('SELECT id_bibrec FROM bibrec_bibdoc WHERE id_bibdoc=%s', (docid, )) def backup_tables(drop=False): """This function create a backup of bibrec_bibdoc, bibdoc and bibdoc_bibdoc tables. 
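    (In this icon-migration variant only the bibdoc_bibdoc table is
    actually copied, into bibdoc_bibdoc_backup_for_icon; see the
    run_sql calls below.)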
Returns False in case dropping of previous table is needed.""" if drop: - run_sql('DROP TABLE bibrec_bibdoc_backup') - run_sql('DROP TABLE bibdoc_backup') - run_sql('DROP TABLE bibdoc_bibdoc_backup') + run_sql('DROP TABLE bibdoc_bibdoc_backup_for_icon') try: - run_sql("""CREATE TABLE bibrec_bibdoc_backup (KEY id_bibrec(id_bibrec), - KEY id_bibdoc(id_bibdoc)) SELECT * FROM bibrec_bibdoc""") - run_sql("""CREATE TABLE bibdoc_backup (PRIMARY KEY id(id)) - SELECT * FROM bibdoc""") - run_sql("""CREATE TABLE bibdoc_bibdoc_backup (KEY id_bibdoc1(id_bibdoc1), + run_sql("""CREATE TABLE bibdoc_bibdoc_backup_for_icon (KEY id_bibdoc1(id_bibdoc1), KEY id_bibdoc2(id_bibdoc2)) SELECT * FROM bibdoc_bibdoc""") except OperationalError, e: if not drop: return False raise e return True -def check_yes(): - """Return True if the user types 'yes'.""" +def fix_bibdoc_bibdoc(id_bibdoc1, id_bibdoc2, logfile): + """ + Migrate an icon. + """ + + try: + the_bibdoc = BibDoc(id_bibdoc1) + except Exception, err: + msg = "WARNING: when opening docid %s: %s" % (id_bibdoc1, err) + print >> logfile, msg + print msg + return True try: - return raw_input().strip() == 'yes' - except KeyboardInterrupt: + recid = the_bibdoc.get_recid() + msg = "Fixing icon for recid %s: document %s (docid %s)" % (recid, the_bibdoc.get_docname(), id_bibdoc1) + print msg, + print >> logfile, msg, + the_icon = BibDoc(id_bibdoc2) + for a_file in the_icon.list_latest_files(): + the_bibdoc.add_icon(a_file.get_full_path(), format=a_file.get_format()) + the_icon.delete() + run_sql("DELETE FROM bibdoc_bibdoc WHERE id_bibdoc1=%s AND id_bibdoc2=%s", (id_bibdoc1, id_bibdoc2)) + print "OK" + print >> logfile, "OK" + return True + except Exception, err: + print "ERROR: %s" % err + print >> logfile, "ERROR: %s" % err + register_exception() return False def main(): """Core loop.""" + check_running_process_user() logfilename = '%s/fulltext_files_migration_kit-%s.log' % (CFG_LOGDIR, datetime.today().strftime('%Y%m%d%H%M%S')) try: logfile = open(logfilename, 'w') except IOError, e: print wrap_text_in_a_box('NOTE: it\'s impossible to create the log:\n\n %s\n\nbecause of:\n\n %s\n\nPlease run this migration kit as the same user who runs Invenio (e.g. Apache)' % (logfilename, e), style='conclusion', break_long=False) sys.exit(1) - recids = retrieve_fulltext_recids() - print wrap_text_in_a_box ("""This script migrate the filesystem structure used to store fulltext files to the new stricter structure. + bibdoc_bibdoc = retrieve_bibdoc_bibdoc() + + print wrap_text_in_a_box ("""This script migrate the filesystem structure used to store icons files to the new stricter structure. This script must not be run during normal Invenio operations. It is safe to run this script. No file will be deleted. Anyway it is recommended to run a backup of the filesystem structure just in case. A backup of the database tables involved will be automatically performed.""", style='important') - print "%s records will be migrated/fixed." % len(recids) - print "Please type yes if you want to go further:", - - if not check_yes(): - print "INTERRUPTED" - sys.exit(1) + if not bibdoc_bibdoc: + print wrap_text_in_a_box("No need for migration", style='conclusion') + return + print "%s icons will be migrated/fixed." % len(bibdoc_bibdoc) + wait_for_user() print "Backing up database tables" try: if not backup_tables(): print wrap_text_in_a_box("""It appears that is not the first time that you run this script. Backup tables have been already created by a previous run. 
In order for the script to go further they need to be removed.""", style='important') - print "Please, type yes if you agree to remove them and go further:", - - if not check_yes(): - print wrap_text_in_a_box("INTERRUPTED", style='conclusion') - sys.exit(1) + wait_for_user() print "Backing up database tables (after dropping previous backup)", - backup_tables() + backup_tables(drop=True) print "-> OK" else: print "-> OK" except Exception, e: print wrap_text_in_a_box("Unexpected error while backing up tables. Please, do your checks: %s" % e, style='conclusion') sys.exit(1) + to_fix_marc = intbitset() print "Created a complete log file into %s" % logfilename - for recid in recids: - if not fix_recid(recid, logfile): + try: + try: + for id_bibdoc1, id_bibdoc2 in bibdoc_bibdoc: + for recid in get_recid_from_docid(id_bibdoc1): + to_fix_marc.add(recid[0]) + if not fix_bibdoc_bibdoc(id_bibdoc1, id_bibdoc2, logfile): + raise StandardError("Error when correcting document ID %s" % id_bibdoc1) + print wrap_text_in_a_box("DONE", style='conclusion') + except: logfile.close() - print wrap_text_in_a_box(title="INTERRUPTED BECAUSE OF ERROR!", body="""Please see the log file %s for what was the status of record %s prior to the error. Contact %s in case of problems, attaching the log.""" % (logfilename, recid, CFG_SITE_SUPPORT_EMAIL), + register_exception() + print wrap_text_in_a_box(title="INTERRUPTED BECAUSE OF ERROR!", body="""Please see the log file %s for what was the status of record %s prior to the error. Contact %s in case of problems, attaching the log.""" % (logfilename, BibDoc(id_bibdoc1).get_recid(), CFG_SITE_SUPPORT_EMAIL), style='conclusion') sys.exit(1) - print wrap_text_in_a_box("DONE", style='conclusion') + finally: + print "Scheduling FIX-MARC to synchronize MARCXML for updated records." + cli_fix_marc(options={}, explicit_recid_set=to_fix_marc) + if __name__ == '__main__': main() diff --git a/modules/websubmit/lib/unoconv.py b/modules/websubmit/lib/unoconv.py new file mode 100755 index 000000000..e0d6db69b --- /dev/null +++ b/modules/websubmit/lib/unoconv.py @@ -0,0 +1,686 @@ +# -*- coding: utf-8 -*- +### This program is free software; you can redistribute it and/or modify +### it under the terms of the GNU General Public License as published by +### the Free Software Foundation; version 2 only +### +### This program is distributed in the hope that it will be useful, +### but WITHOUT ANY WARRANTY; without even the implied warranty of +### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +### GNU General Public License for more details. +### +### You should have received a copy of the GNU General Public License +### along with this program; if not, write to the Free Software +### Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +### Copyright 2007-2008 Dag Wieers + +import getopt, sys, os, glob, time + +global unopath + +### The first thing we ought to do is find a suitable OpenOffice installation +### with a compatible pyuno library that we can import. 
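+### (In the Invenio build the autodetection below is commented out:
+### pyuno is assumed to be importable from the default Python path,
+### and the caller is responsible for exporting any environment that
+### OpenOffice.org needs.)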
+### BEG Invenio customizations +#extrapaths = glob.glob('/usr/lib*/openoffice*/program') + \ + #glob.glob('/usr/lib*/ooo*/program') + \ + #glob.glob('/opt/openoffice*/program') + \ + #glob.glob('C:\\Program Files\\OpenOffice.org *\\program\\') + \ + #[ '/Applications/NeoOffice.app/Contents/program', '/usr/bin' ] +#for unopath in extrapaths: + #if os.path.exists(os.path.join(unopath, "pyuno.so")): + #filename = "pyuno.so" + #elif os.path.exists(os.path.join(unopath, "pyuno.dll")): + #filename = "pyuno.dll" + #else: + #continue + #sys.path.append(unopath) + #try: + #import uno, unohelper + #break + #except ImportError, e: + #sys.path.remove(unopath) + #print >>sys.stderr, e + #print >>sys.stderr, "WARNING: We found %s in %s, but could not import it." % (filename, unopath) + #continue +#else: + #print >>sys.stderr, "unoconv: Cannot find the pyuno.so library in sys.path and known paths." + #print >>sys.stderr, "ERROR: Please locate this library and send your feedback to: ." + #sys.exit(1) + +#### Export an environment that OpenOffice is pleased to work with +#os.environ['LD_LIBRARY_PATH'] = '%s' % unopath +#os.environ['PATH'] = '%s:' % unopath + os.environ['PATH'] + +import uno, unohelper +### END Invenio customizations + + +### Now that we have found a working pyuno library, let's import some classes +from com.sun.star.beans import PropertyValue +from com.sun.star.connection import NoConnectException +from com.sun.star.lang import DisposedException +from com.sun.star.io import IOException, XOutputStream +from com.sun.star.script import CannotConvertException +from com.sun.star.uno import Exception as UnoException + +__version__ = "$Revision$" +# $Source$ + +VERSION = '0.3svn' + +doctypes = ('document', 'graphics', 'presentation', 'spreadsheet') + +oopid = None +exitcode = 0 + +class Fmt: + def __init__(self, doctype, name, extension, summary, filter): + self.doctype = doctype + self.name = name + self.extension = extension + self.summary = summary + self.filter = filter + + def __str__(self): + return "%s [.%s]" % (self.summary, self.extension) + + def __repr__(self): + return "%s/%s" % (self.name, self.doctype) + +class FmtList: + def __init__(self): + self.list = [] + + def add(self, doctype, name, extension, summary, filter): + self.list.append(Fmt(doctype, name, extension, summary, filter)) + + def byname(self, name): + ret = [] + for fmt in self.list: + if fmt.name == name: + ret.append(fmt) + return ret + + def byextension(self, extension): + ret = [] + for fmt in self.list: + if '.'+fmt.extension == extension: + ret.append(fmt) + return ret + + def bydoctype(self, doctype, name): + ret = [] + for fmt in self.list: + if fmt.name == name and fmt.doctype == doctype: + ret.append(fmt) + return ret + + def display(self, doctype): + print >>sys.stderr, "The following list of %s formats are currently available:\n" % doctype + for fmt in self.list: + if fmt.doctype == doctype: + print >>sys.stderr, " %-8s - %s" % (fmt.name, fmt) + print >>sys.stderr + +class OutputStream( unohelper.Base, XOutputStream ): + def __init__( self ): + self.closed = 0 + + def closeOutput(self): + self.closed = 1 + + def writeBytes( self, seq ): + sys.stdout.write( seq.value ) + + def flush( self ): + pass + +fmts = FmtList() + +### Document / Writer +fmts.add('document', 'bib', 'bib', 'BibTeX', 'BibTeX_Writer') +fmts.add('document', 'doc', 'doc', 'Microsoft Word 97/2000/XP', 'MS Word 97') +fmts.add('document', 'doc6', 'doc', 'Microsoft Word 6.0', 'MS WinWord 6.0') +fmts.add('document', 'doc95', 'doc', 'Microsoft 
Word 95', 'MS Word 95') +fmts.add('document', 'docbook', 'xml', 'DocBook', 'DocBook File') +fmts.add('document', 'html', 'html', 'HTML Document (OpenOffice.org Writer)', 'HTML (StarWriter)') +fmts.add('document', 'odt', 'odt', 'Open Document Text', 'writer8') +fmts.add('document', 'ott', 'ott', 'Open Document Text', 'writer8_template') +fmts.add('document', 'ooxml', 'xml', 'Microsoft Office Open XML', 'MS Word 2003 XML') +fmts.add('document', 'pdb', 'pdb', 'AportisDoc (Palm)', 'AportisDoc Palm DB') +fmts.add('document', 'pdf', 'pdf', 'Portable Document Format', 'writer_pdf_Export') +fmts.add('document', 'psw', 'psw', 'Pocket Word', 'PocketWord File') +fmts.add('document', 'rtf', 'rtf', 'Rich Text Format', 'Rich Text Format') +fmts.add('document', 'latex', 'ltx', 'LaTeX 2e', 'LaTeX_Writer') +fmts.add('document', 'sdw', 'sdw', 'StarWriter 5.0', 'StarWriter 5.0') +fmts.add('document', 'sdw4', 'sdw', 'StarWriter 4.0', 'StarWriter 4.0') +fmts.add('document', 'sdw3', 'sdw', 'StarWriter 3.0', 'StarWriter 3.0') +fmts.add('document', 'stw', 'stw', 'Open Office.org 1.0 Text Document Template', 'writer_StarOffice_XML_Writer_Template') +fmts.add('document', 'sxw', 'sxw', 'Open Office.org 1.0 Text Document', 'StarOffice XML (Writer)') +fmts.add('document', 'text', 'txt', 'Text Encoded', 'Text (encoded)') +fmts.add('document', 'mediawiki', 'txt', 'Mediawiki', 'Mediawiki') +fmts.add('document', 'txt', 'txt', 'Plain Text', 'Text') +fmts.add('document', 'vor', 'vor', 'StarWriter 5.0 Template', 'StarWriter 5.0 Vorlage/Template') +fmts.add('document', 'vor4', 'vor', 'StarWriter 4.0 Template', 'StarWriter 4.0 Vorlage/Template') +fmts.add('document', 'vor3', 'vor', 'StarWriter 3.0 Template', 'StarWriter 3.0 Vorlage/Template') +fmts.add('document', 'xhtml', 'html', 'XHTML Document', 'XHTML Writer File') + +### Spreadsheet +fmts.add('spreadsheet', 'csv', 'csv', 'Text CSV', 'Text - txt - csv (StarCalc)') +fmts.add('spreadsheet', 'dbf', 'dbf', 'dBase', 'dBase') +fmts.add('spreadsheet', 'dif', 'dif', 'Data Interchange Format', 'DIF') +fmts.add('spreadsheet', 'html', 'html', 'HTML Document (OpenOffice.org Calc)', 'HTML (StarCalc)') +fmts.add('spreadsheet', 'ods', 'ods', 'Open Document Spreadsheet', 'calc8') +fmts.add('spreadsheet', 'ooxml', 'xml', 'Microsoft Excel 2003 XML', 'MS Excel 2003 XML') +fmts.add('spreadsheet', 'pdf', 'pdf', 'Portable Document Format', 'calc_pdf_Export') +fmts.add('spreadsheet', 'pts', 'pts', 'OpenDocument Spreadsheet Template', 'calc8_template') +fmts.add('spreadsheet', 'pxl', 'pxl', 'Pocket Excel', 'Pocket Excel') +fmts.add('spreadsheet', 'sdc', 'sdc', 'StarCalc 5.0', 'StarCalc 5.0') +fmts.add('spreadsheet', 'sdc4', 'sdc', 'StarCalc 4.0', 'StarCalc 4.0') +fmts.add('spreadsheet', 'sdc3', 'sdc', 'StarCalc 3.0', 'StarCalc 3.0') +fmts.add('spreadsheet', 'slk', 'slk', 'SYLK', 'SYLK') +fmts.add('spreadsheet', 'stc', 'stc', 'OpenOffice.org 1.0 Spreadsheet Template', 'calc_StarOffice_XML_Calc_Template') +fmts.add('spreadsheet', 'sxc', 'sxc', 'OpenOffice.org 1.0 Spreadsheet', 'StarOffice XML (Calc)') +fmts.add('spreadsheet', 'vor3', 'vor', 'StarCalc 3.0 Template', 'StarCalc 3.0 Vorlage/Template') +fmts.add('spreadsheet', 'vor4', 'vor', 'StarCalc 4.0 Template', 'StarCalc 4.0 Vorlage/Template') +fmts.add('spreadsheet', 'vor', 'vor', 'StarCalc 5.0 Template', 'StarCalc 5.0 Vorlage/Template') +fmts.add('spreadsheet', 'xhtml', 'xhtml', 'XHTML', 'XHTML Calc File') +fmts.add('spreadsheet', 'xls', 'xls', 'Microsoft Excel 97/2000/XP', 'MS Excel 97') +fmts.add('spreadsheet', 'xls5', 'xls', 'Microsoft Excel 
5.0', 'MS Excel 5.0/95') +fmts.add('spreadsheet', 'xls95', 'xls', 'Microsoft Excel 95', 'MS Excel 95') +fmts.add('spreadsheet', 'xlt', 'xlt', 'Microsoft Excel 97/2000/XP Template', 'MS Excel 97 Vorlage/Template') +fmts.add('spreadsheet', 'xlt5', 'xlt', 'Microsoft Excel 5.0 Template', 'MS Excel 5.0/95 Vorlage/Template') +fmts.add('spreadsheet', 'xlt95', 'xlt', 'Microsoft Excel 95 Template', 'MS Excel 95 Vorlage/Template') + +### Graphics +fmts.add('graphics', 'bmp', 'bmp', 'Windows Bitmap', 'draw_bmp_Export') +fmts.add('graphics', 'emf', 'emf', 'Enhanced Metafile', 'draw_emf_Export') +fmts.add('graphics', 'eps', 'eps', 'Encapsulated PostScript', 'draw_eps_Export') +fmts.add('graphics', 'gif', 'gif', 'Graphics Interchange Format', 'draw_gif_Export') +fmts.add('graphics', 'html', 'html', 'HTML Document (OpenOffice.org Draw)', 'draw_html_Export') +fmts.add('graphics', 'jpg', 'jpg', 'Joint Photographic Experts Group', 'draw_jpg_Export') +fmts.add('graphics', 'met', 'met', 'OS/2 Metafile', 'draw_met_Export') +fmts.add('graphics', 'odd', 'odd', 'OpenDocument Drawing', 'draw8') +fmts.add('graphics', 'otg', 'otg', 'OpenDocument Drawing Template', 'draw8_template') +fmts.add('graphics', 'pbm', 'pbm', 'Portable Bitmap', 'draw_pbm_Export') +fmts.add('graphics', 'pct', 'pct', 'Mac Pict', 'draw_pct_Export') +fmts.add('graphics', 'pdf', 'pdf', 'Portable Document Format', 'draw_pdf_Export') +fmts.add('graphics', 'pgm', 'pgm', 'Portable Graymap', 'draw_pgm_Export') +fmts.add('graphics', 'png', 'png', 'Portable Network Graphic', 'draw_png_Export') +fmts.add('graphics', 'ppm', 'ppm', 'Portable Pixelmap', 'draw_ppm_Export') +fmts.add('graphics', 'ras', 'ras', 'Sun Raster Image', 'draw_ras_Export') +fmts.add('graphics', 'std', 'std', 'OpenOffice.org 1.0 Drawing Template', 'draw_StarOffice_XML_Draw_Template') +fmts.add('graphics', 'svg', 'svg', 'Scalable Vector Graphics', 'draw_svg_Export') +fmts.add('graphics', 'svm', 'svm', 'StarView Metafile', 'draw_svm_Export') +fmts.add('graphics', 'swf', 'swf', 'Macromedia Flash (SWF)', 'draw_flash_Export') +fmts.add('graphics', 'sxd', 'sxd', 'OpenOffice.org 1.0 Drawing', 'StarOffice XML (Draw)') +fmts.add('graphics', 'sxd3', 'sxd', 'StarDraw 3.0', 'StarDraw 3.0') +fmts.add('graphics', 'sxd5', 'sxd', 'StarDraw 5.0', 'StarDraw 5.0') +fmts.add('graphics', 'tiff', 'tiff', 'Tagged Image File Format', 'draw_tif_Export') +fmts.add('graphics', 'vor', 'vor', 'StarDraw 5.0 Template', 'StarDraw 5.0 Vorlage') +fmts.add('graphics', 'vor3', 'vor', 'StarDraw 3.0 Template', 'StarDraw 3.0 Vorlage') +fmts.add('graphics', 'wmf', 'wmf', 'Windows Metafile', 'draw_wmf_Export') +fmts.add('graphics', 'xhtml', 'xhtml', 'XHTML', 'XHTML Draw File') +fmts.add('graphics', 'xpm', 'xpm', 'X PixMap', 'draw_xpm_Export') + +### Presentation +fmts.add('presentation', 'bmp', 'bmp', 'Windows Bitmap', 'impress_bmp_Export') +fmts.add('presentation', 'emf', 'emf', 'Enhanced Metafile', 'impress_emf_Export') +fmts.add('presentation', 'eps', 'eps', 'Encapsulated PostScript', 'impress_eps_Export') +fmts.add('presentation', 'gif', 'gif', 'Graphics Interchange Format', 'impress_gif_Export') +fmts.add('presentation', 'html', 'html', 'HTML Document (OpenOffice.org Impress)', 'impress_html_Export') +fmts.add('presentation', 'jpg', 'jpg', 'Joint Photographic Experts Group', 'impress_jpg_Export') +fmts.add('presentation', 'met', 'met', 'OS/2 Metafile', 'impress_met_Export') +fmts.add('presentation', 'odd', 'odd', 'OpenDocument Drawing (Impress)', 'impress8_draw') +fmts.add('presentation', 'odg', 'odg', 'OpenOffice.org 
1.0 Drawing (OpenOffice.org Impress)', 'impress_StarOffice_XML_Draw') +fmts.add('presentation', 'odp', 'odp', 'OpenDocument Presentation', 'impress8') +fmts.add('presentation', 'pbm', 'pbm', 'Portable Bitmap', 'impress_pbm_Export') +fmts.add('presentation', 'pct', 'pct', 'Mac Pict', 'impress_pct_Export') +fmts.add('presentation', 'pdf', 'pdf', 'Portable Document Format', 'impress_pdf_Export') +fmts.add('presentation', 'pgm', 'pgm', 'Portable Graymap', 'impress_pgm_Export') +fmts.add('presentation', 'png', 'png', 'Portable Network Graphic', 'impress_png_Export') +fmts.add('presentation', 'pot', 'pot', 'Microsoft PowerPoint 97/2000/XP Template', 'MS PowerPoint 97 Vorlage') +fmts.add('presentation', 'ppm', 'ppm', 'Portable Pixelmap', 'impress_ppm_Export') +fmts.add('presentation', 'ppt', 'ppt', 'Microsoft PowerPoint 97/2000/XP', 'MS PowerPoint 97') +fmts.add('presentation', 'pwp', 'pwp', 'PlaceWare', 'placeware_Export') +fmts.add('presentation', 'ras', 'ras', 'Sun Raster Image', 'impress_ras_Export') +fmts.add('presentation', 'sda', 'sda', 'StarDraw 5.0 (OpenOffice.org Impress)', 'StarDraw 5.0 (StarImpress)') +fmts.add('presentation', 'sdd', 'sdd', 'StarImpress 5.0', 'StarImpress 5.0') +fmts.add('presentation', 'sdd3', 'sdd', 'StarDraw 3.0 (OpenOffice.org Impress)', 'StarDraw 3.0 (StarImpress)') +fmts.add('presentation', 'sdd4', 'sdd', 'StarImpress 4.0', 'StarImpress 4.0') +fmts.add('presentation', 'sti', 'sti', 'OpenOffice.org 1.0 Presentation Template', 'impress_StarOffice_XML_Impress_Template') +fmts.add('presentation', 'stp', 'stp', 'OpenDocument Presentation Template', 'impress8_template') +fmts.add('presentation', 'svg', 'svg', 'Scalable Vector Graphics', 'impress_svg_Export') +fmts.add('presentation', 'svm', 'svm', 'StarView Metafile', 'impress_svm_Export') +fmts.add('presentation', 'swf', 'swf', 'Macromedia Flash (SWF)', 'impress_flash_Export') +fmts.add('presentation', 'sxi', 'sxi', 'OpenOffice.org 1.0 Presentation', 'StarOffice XML (Impress)') +fmts.add('presentation', 'tiff', 'tiff', 'Tagged Image File Format', 'impress_tif_Export') +fmts.add('presentation', 'vor', 'vor', 'StarImpress 5.0 Template', 'StarImpress 5.0 Vorlage') +fmts.add('presentation', 'vor3', 'vor', 'StarDraw 3.0 Template (OpenOffice.org Impress)', 'StarDraw 3.0 Vorlage (StarImpress)') +fmts.add('presentation', 'vor4', 'vor', 'StarImpress 4.0 Template', 'StarImpress 4.0 Vorlage') +fmts.add('presentation', 'vor5', 'vor', 'StarDraw 5.0 Template (OpenOffice.org Impress)', 'StarDraw 5.0 Vorlage (StarImpress)') +fmts.add('presentation', 'wmf', 'wmf', 'Windows Metafile', 'impress_wmf_Export') +fmts.add('presentation', 'xhtml', 'xml', 'XHTML', 'XHTML Impress File') +fmts.add('presentation', 'xpm', 'xpm', 'X PixMap', 'impress_xpm_Export') + +class Options: + def __init__(self, args): + self.stdout = False + self.showlist = False + self.listener = False + self.format = None + self.verbose = 0 + self.timeout = 3 + self.doctype = None + self.server = 'localhost' + self.port = '2002' + self.connection = None + self.filenames = [] + self.pipe = None + self.outputpath = None + self.outputfile = None ### Invenio customizations + + + ### Get options from the commandline + try: + opts, args = getopt.getopt (args, 'c:d:f:hi:Llo:p:s:T:t:v', + ['connection=', 'doctype=', 'format=', 'help', 'listener', 'outputpath=', 'pipe=', 'port=', 'server=', 'timeout=', 'show', 'stdout', 'verbose', 'version', 'outputfile='] ) + except getopt.error, exc: + print 'unoconv: %s, try unoconv -h for a list of all the options' % str(exc) + sys.exit(255) + 
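+        ### Illustrative example (getopt semantics only, not from the original
+        ### sources): for a command line like
+        ###     unoconv -f pdf -o /tmp letter.odt
+        ### getopt.getopt() above returns opts = [('-f', 'pdf'), ('-o', '/tmp')]
+        ### and args = ['letter.odt']; the loop below copies each pair into the
+        ### corresponding Options attribute.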
+ for opt, arg in opts: + if opt in ['-h', '--help']: + self.usage() + print + self.help() + sys.exit(1) + elif opt in ['-c', '--connection']: + self.connection = arg + elif opt in ['-d', '--doctype']: + self.doctype = arg + elif opt in ['-f', '--format']: + self.format = arg + elif opt in ['-i', '--pipe']: + self.pipe = arg + elif opt in ['-l', '--listener']: + self.listener = True + elif opt in ['-o', '--outputpath']: + self.outputpath = arg + elif opt in ['--outputfile']: ### Invenio customizations + self.outputfile = arg ### Invenio customizations + elif opt in ['-p', '--port']: + self.port = arg + elif opt in ['-s', '--server']: + self.server = arg + elif opt in ['--show']: + self.showlist = True + elif opt in ['-T', '--timeout']: + self.timeout = int(arg) + elif opt in ['--stdout']: + self.stdout = True + elif opt in ['-v', '--verbose']: + self.verbose = self.verbose + 1 + elif opt in ['--version']: + self.version() + sys.exit(255) + + ### Enable verbosity + if self.verbose >= 3: + print >>sys.stderr, 'Verbosity set to level %d' % (self.verbose - 1) + + self.filenames = args + + if not self.listener and not self.showlist and self.doctype != 'list' and not self.filenames: + print >>sys.stderr, 'unoconv: you have to provide a filename as argument' + print >>sys.stderr, 'Try `unoconv -h\' for more information.' + sys.exit(255) + + ### Set connection string + if not self.connection: + if not self.pipe: + self.connection = "socket,host=%s,port=%s;urp;StarOffice.ComponentContext" % (self.server, self.port) +# self.connection = "socket,host=%s,port=%s;urp;" % (self.server, self.port) + else: + self.connection = "pipe,name=%s;urp;StarOffice.ComponentContext" % (self.pipe) + if self.verbose >=3: + print >>sys.stderr, 'Connection type: %s' % self.connection + + ### Make it easier for people to use a doctype (first letter is enough) + if self.doctype: + for doctype in doctypes: + if doctype.startswith(self.doctype): + self.doctype = doctype + + ### Check if the user request to see the list of formats + if self.showlist or self.format == 'list': + if self.doctype: + fmts.display(self.doctype) + else: + for t in doctypes: + fmts.display(t) + sys.exit(0) + + ### If no format was specified, probe it or provide it + if not self.format: + l = sys.argv[0].split('2') + if len(l) == 2: + self.format = l[1] + else: + self.format = 'pdf' + + def version(self): + print 'unoconv %s' % VERSION + print 'Written by Dag Wieers ' + print 'Homepage at http://dag.wieers.com/home-made/unoconv/' + print + print 'platform %s/%s' % (os.name, sys.platform) + print 'python %s' % sys.version + print + print 'build revision $Rev$' + + def usage(self): + print >>sys.stderr, 'usage: unoconv [options] file [file2 ..]' + + def help(self): + print >>sys.stderr, '''Convert from and to any format supported by OpenOffice + +unoconv options: + -c, --connection=string use a custom connection string + -d, --doctype=type specify document type + (document, graphics, presentation, spreadsheet) + -f, --format=format specify the output format + -i, --pipe=name alternative method of connection using a pipe + -l, --listener start a listener to use by unoconv clients + -o, --outputpath=name output directory + -p, --port=port specify the port (default: 2002) + to be used by client or listener + -s, --server=server specify the server address (default: localhost) + to be used by client or listener + -T, --timeout=secs timeout after secs if connections to OpenOffice fail + --show list the available output formats + --stdout write output to 
stdout
+  -v, --verbose         be more and more verbose
+'''
+
+class Convertor:
+    def __init__(self):
+        global exitcode, oopid
+        unocontext = None
+
+        ### Do the OpenOffice component dance
+        self.context = uno.getComponentContext()
+        resolver = self.context.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", self.context)
+
+        ### Test for an existing connection (twice)
+        try:
+            unocontext = resolver.resolve("uno:%s" % op.connection)
+        except NoConnectException, e:
+            error(2, "Existing listener not found.\n%s" % e)
+
+        ### Test if we can use an OpenOffice *binary* in our (modified) path
+        for bin in ('soffice.bin', 'soffice', ):
+
+            error(2, "Trying to launch our own listener using %s." % bin)
+            try:
+                oopid = os.spawnvp(os.P_NOWAIT, bin, [bin, "-headless", "-nologo", "-nodefault", "-norestore", "-nofirststartwizard", "-accept=%s" % op.connection]);
+            except Exception, e:
+                error(3, "Launch of %s failed.\n%s" % (bin, e))
+                continue
+
+            ### Try connection to it for op.timeout seconds
+            timeout = 0
+            while timeout <= op.timeout:
+                try:
+                    unocontext = resolver.resolve("uno:%s" % op.connection)
+                    break
+                except NoConnectException, e:
+                    time.sleep(0.5)
+                    timeout = timeout + 0.5
+            else:
+                error(3, "Failed to connect to %s in %d seconds.\n%s" % (bin, op.timeout, e))
+                continue
+            break
+        else:
+            die(250, "No proper binaries found to launch OpenOffice. Bailing out.")
+
+        if not unocontext:
+            die(251, "Unable to connect or start own listener. Aborting.")
+
+        ### And some more OpenOffice magic
+        unosvcmgr = unocontext.ServiceManager
+        self.desktop = unosvcmgr.createInstanceWithContext("com.sun.star.frame.Desktop", unocontext)
+        self.cwd = unohelper.systemPathToFileUrl( os.getcwd() )
+
+    def getformat(self, inputfn):
+        doctype = None
+
+        ### Get the output format from mapping
+        if op.doctype:
+            outputfmt = fmts.bydoctype(op.doctype, op.format)
+        else:
+            outputfmt = fmts.byname(op.format)
+
+        if not outputfmt:
+            outputfmt = fmts.byextension('.'+op.format)
+
+        ### If no doctype was given, check the list of acceptable output formats
+        ### against the doctype of the input file extension
+        ### FIXME: This should go into the for-loop to match each individual input filename
+        if outputfmt:
+            inputext = os.path.splitext(inputfn)[1]
+            inputfmt = fmts.byextension(inputext)
+            if inputfmt:
+                for fmt in outputfmt:
+                    if inputfmt[0].doctype == fmt.doctype:
+                        doctype = inputfmt[0].doctype
+                        outputfmt = fmt
+                        break
+                else:
+                    outputfmt = outputfmt[0]
+#                print >>sys.stderr, 'unoconv: format `%s\' is part of multiple doctypes %s, selecting `%s\'.' % (format, [fmt.doctype for fmt in outputfmt], outputfmt[0].doctype)
+            else:
+                outputfmt = outputfmt[0]
+
+        ### No format found, throw error
+        if not outputfmt:
+            if doctype:
+                print >>sys.stderr, 'unoconv: format [%s/%s] is not known to unoconv.' % (op.doctype, op.format)
+            else:
+                print >>sys.stderr, 'unoconv: format [%s] is not known to unoconv.' % op.format
+            die(1)
+
+        return outputfmt
+
+    def convert(self, inputfn):
+        global exitcode
+
+        doc = None
+        outputfmt = self.getformat(inputfn)
+
+        if op.verbose > 0:
+            print >>sys.stderr, 'Input file:', inputfn
+
+        if not os.path.exists(inputfn):
+            print >>sys.stderr, 'unoconv: file `%s\' does not exist.'
% inputfn + exitcode = 1 + + try: + ### Load inputfile + inputprops = ( PropertyValue( "Hidden", 0, True, 0 ), ) + + inputurl = unohelper.absolutize(self.cwd, unohelper.systemPathToFileUrl(inputfn)) + doc = self.desktop.loadComponentFromURL( inputurl , "_blank", 0, inputprops ) + + if not doc: + raise UnoException("File could not be loaded by OpenOffice", None) + +# standard = doc.getStyleFamilies().getByName('PageStyles').getByName('Standard') +# pageSize = Size() +# pageSize.Width=1480 +# pageSize.Height=3354 +# standard.setPropertyValue('Size', pageSize) + + error(1, "Selected output format: %s" % outputfmt) + error(1, "Selected ooffice filter: %s" % outputfmt.filter) + error(1, "Used doctype: %s" % outputfmt.doctype) + +#### ### Write outputfile +#### outputprops = ( +#### PropertyValue( "FilterName", 0, outputfmt.filter, 0), +#### PropertyValue( "Overwrite", 0, True, 0 ), +##### PropertyValue( "Size", 0, "A3", 0 ), +#### PropertyValue( "OutputStream", 0, OutputStream(), 0 ), +#### ) + + ### BEG Invenio customizations + outputprops = [ + PropertyValue( "FilterName" , 0, outputfmt.filter , 0 ), + PropertyValue( "Overwrite" , 0, True , 0 ), + PropertyValue( "OutputStream", 0, OutputStream(), 0), + ] + if outputfmt.filter == 'Text (encoded)': + ## To enable UTF-8 + outputprops.append(PropertyValue( "FilterFlags", 0, "UTF8, LF", 0)) + elif outputfmt.filter == 'writer_pdf_Export': + ## To enable PDF/A + outputprops.append(PropertyValue( "SelectPdfVersion", 0, 1, 0)) + + outputprops = tuple(outputprops) + ### END Invenio customizations + + if not op.stdout: + (outputfn, ext) = os.path.splitext(inputfn) + ### BEG Invenio customizations + if op.outputfile: + outputfn = op.outputfile + elif not op.outputpath: ### END Invenio customizations + outputfn = outputfn + '.' + outputfmt.extension + else: + outputfn = os.path.join(op.outputpath, os.path.basename(outputfn) + '.' + outputfmt.extension) + outputurl = unohelper.absolutize( self.cwd, unohelper.systemPathToFileUrl(outputfn) ) + doc.storeToURL(outputurl, outputprops) + error(1, "Output file: %s" % outputfn) + else: + doc.storeToURL("private:stream", outputprops) + + doc.dispose() + doc.close(True) + + except SystemError, e: + error(0, "unoconv: SystemError during conversion: %s" % e) + error(0, "ERROR: The provided document cannot be converted to the desired format.") + exitcode = 1 + + except UnoException, e: + error(0, "unoconv: UnoException during conversion in %s: %s" % (repr(e.__class__), e.Message)) + error(0, "ERROR: The provided document cannot be converted to the desired format. (code: %s)" % e.ErrCode) + exitcode = e.ErrCode + + except IOException, e: + error(0, "unoconv: IOException during conversion: %s" % e.Message) + error(0, "ERROR: The provided document cannot be exported to %s." % outputfmt) + exitcode = 3 + + except CannotConvertException, e: + error(0, "unoconv: CannotConvertException during conversion: %s" % e.Message) + exitcode = 4 + +class Listener: + def __init__(self): + error(1, "Start listener on %s:%s" % (op.server, op.port)) + for bin in ('soffice.bin', 'soffice', ): + error(2, "Warning: trying to launch %s." 
% bin)
+            try:
+                os.execvp(bin, [bin, "-headless", "-nologo", "-nodefault", "-norestore", "-nofirststartwizard", "-accept=%s" % op.connection]);
+            except Exception, e:
+                error(3, "Launch of %s failed.\n%s" % (bin, e))
+                continue
+        else:
+            die(254, "Failed to start listener with connection %s" % (op.connection))
+        die(253, "Existing listener found, aborting.")
+
+def error(level, str):
+    "Output error message"
+    if level <= op.verbose:
+        print >>sys.stderr, str
+
+def info(level, str):
+    "Output info message"
+    if not op.stdout and level <= op.verbose:
+        print >>sys.stdout, str
+
+def die(ret, str=None):
+    "Print error and exit with errorcode"
+    global convertor, oopid
+
+    if str:
+        error(0, 'Error: %s' % str)
+
+    ### Did we start an instance ?
+    if oopid:
+
+        ### If there is a GUI now attached to the instance, disable listener
+        if convertor.desktop.getCurrentFrame():
+            for bin in ('soffice.bin', 'soffice', ):
+                try:
+                    os.spawnvp(os.P_NOWAIT, bin, [bin, "-headless", "-nologo", "-nodefault", "-norestore", "-nofirststartwizard", "-unaccept=%s" % op.connection]);
+                    error(2, 'OpenOffice listener successfully disabled.')
+                    break
+                except Exception, e:
+                    error(3, "Launch of %s failed.\n%s" % (bin, e))
+                    continue
+
+        ### If there is no GUI attached to the instance, terminate instance
+        else:
+            try:
+                convertor.desktop.terminate()
+            except DisposedException:
+                error(2, 'OpenOffice instance successfully terminated.')
+
+#    error(2, 'Taking down OpenOffice with pid %s.' % oopid)
+#    os.setpgid(oopid, 0)
+#    os.killpg(os.getpgid(oopid), 15)
+#    try:
+#        os.kill(oopid, 15)
+#        error(2, 'Waiting for OpenOffice with pid %s to disappear.' % oopid)
+#        os.waitpid(oopid, os.WUNTRACED)
+#    except:
+#        error(2, 'No OpenOffice with pid %s to take down' % oopid)
+    sys.exit(ret)
+
+def main():
+    global convertor, exitcode
+
+    try:
+        if op.listener:
+            listener = Listener()
+        else:
+            convertor = Convertor()
+
+        for inputfn in op.filenames:
+            convertor.convert(inputfn)
+
+    except NoConnectException, e:
+        error(0, "unoconv: could not find an existing connection to OpenOffice at %s:%s." % (op.server, op.port))
+        if op.connection:
+            error(0, "Please start an OpenOffice instance on server '%s' by doing:\n\n    unoconv --listener --server %s --port %s\n\nor alternatively:\n\n    ooffice -nologo -nodefault -accept=\"%s\"" % (op.server, op.server, op.port, op.connection))
+        else:
+            error(0, "Please start an OpenOffice instance on server '%s' by doing:\n\n    unoconv --listener --server %s --port %s\n\nor alternatively:\n\n    ooffice -nologo -nodefault -accept=\"socket,host=%s,port=%s;urp;\"" % (op.server, op.server, op.port, op.server, op.port))
+        error(0, "Please start an ooffice instance on server '%s' by doing:\n\n    ooffice -nologo -nodefault -accept=\"socket,host=localhost,port=%s;urp;\"" % (op.server, op.port))
+        exitcode = 1
+#    except UnboundLocalError:
+#        die(252, "Failed to connect to remote listener.")
+    except OSError:
+        error(0, "Warning: failed to launch OpenOffice. Aborting.")
+
+convertor = None
+
+### Main entrance
+if __name__ == '__main__':
+    exitcode = 0
+
+    op = Options(sys.argv[1:])
+    try:
+        main()
+    except KeyboardInterrupt, e:
+        die(6, 'Exiting on user request')
+    die(exitcode)
diff --git a/modules/websubmit/lib/websubmit_config.py b/modules/websubmit/lib/websubmit_config.py
index 2d0bd2789..5848447b0 100644
--- a/modules/websubmit/lib/websubmit_config.py
+++ b/modules/websubmit/lib/websubmit_config.py
@@ -1,183 +1,226 @@
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

 """CDS Invenio Submission Web Interface config file."""

 __revision__ = "$Id$"

+import re
+
 ## test:
 test = "FALSE"

 ## CC all action confirmation mails to administrator? (0 == NO; 1 == YES)
 CFG_WEBSUBMIT_COPY_MAILS_TO_ADMIN = 0

 ## During submission, warn user if she is going to leave the
 ## submission by following some link on the page?
 ## Does not work with Opera and Konqueror.
 ## This requires all submission functions to set Javascript variable
 ## 'user_must_confirm_before_leaving_page' to 'false' before
 ## programmatically submitting a form, or else users will be asked
 ## confirmation after each submission step.
 ## (0 == NO; 1 == YES)
 CFG_WEBSUBMIT_CHECK_USER_LEAVES_SUBMISSION = 0

 ## List of keywords/format parameters that should not, by default, write
 ## corresponding files in the submission directory (`curdir'). Some other
 ## filenames not included here are reserved too, such as those
 ## containing non-alphanumeric chars (underscores '_' excepted), e.g.
 ## all names containing a dot ('bibdocactions.log',
 ## 'performed_actions.log', etc.)
 CFG_RESERVED_SUBMISSION_FILENAMES = ['SuE', 'files', 'lastuploadedfile', 'curdir', 'function_log', 'SN']

+## CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM -- a tuple of document
+## extensions in descending order of preference, used to suggest which
+## format is best suited for extracting text.
+CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM = ('txt', 'html', 'xml', 'odt', 'doc', 'docx', 'djvu', 'pdf', 'ps', 'ps.gz')
+
+## CFG_WEBSUBMIT_DESIRED_CONVERSIONS -- a dictionary having as keys
+## a format and as values the corresponding tuple of desired output
+## formats.
+CFG_WEBSUBMIT_DESIRED_CONVERSIONS = {
+    'pdf' : ('ps.gz', ),
+    'ps.gz' : ('pdf', ),
+    'djvu' : ('ps.gz', 'pdf'),
+    'docx' : ('doc', 'odt', 'pdf', 'ps.gz'),
+    'doc' : ('odt', 'pdf', 'ps.gz'),
+    'rtf' : ('pdf', 'odt', 'ps.gz'),
+    'odt' : ('pdf', 'doc', 'ps.gz'),
+    'pptx' : ('ppt', 'odp', 'pdf', 'ps.gz'),
+    'ppt' : ('odp', 'pdf', 'ps.gz'),
+    'odp' : ('pdf', 'ppt', 'ps.gz'),
+    'xlsx' : ('xls', 'ods', 'csv'),
+    'xls' : ('ods', 'csv'),
+    'ods' : ('xls', 'csv'),
+    'tiff' : ('pdf', 'ps.gz'),
+    'tif' : ('pdf', 'ps.gz')
+}
+
+## CFG_WEBSUBMIT_ICON_SUBFORMAT_RE -- a subformat is an Invenio concept that
+## gives file formats more semantics. For example "foo.gif;icon" has ".gif;icon"
+## 'format', ".gif" 'superformat' and "icon" 'subformat'. That means that this
+## particular instance of the "foo" document is not only a ".gif" but also
+## serves as an "icon", i.e. it is most probably low-resolution.
+## This configuration variable lets the administrator decide which implicit
+## convention is used to recognize which formats are meant to be used
+## as icons.
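+## For example, with the default value r"icon.*" below, any subformat
+## beginning with "icon" (e.g. "icon" or "icon-small") is recognized:
+## "foo.gif;icon" is treated as an icon, while a plain "foo.gif" is not.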
+CFG_WEBSUBMIT_ICON_SUBFORMAT_RE = re.compile(r"icon.*")
+
+## CFG_WEBSUBMIT_DEFAULT_ICON_SUBFORMAT -- this is the default subformat used
+## when creating new icons.
+CFG_WEBSUBMIT_DEFAULT_ICON_SUBFORMAT = "icon"
+
+
 class InvenioWebSubmitFunctionError(Exception):
     """This exception should only ever be raised by WebSubmit functions.
     It will be caught and handled by the WebSubmit core itself.
     It is used to signal to WebSubmit core that one of the functions
     encountered a FATAL ERROR situation that should halt all further
     execution of the submission.
     The exception will carry an error message in its "value" string.
     This message will probably be displayed on the user's browser in an
     Invenio "error" box, and may be logged for the admin to examine.

     Again: If this exception is raised by a WebSubmit function, an error
     message will be displayed and the submission ends in failure.

     Extends: Exception.
     """
     def __init__(self, value):
         """Set the internal "value" attribute to that of the passed
         "value" parameter.
         @param value: (string) - an error string to display to the user.
         """
         Exception.__init__(self)
         self.value = value

     def __str__(self):
         """Return oneself as a string (actually, return the contents of
         self.value).
         @return: (string)
         """
         return str(self.value)


 class InvenioWebSubmitFunctionStop(Exception):
     """This exception should only ever be raised by WebSubmit functions.
     It will be caught and handled by the WebSubmit core itself.
     It is used to signal to WebSubmit core that one of the functions
     encountered a situation that should prevent the functions that
     follow it from being executed, and that WebSubmit core should
     display some sort of message to the user. This message will be
     stored in the "value" attribute of the object.

     *** NOTE: In the current WebSubmit, this "value" is usually a
     JavaScript string that redirects the user's browser back to the
     Web form phase of the submission. The use of JavaScript, however,
     is going to be removed in the future, so the mechanism may change. ***

     Extends: Exception.
     """
     def __init__(self, value):
         """Set the internal "value" attribute to that of the passed
         "value" parameter.
         @param value: (string) - a string to display to the user.
         """
         Exception.__init__(self)
         self.value = value

     def __str__(self):
         """Return oneself as a string (actually, return the contents of
         self.value).
         @return: (string)
         """
         return str(self.value)


 class InvenioWebSubmitFunctionWarning(Exception):
     """This exception should be raised by a WebSubmit function when
     unexpected behaviour is encountered during the execution of the
     function. The unexpected behaviour should not have been so serious
     that execution had to be halted, but since the function was unable
     to perform its task, the event must be logged.
     Logging of the exception will be performed by WebSubmit.

     Extends: Exception.
     """
     def __init__(self, value):
         """Set the internal "value" attribute to that of the passed
         "value" parameter.
         @param value: (string) - a string to write to the log.
         """
         Exception.__init__(self)
         self.value = value

     def __str__(self):
         """Return oneself as a string (actually, return the contents of
         self.value).
         @return: (string)
         """
         return str(self.value)


 class InvenioWebSubmitFileStamperError(Exception):
     """This exception should be raised by websubmit_file_stamper when
     an error is encountered that prevents a file from being stamped.
     When caught, this exception should be used to stop processing with a
     failure signal.

     Extends: Exception.
     """
     def __init__(self, value):
         """Set the internal "value" attribute to that of the passed
         "value" parameter.
         @param value: (string) - a string to write to the log.
         """
         Exception.__init__(self)
         self.value = value

     def __str__(self):
         """Return oneself as a string (actually, return the contents of
         self.value).
         @return: (string)
         """
         return str(self.value)


 class InvenioWebSubmitIconCreatorError(Exception):
     """This exception should be raised by websubmit_icon_creator when
     an error is encountered that prevents an icon from being created.
     When caught, this exception should be used to stop processing with a
     failure signal.

     Extends: Exception.
     """
     def __init__(self, value):
         """Set the internal "value" attribute to that of the passed
         "value" parameter.
         @param value: (string) - a string to write to the log.
         """
         Exception.__init__(self)
         self.value = value

     def __str__(self):
         """Return oneself as a string (actually, return the contents of
         self.value).
         @return: (string)
         """
         return str(self.value)
diff --git a/modules/websubmit/lib/websubmit_file_converter.py b/modules/websubmit/lib/websubmit_file_converter.py
new file mode 100644
index 000000000..19613870e
--- /dev/null
+++ b/modules/websubmit/lib/websubmit_file_converter.py
@@ -0,0 +1,1031 @@
+# -*- coding: utf-8 -*-
+## This file is part of CDS Invenio.
+## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
+##
+## CDS Invenio is free software; you can redistribute it and/or
+## modify it under the terms of the GNU General Public License as
+## published by the Free Software Foundation; either version 2 of the
+## License, or (at your option) any later version.
+##
+## CDS Invenio is distributed in the hope that it will be useful, but
+## WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+## General Public License for more details.
+##
+## You should have received a copy of the GNU General Public License
+## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
+## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
+
+"""
+This module implements fulltext conversion among many different file formats.
+""" + +import os +import re +import sys +import shutil +import tempfile +import HTMLParser +import time + +from logging import debug, error, DEBUG, getLogger +from htmlentitydefs import entitydefs +from optparse import OptionParser + +try: + from invenio.hocrlib import create_pdf, extract_hocr + CFG_HAS_REPORTLAB = True +except ImportError: + CFG_HAS_REPORTLAB = False + +from invenio.shellutils import run_process_with_timeout +from invenio.config import CFG_TMPDIR, CFG_ETCDIR, CFG_PYLIBDIR, \ + CFG_PATH_ANY2DJVU, \ + CFG_PATH_PDFINFO, \ + CFG_PATH_GS, \ + CFG_PATH_PDFOPT, \ + CFG_PATH_PDFTOPS, \ + CFG_PATH_GZIP, \ + CFG_PATH_GUNZIP, \ + CFG_PATH_PDFTOTEXT, \ + CFG_PATH_PDFTOPPM, \ + CFG_PATH_OCROSCRIPT, \ + CFG_PATH_DJVUPS, \ + CFG_PATH_DJVUTXT, \ + CFG_PATH_OPENOFFICE_PYTHON, \ + CFG_PATH_PSTOTEXT, \ + CFG_PATH_TIFF2PDF, \ + CFG_OPENOFFICE_SERVER_HOST, \ + CFG_OPENOFFICE_SERVER_PORT, \ + CFG_OPENOFFICE_USER, \ + CFG_PATH_CONVERT, \ + CFG_PATH_PAMFILE + +from invenio.websubmit_config import \ + CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM, \ + CFG_WEBSUBMIT_DESIRED_CONVERSIONS +from invenio.errorlib import register_exception + +#logger = getLogger() +#logger.setLevel(DEBUG) + +CFG_TWO2THREE_LANG_CODES = { + 'en': 'eng', + 'nl': 'nld', + 'es': 'spa', + 'de': 'deu', + 'it': 'ita', + 'fr': 'fra', +} + +CFG_OPENOFFICE_TMPDIR = os.path.join(CFG_TMPDIR, 'ooffice-tmp-files') + +_RE_CLEAN_SPACES = re.compile(r'\s+') + + +class InvenioWebSubmitFileConverterError(Exception): + pass + + +def get_conversion_map(): + """Return a dictionary of the form: + '.pdf' : {'.ps.gz' : ('pdf2ps', {param1 : value1...}) + """ + ret = { + '.csv': {}, + '.djvu': {}, + '.doc': {}, + '.docx': {}, + '.htm': {}, + '.html': {}, + '.odp': {}, + '.ods': {}, + '.odt': {}, + '.pdf': {}, + '.ppt': {}, + '.pptx': {}, + '.ps': {}, + '.ps.gz': {}, + '.rtf': {}, + '.tif': {}, + '.tiff': {}, + '.txt': {}, + '.xls': {}, + '.xlsx': {}, + '.xml': {}, + '.hocr': {}, + } + if CFG_PATH_GZIP: + ret['.ps']['.ps.gz'] = (gzip, {}) + if CFG_PATH_GUNZIP: + ret['.ps.gz']['.ps'] = (gunzip, {}) + if CFG_PATH_ANY2DJVU: + ret['.pdf']['.djvu'] = (any2djvu, {}) + ret['.ps']['.djvu'] = (any2djvu, {}) + ret['.ps.gz']['.djvu'] = (any2djvu, {}) + if CFG_PATH_DJVUPS: + ret['.djvu']['.ps'] = (djvu2ps, {'compress': False}) + if CFG_PATH_GZIP: + ret['.djvu']['.ps.gz'] = (djvu2ps, {'compress': True}) + if CFG_PATH_DJVUTXT: + ret['.djvu']['.txt'] = (djvu2text, {}) + if CFG_PATH_PSTOTEXT: + ret['.ps']['.txt'] = (pstotext, {}) + if CFG_PATH_GUNZIP: + ret['.ps.gz']['.txt'] = (pstotext, {}) + if CFG_PATH_GS: + ret['.ps']['.pdf'] = (ps2pdfa, {}) + if CFG_PATH_GUNZIP: + ret['.ps.gz']['.pdf'] = (ps2pdfa, {}) + if CFG_PATH_PDFTOPS: + ret['.pdf']['.ps'] = (pdf2ps, {'compress': False}) + if CFG_PATH_GZIP: + ret['.pdf']['.ps.gz'] = (pdf2ps, {'compress': True}) + if CFG_PATH_PDFTOTEXT: + ret['.pdf']['.txt'] = (pdf2text, {}) + if CFG_PATH_PDFTOPPM and CFG_PATH_OCROSCRIPT and CFG_PATH_PAMFILE: + ret['.pdf']['.hocr'] = (pdf2hocr, {}) + if CFG_PATH_PDFTOPS and CFG_PATH_GS and CFG_PATH_PDFOPT and CFG_PATH_PDFINFO: + ret['.pdf']['.pdf'] = (pdf2pdfa, {}) + ret['.txt']['.txt'] = (txt2text, {}) + ret['.csv']['.txt'] = (txt2text, {}) + ret['.html']['.txt'] = (html2text, {}) + ret['.htm']['.txt'] = (html2text, {}) + ret['.xml']['.txt'] = (html2text, {}) + if CFG_HAS_REPORTLAB: + ret['.hocr']['.pdf'] = (hocr2pdf, {}) + if CFG_PATH_TIFF2PDF: + ret['.tiff']['.pdf'] = (tiff2pdf, {}) + ret['.tif']['.pdf'] = (tiff2pdf, {}) + if CFG_PATH_OPENOFFICE_PYTHON and 
CFG_OPENOFFICE_SERVER_HOST: + ret['.rtf']['.odt'] = (unoconv, {'output_format': 'odt'}) + ret['.rtf']['.doc'] = (unoconv, {'output_format': 'doc'}) + ret['.rtf']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.rtf']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.doc']['.odt'] = (unoconv, {'output_format': 'odt'}) + ret['.doc']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.doc']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.docx']['.odt'] = (unoconv, {'output_format': 'odt'}) + ret['.docx']['.doc'] = (unoconv, {'output_format': 'doc'}) + ret['.docx']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.docx']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.odt']['.doc'] = (unoconv, {'output_format': 'doc'}) + ret['.odt']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.odt']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.ppt']['.odp'] = (unoconv, {'output_format': 'odp'}) + ret['.ppt']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.ppt']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.pptx']['.odp'] = (unoconv, {'output_format': 'odp'}) + ret['.pptx']['.ppt'] = (unoconv, {'output_format': 'ppt'}) + ret['.pptx']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.pptx']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.odp']['.ppt'] = (unoconv, {'output_format': 'ppt'}) + ret['.odp']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.odp']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.xls']['.ods'] = (unoconv, {'output_format': 'ods'}) + ret['.xls']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.xls']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.xls']['.csv'] = (unoconv, {'output_format': 'csv'}) + ret['.xlsx']['.xls'] = (unoconv, {'output_format': 'xls'}) + ret['.xlsx']['.ods'] = (unoconv, {'output_format': 'ods'}) + ret['.xlsx']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.xlsx']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.xlsx']['.csv'] = (unoconv, {'output_format': 'csv'}) + ret['.ods']['.xls'] = (unoconv, {'output_format': 'xls'}) + ret['.ods']['.pdf'] = (unoconv, {'output_format': 'pdf'}) + ret['.ods']['.txt'] = (unoconv, {'output_format': 'text'}) + ret['.ods']['.csv'] = (unoconv, {'output_format': 'csv'}) + return ret + + +def get_best_format_to_extract_text_from(filelist, best_formats=CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM): + """ + Return among the filelist the best file whose format is best suited for + extracting text. + """ + from invenio.bibdocfile import decompose_file, normalize_format + best_formats = [normalize_format(aformat) for aformat in best_formats if can_convert(aformat, '.txt')] + for aformat in best_formats: + for filename in filelist: + if decompose_file(filename, skip_version=True)[2].endswith(aformat): + return filename + raise InvenioWebSubmitFileConverterError("It's not possible to extract valuable text from any of the proposed files.") + + +def get_missing_formats(filelist, desired_conversion=None): + """Given a list of files it will return a dictionary of the form: + file1 : missing formats to generate from it... 
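+    e.g. with the default CFG_WEBSUBMIT_DESIRED_CONVERSIONS, a filelist of
+    just ['sample.pdf'] (an illustrative filename) would yield
+    {'sample.pdf': ['.ps.gz']}, since the desired .ps.gz companion is not
+    yet available.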
+ """ + from invenio.bibdocfile import normalize_format, decompose_file + + def normalize_desired_conversion(): + ret = {} + for key, value in desired_conversion.iteritems(): + ret[normalize_format(key)] = [normalize_format(aformat) for aformat in value] + return ret + + if desired_conversion is None: + desired_conversion = CFG_WEBSUBMIT_DESIRED_CONVERSIONS + + available_formats = [decompose_file(filename, skip_version=True)[2] for filename in filelist] + missing_formats = [] + desired_conversion = normalize_desired_conversion() + ret = {} + for filename in filelist: + aformat = decompose_file(filename, skip_version=True)[2] + if aformat in desired_conversion: + for desired_format in desired_conversion[aformat]: + if desired_format not in available_formats and desired_format not in missing_formats: + missing_formats.append(desired_format) + if filename not in ret: + ret[filename] = [] + ret[filename].append(desired_format) + return ret + + +def can_convert(input_format, output_format, max_intermediate_conversions=2): + """Return the chain of conversion to transform input_format into output_format, if any.""" + from invenio.bibdocfile import normalize_format + if max_intermediate_conversions <= 0: + return [] + input_format = normalize_format(input_format) + output_format = normalize_format(output_format) + if input_format in __CONVERSION_MAP: + if output_format in __CONVERSION_MAP[input_format]: + return [__CONVERSION_MAP[input_format][output_format]] + best_res = [] + best_intermediate = '' + for intermediate_format in __CONVERSION_MAP[input_format]: + res = can_convert(intermediate_format, output_format, max_intermediate_conversions-1) + if res and (len(res) < best_res or not best_res): + best_res = res + best_intermediate = intermediate_format + if best_res: + return [__CONVERSION_MAP[input_format][best_intermediate]] + best_res + return [] + + +def can_pdfopt(): + """Return True if it's possible to optimize PDFs.""" + return bool(CFG_PATH_PDFOPT) + + +def can_pdfa(): + """Return True if it's possible to generate PDF/As.""" + return bool(CFG_PATH_PDFTOPS and CFG_PATH_GS and CFG_PATH_PDFINFO) + + +def can_perform_ocr(): + """Return True if it's possible to perform OCR.""" + return bool(CFG_PATH_OCROSCRIPT) and bool(CFG_PATH_PDFTOPPM) + + +def guess_ocropus_produced_garbage(input_file, hocr_p): + """Return True if the output produced by OCROpus in hocr format contains + only garbage instead of text. This is implemented via an heuristic: + if the most common length for sentences encoded in UTF-8 is 1 then + this is Garbage (tm). 
+ """ + + def _get_words_from_text(): + ret = [] + for row in open(input_file): + for word in row.strip().split(' '): + ret.append(word.strip()) + return ret + + def _get_words_from_hocr(): + ret = [] + hocr = extract_hocr(open(input_file).read()) + for dummy, dummy, lines in hocr: + for dummy, line in lines: + for word in line.split(): + ret.append(word.strip()) + return ret + + if hocr_p: + words = _get_words_from_hocr() + else: + words = _get_words_from_text() + #stats = {} + #most_common_len = 0 + #most_common_how_many = 0 + #for word in words: + #if word: + #word_length = len(word.decode('utf-8')) + #stats[word_length] = stats.get(word_length, 0) + 1 + #if stats[word_length] > most_common_how_many: + #most_common_len = word_length + #most_common_how_many = stats[word_length] + goods = 0 + bads = 0 + for word in words: + for char in word.decode('utf-8'): + if (u'a' <= char <= u'z') or (u'A' <= char <= u'Z'): + goods += 1 + else: + bads += 1 + if bads > goods: + debug('OCROpus produced garbage') + return True + else: + return False + + +def guess_is_OCR_needed(input_file, ln='en'): + """ + Tries to see if enough text is retrievable from input_file. + Return True if OCR is needed, False if it's already + possible to retrieve information from the document. + """ + ## FIXME: a way to understand if pdftotext has returned garbage + ## shuould be found. E.g. 1.0*len(text)/len(zlib.compress(text)) < 2.1 + ## could be a good hint for garbage being found. + return True + + +def convert_file(input_file, output_file=None, output_format=None, **params): + """ + Convert files from one format to another. + @param input_file [string] the path to an existing file + @param output_file [string] the path to the desired ouput. (if None a + temporary file is generated) + @param output_format [string] the desired format (if None it is taken from + output_file) + @param params other paramaters to pass to the particular converter + @return [string] the final output_file + """ + from invenio.bibdocfile import decompose_file, normalize_format + if output_format is None: + if output_file is None: + raise ValueError("At least output_file or format should be specified.") + else: + output_ext = decompose_file(output_file, skip_version=True)[2] + else: + output_ext = normalize_format(output_format) + input_ext = decompose_file(input_file, skip_version=True)[2] + conversion_chain = can_convert(input_ext, output_ext) + if conversion_chain: + current_input = input_file + current_output = None + for i in xrange(len(conversion_chain)): + if i == (len(conversion_chain) - 1): + current_output = output_file + converter = conversion_chain[i][0] + final_params = dict(conversion_chain[i][1]) + final_params.update(params) + try: + return converter(current_input, current_output, **final_params) + except InvenioWebSubmitFileConverterError, err: + raise InvenioWebSubmitFileConverterError("Error when converting from %s to %s: %s" % (input_file, output_ext, err)) + except Exception, err: + register_exception() + raise InvenioWebSubmitFileConverterError("Unexpected error when converting from %s to %s (%s): %s" % (input_file, output_ext, type(err), err)) + current_input = current_output + else: + raise InvenioWebSubmitFileConverterError("It's impossible to convert from %s to %s" % (input_ext, output_ext)) + + +def check_openoffice_tmpdir(): + """Return True if OpenOffice tmpdir do exists and OpenOffice can + successfully create file there.""" + if not os.path.exists(CFG_OPENOFFICE_TMPDIR): + raise InvenioWebSubmitFileConverterError('%s 
does not exists' % CFG_OPENOFFICE_TMPDIR) + if not os.path.isdir(CFG_OPENOFFICE_TMPDIR): + raise InvenioWebSubmitFileConverterError('%s is not a directory' % CFG_OPENOFFICE_TMPDIR) + now = str(time.time()) + execute_command('sudo', '-u', CFG_OPENOFFICE_USER, CFG_PATH_OPENOFFICE_PYTHON, 'import os; open(os.path.join(%s, "test"), "w").write(%s)' % (repr(CFG_OPENOFFICE_TMPDIR), repr(now))) + try: + test = open(os.path.join(CFG_OPENOFFICE_TMPDIR, 'test')).read() + if test != now: + raise IOError + except: + raise InvenioWebSubmitFileConverterError("%s can't be properly written by OpenOffice.org or read by Apache" % CFG_OPENOFFICE_TMPDIR) + + +def unoconv(input_file, output_file=None, output_format='txt', pdfopt=True, **dummy): + """Use unconv to convert among OpenOffice understood documents.""" + from invenio.bibdocfile import normalize_format + try: + check_openoffice_tmpdir() + except InvenioWebSubmitFileConverterError, err: + register_exception(alert_admin=True, prefix='ERROR: it\'s impossible to properly execute OpenOffice.org conversions: %s' % err) + raise + + input_file, output_file, dummy = prepare_io(input_file, output_file, output_format, need_working_dir=False) + if output_format == 'txt': + unoconv_format = 'text' + else: + unoconv_format = output_format + try: + tmpfile = tempfile.mktemp(dir=CFG_OPENOFFICE_TMPDIR, suffix=normalize_format(output_format)) + execute_command('sudo', '-u', CFG_OPENOFFICE_USER, CFG_PATH_OPENOFFICE_PYTHON, os.path.join(CFG_PYLIBDIR, 'invenio', 'unoconv.py'), '-v', '-s', CFG_OPENOFFICE_SERVER_HOST, '-p', CFG_OPENOFFICE_SERVER_PORT, '--outputfile', tmpfile, '-f', unoconv_format, input_file) + except InvenioWebSubmitFileConverterError: + time.sleep(5) + execute_command('sudo', '-u', CFG_OPENOFFICE_USER, CFG_PATH_OPENOFFICE_PYTHON, os.path.join(CFG_PYLIBDIR, 'invenio', 'unoconv.py'), '-v', '-s', CFG_OPENOFFICE_SERVER_HOST, '-p', CFG_OPENOFFICE_SERVER_PORT, '--outputfile', tmpfile, '-f', unoconv_format, input_file) + + if not os.path.exists(tmpfile): + raise InvenioWebSubmitFileConverterError('No output was generated by OpenOffice') + + output_format = normalize_format(output_format) + + if output_format == '.pdf' and pdfopt: + pdf2pdfopt(tmpfile, output_file) + else: + shutil.copy(tmpfile, output_file) + execute_command('sudo', '-u', CFG_OPENOFFICE_USER, CFG_PATH_OPENOFFICE_PYTHON, '-c', 'import os; os.remove(%s)' % repr(tmpfile)) + return output_file + + +def any2djvu(input_file, output_file=None, resolution=400, ocr=True, input_format=5, **dummy): + """ + Transform input_file into a .djvu file. + @param input_file [string] the input file name + @param output_file [string] the output_file file name, None for temporary generated + @param resolution [int] the resolution of the output_file + @param input_format [int] [1-9]: + 1 - DjVu Document (for verification or OCR) + 2 - PS/PS.GZ/PDF Document (default) + 3 - Photo/Picture/Icon + 4 - Scanned Document - B&W - <200 dpi + 5 - Scanned Document - B&W - 200-400 dpi + 6 - Scanned Document - B&W - >400 dpi + 7 - Scanned Document - Color/Mixed - <200 dpi + 8 - Scanned Document - Color/Mixed - 200-400 dpi + 9 - Scanned Document - Color/Mixed - >400 dpi + @return [string] output_file input_file. + raise InvenioWebSubmitFileConverterError in case of errors. + Note: due to the bottleneck of using a centralized server, it is very + slow and is not suitable for interactive usage (e.g. 
WebSubmit functions) + """ + from invenio.bibdocfile import decompose_file + input_file, output_file, working_dir = prepare_io(input_file, output_file, '.djvu') + + ocr = ocr and "1" or "0" + + ## Any2djvu expect to find the file in the current directory. + execute_command(CFG_PATH_ANY2DJVU, '-a', '-c', '-r', resolution, '-o', ocr, '-f', input_format, os.path.basename(input_file), cwd=working_dir) + + ## Any2djvu doesn't let you choose the output_file file name. + djvu_output = os.path.join(working_dir, decompose_file(input_file)[1] + '.djvu') + shutil.move(djvu_output, output_file) + clean_working_dir(working_dir) + return output_file + + +_RE_FIND_TITLE = re.compile(r'^Title:\s*(.*?)\s*$') + + +def pdf2pdfa(input_file, output_file=None, title=None, pdfopt=True, **dummy): + """ + Transform any PDF into a PDF/A (see: ) + @param input_file [string] the input file name + @param output_file [string] the output_file file name, None for temporary generated + @param title [string] the title of the document. None for autodiscovery. + @param pdfopt [bool] whether to linearize the pdf, too. + @return [string] output_file input_file + raise InvenioWebSubmitFileConverterError in case of errors. + """ + + input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') + + if title is None: + stdout = execute_command(CFG_PATH_PDFINFO, input_file) + for line in stdout.split('\n'): + g = _RE_FIND_TITLE.match(line) + if g: + title = g.group(1) + break + if not title: + raise InvenioWebSubmitFileConverterError("It's impossible to automatically discover the title. Please specify it as a parameter") + + debug("Extracted title is %s" % title) + + shutil.copy(os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'ISOCoatedsb.icc'), working_dir) + pdfa_header = open(os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'PDFA_def.ps')).read() + pdfa_header = pdfa_header.replace('<<<>>>', title) + inputps = os.path.join(working_dir, 'input.ps') + outputpdf = os.path.join(working_dir, 'output_file.pdf') + open(os.path.join(working_dir, 'PDFA_def.ps'), 'w').write(pdfa_header) + execute_command(CFG_PATH_PDFTOPS, '-level3', input_file, inputps) + execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFA', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-dUseCIEColor', '-sDEVICE=pdfwrite', '-sOutputFile=output_file.pdf', 'PDFA_def.ps', 'input.ps', cwd=working_dir) + if pdfopt: + execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) + else: + shutil.move(outputpdf, output_file) + clean_working_dir(working_dir) + return output_file + + +def pdf2pdfopt(input_file, output_file=None, **dummy): + """ + Linearize the input PDF in order to improve the web-experience when + visualizing the document through the web. + @param input_file [string] the input input_file + @param output_file [string] the output_file file name, None for temporary generated + @return [string] output_file input_file + raise InvenioWebSubmitFileConverterError in case of errors. + """ + input_file, output_file, dummy = prepare_io(input_file, output_file, '.pdf', need_working_dir=False) + execute_command(CFG_PATH_PDFOPT, input_file, output_file) + return output_file + + +def pdf2ps(input_file, output_file=None, level=2, compress=True, **dummy): + """ + Convert from Pdf to Postscript. 
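+    When compress is True the result is additionally gzipped into a .ps.gz
+    file; the level parameter is the PostScript language level passed to
+    pdftops (e.g. level=2 becomes -level2).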
+ """ + if compress: + suffix = '.ps.gz' + else: + suffix = '.ps' + input_file, output_file, working_dir = prepare_io(input_file, output_file, suffix) + execute_command(CFG_PATH_PDFTOPS, '-level%i' % level, input_file, os.path.join(working_dir, 'output.ps')) + if compress: + execute_command(CFG_PATH_GZIP, '-c', os.path.join(working_dir, 'output.ps'), filename_out=output_file) + else: + shutil.move(os.path.join(working_dir, 'output.ps'), output_file) + clean_working_dir(working_dir) + return output_file + + +def ps2pdfa(input_file, output_file=None, title=None, pdfopt=True, **dummy): + """ + Transform any PS into a PDF/A (see: ) + @param input_file [string] the input file name + @param output_file [string] the output_file file name, None for temporary generated + @param title [string] the title of the document. None for autodiscovery. + @param pdfopt [bool] whether to linearize the pdf, too. + @return [string] output_file input_file + raise InvenioWebSubmitFileConverterError in case of errors. + """ + + input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') + if not title: + raise InvenioWebSubmitFileConverterError("It's impossible to automatically discover the title. Please specify it as a parameter") + + shutil.copy(os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'ISOCoatedsb.icc'), working_dir) + pdfa_header = open(os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'PDFA_def.ps')).read() + pdfa_header = pdfa_header.replace('<<<>>>', title) + outputpdf = os.path.join(working_dir, 'output_file.pdf') + open(os.path.join(working_dir, 'PDFA_def.ps'), 'w').write(pdfa_header) + execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFA', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-dUseCIEColor', '-sDEVICE=pdfwrite', '-sOutputFile=output_file.pdf', 'PDFA_def.ps', input_file, cwd=working_dir) + if pdfopt: + execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) + else: + shutil.move(outputpdf, output_file) + clean_working_dir(working_dir) + return output_file + + +def pdf2hocr(input_file, output_file=None, ln='en', return_working_dir=False, extract_only_text=False, **dummy): + """ + Return the text content in input_file. + @param ln is a two letter language code to give the OCR tool a hint. + @param return_working_dir if set to True, will return output_file path and the working_dir path, instead of deleting the working_dir. This is useful in case you need the intermediate images to build again a PDF. + """ + + def _perform_rotate(working_dir, imagefile, angle): + """Rotate imagefile of the corresponding angle. Creates a new file + with rotated- as prefix.""" + debug('Performing rotate on %s by %s degrees' % (imagefile, angle)) + if not angle: + #execute_command('%s %s %s', CFG_PATH_CONVERT, os.path.join(working_dir, imagefile), os.path.join(working_dir, 'rotated-%s' % imagefile)) + shutil.copy(os.path.join(working_dir, imagefile), os.path.join(working_dir, 'rotated-%s' % imagefile)) + else: + execute_command(CFG_PATH_CONVERT, os.path.join(working_dir, imagefile), '-rotate', str(angle), os.path.join(working_dir, 'rotated-%s' % imagefile)) + return True + + def _perform_deskew(working_dir, imagefile): + """Perform ocroscript deskew. Expect to work on rotated-imagefile. + Creates deskewed-imagefile. 
+        Return True if deskewing was fine."""
+        debug('Performing deskew on %s' % imagefile)
+        try:
+            dummy, stderr = execute_command_with_stderr(CFG_PATH_OCROSCRIPT, os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'deskew.lua'), os.path.join(working_dir, 'rotated-%s' % imagefile), os.path.join(working_dir, 'deskewed-%s' % imagefile))
+            if stderr.strip():
+                debug('Errors found during deskewing')
+                return False
+            else:
+                return True
+        except InvenioWebSubmitFileConverterError, err:
+            debug('Deskewing error: %s' % err)
+            return False
+
+    def _perform_recognize(working_dir, imagefile):
+        """Perform ocroscript recognize. Expect to work on deskewed-imagefile.
+        Creates recognize.out. Return True if recognizing was fine."""
+        debug('Performing recognize on %s' % imagefile)
+        if extract_only_text:
+            output_mode = 'text'
+        else:
+            output_mode = 'hocr'
+        try:
+            dummy, stderr = execute_command_with_stderr(CFG_PATH_OCROSCRIPT, 'recognize', '--tesslanguage=%s' % ln, '--output-mode=%s' % output_mode, os.path.join(working_dir, 'deskewed-%s' % imagefile), filename_out=os.path.join(working_dir, 'recognize.out'))
+            if stderr.strip():
+                ## There was some output on stderr
+                debug('Errors found in recognize.err')
+                return False
+            return not guess_ocropus_produced_garbage(os.path.join(working_dir, 'recognize.out'), not extract_only_text)
+        except InvenioWebSubmitFileConverterError, err:
+            debug('Recognizer error: %s' % err)
+            return False
+
+    def _perform_dummy_recognize(working_dir, imagefile):
+        """Return an empty text or an empty hocr referencing the image."""
+        debug('Performing dummy recognize on %s' % imagefile)
+        if extract_only_text:
+            out = ''
+        else:
+            stdout = stderr = ''
+            try:
+                ## Since pdftoppm is returning a netpbm image, we use
+                ## pamfile to retrieve the size of the image, in order to
+                ## create an empty .hocr file containing just the
+                ## desired file and a reference to its size.
+                stdout, stderr = execute_command_with_stderr(CFG_PATH_PAMFILE, os.path.join(working_dir, imagefile))
+                g = re.search(r'(?P<width>\d+) by (?P<height>\d+)', stdout)
+                if g:
+                    width = int(g.group('width'))
+                    height = int(g.group('height'))
+
+                    out = """<html>
+<head>
+<title>OCR Output</title>
+</head>
+<body>
+<div class="ocr_page" title="bbox 0 0 %s %s; image %s">
+</div>
+</body>
+</html>""" % (width, height, os.path.join(working_dir, imagefile))
+                else:
+                    raise InvenioWebSubmitFileConverterError()
+            except Exception, err:
+                raise InvenioWebSubmitFileConverterError('It\'s impossible to retrieve the size of %s needed to perform a dummy OCR. The stdout of pamfile was: %s, the stderr was: %s. (%s)' % (imagefile, stdout, stderr, err))
+        open(os.path.join(working_dir, 'recognize.out'), 'w').write(out)
+
+    if CFG_PATH_OCROSCRIPT:
+        ln = CFG_TWO2THREE_LANG_CODES.get(ln, 'eng')
+        if extract_only_text:
+            output_format = '.txt'
+        else:
+            output_format = '.hocr'
+        input_file, output_file, working_dir = prepare_io(input_file, output_file, output_format)
+        #execute_command('pdfimages %s %s', input_file, os.path.join(working_dir, 'image'))
+        execute_command(CFG_PATH_PDFTOPPM, '-r', '300', '-aa', 'yes', '-freetype', 'yes', input_file, os.path.join(working_dir, 'image'))
+
+        images = os.listdir(working_dir)
+        images.sort()
+        for imagefile in images:
+            if imagefile.startswith('image-'):
+                for angle in (0, 90, 180, 270):
+                    if _perform_rotate(working_dir, imagefile, angle) and _perform_deskew(working_dir, imagefile) and _perform_recognize(working_dir, imagefile):
+                        ## Things went nicely! So we can remove the original
+                        ## pbm picture which is soooooo huuuuugeee.
+                        os.remove(os.path.join(working_dir, 'rotated-%s' % imagefile))
+                        os.remove(os.path.join(working_dir, imagefile))
+                        break
+                else:
+                    _perform_dummy_recognize(working_dir, imagefile)
+                open(output_file, 'a').write(open(os.path.join(working_dir, 'recognize.out')).read())
+
+        if return_working_dir:
+            return output_file, working_dir
+        else:
+            clean_working_dir(working_dir)
+            return output_file
+
+    else:
+        raise InvenioWebSubmitFileConverterError("It's impossible to generate HOCR output from PDF. OCROpus is not available.")
+
+
+def hocr2pdf(input_file, output_file=None, working_dir=None, font="Courier", author=None, keywords=None, subject=None, title=None, draft=False, pdfopt=True, **dummy):
+    """
+    @param working_dir the directory containing images to build the PDF.
+    @param font the default font (e.g. Courier, Times-Roman).
+    @param author the author name.
+    @param subject the subject of the document.
+    @param title the title of the document.
+    @param draft whether to enable debug information in the output.
+    """
+    if working_dir:
+        working_dir = os.path.abspath(working_dir)
+    else:
+        working_dir = os.path.abspath(os.path.dirname(input_file))
+
+    if pdfopt:
+        input_file, tmp_output_file, dummy = prepare_io(input_file, output_ext='.pdf', need_working_dir=False)
+    else:
+        input_file, output_file, dummy = prepare_io(input_file, output_file=output_file, need_working_dir=False)
+        tmp_output_file = output_file
+
+    try:
+        create_pdf(extract_hocr(open(input_file).read()), tmp_output_file, font=font, author=author, keywords=keywords, subject=subject, title=title, image_path=working_dir, draft=draft)
+    except:
+        register_exception()
+        raise
+
+    if pdfopt:
+        output_file = pdf2pdfopt(tmp_output_file, output_file)
+        os.remove(tmp_output_file)
+        return output_file
+    else:
+        return tmp_output_file
+
+
+def pdf2hocr2pdf(input_file, output_file=None, font="Courier", author=None, keywords=None, subject=None, title=None, draft=False, ln='en', pdfopt=True, **dummy):
+    """
+    Transform a scanned PDF into a PDF with OCRed text.
+    @param font the default font (e.g. Courier, Times-Roman).
+    @param author the author name.
+    @param subject the subject of the document.
+    @param title the title of the document.
+ @param draft whether to enable debug information in the output. + @param ln is a two letter language code to give the OCR tool a hint. + """ + input_file, output_hocr_file, dummy = prepare_io(input_file, output_ext='.hocr', need_working_dir=False) + output_hocr_file, working_dir = pdf2hocr(input_file, output_file=output_hocr_file, ln=ln, return_working_dir=True) + output_file = hocr2pdf(output_hocr_file, output_file, working_dir, font=font, author=author, keywords=keywords, subject=subject, title=title, draft=draft, pdfopt=pdfopt) + os.remove(output_hocr_file) + clean_working_dir(working_dir) + return output_file + + +def pdf2text(input_file, output_file=None, perform_ocr=True, ln='en', **dummy): + """ + Return the text content in input_file. + """ + input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) + execute_command(CFG_PATH_PDFTOTEXT, '-enc', 'UTF-8', '-eol', 'unix', '-nopgbrk', input_file, output_file) + if perform_ocr and can_perform_ocr(): + ocred_output = pdf2hocr(input_file, ln=ln, extract_only_text=True) + open(output_file, 'a').write(open(ocred_output).read()) + os.remove(ocred_output) + return output_file + + +def txt2text(input_file, output_file=None, **dummy): + """ + Return the text content in input_file + """ + input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) + shutil.copy(input_file, output_file) + return output_file + + +def html2text(input_file, output_file=None, **dummy): + """ + Return the text content of an HTML/XML file. + """ + + class HTMLStripper(HTMLParser.HTMLParser): + + def __init__(self, output_file): + HTMLParser.HTMLParser.__init__(self) + self.output_file = output_file + + def handle_entityref(self, name): + if name in entitydefs: + self.output_file.write(entitydefs[name].decode('latin1').encode('utf8')) + + def handle_data(self, data): + if data.strip(): + self.output_file.write(_RE_CLEAN_SPACES.sub(' ', data)) + + def handle_charref(self, data): + try: + self.output_file.write(unichr(int(data)).encode('utf8')) + except: + pass + + def close(self): + self.output_file.close() + HTMLParser.HTMLParser.close(self) + + input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) + html_stripper = HTMLStripper(open(output_file, 'w')) + for line in open(input_file): + html_stripper.feed(line) + html_stripper.close() + return output_file + + +def djvu2text(input_file, output_file=None, **dummy): + """ + Return the text content in input_file. 
+ """ + input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) + execute_command(CFG_PATH_DJVUTXT, input_file, output_file) + return output_file + + +def djvu2ps(input_file, output_file=None, level=2, compress=True, **dummy): + """ + Convert a djvu into a .ps[.gz] + """ + if compress: + input_file, output_file, working_dir = prepare_io(input_file, output_file, output_ext='.ps.gz') + execute_command(CFG_PATH_DJVUPS, input_file, os.path.join(working_dir, 'output.ps')) + execute_command(CFG_PATH_GZIP, '-c', os.path.join(working_dir, 'output.ps'), filename_out=output_file) + else: + input_file, output_file, working_dir = prepare_io(input_file, output_file, output_ext='.ps') + execute_command(CFG_PATH_DJVUPS, '-level=%i' % level, input_file, output_file) + clean_working_dir(working_dir) + return output_file + + +def tiff2pdf(input_file, output_file=None, pdfopt=True, pdfa=True, perform_ocr=True, **args): + """ + Convert a .tiff into a .pdf + """ + if pdfa or pdfopt or perform_ocr: + input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') + partial_output = os.path.join(working_dir, 'output.pdf') + execute_command(CFG_PATH_TIFF2PDF, '-o', partial_output, input_file) + if perform_ocr: + pdf2hocr2pdf(partial_output, output_file, pdfopt=pdfopt, **args) + elif pdfa: + pdf2pdfa(partial_output, output_file, pdfopt=pdfopt, **args) + else: + pdfopt(partial_output, output_file) + clean_working_dir(working_dir) + else: + input_file, output_file, dummy = prepare_io(input_file, output_file, '.pdf', need_working_dir=False) + execute_command(CFG_PATH_TIFF2PDF, '-o', output_file, input_file) + return output_file + + +def pstotext(input_file, output_file=None, **dummy): + """ + Convert a .ps[.gz] into text. + """ + input_file, output_file, working_dir = prepare_io(input_file, output_file, '.txt') + if input_file.endswith('.gz'): + new_input_file = os.path.join(working_dir, 'input.ps') + execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=new_input_file) + input_file = new_input_file + execute_command(CFG_PATH_PSTOTEXT, '-output', output_file, input_file) + clean_working_dir(working_dir) + return output_file + + +def gzip(input_file, output_file=None, **dummy): + """ + Compress a file. + """ + input_file, output_file, dummy = prepare_io(input_file, output_file, '.gz', need_working_dir=False) + execute_command(CFG_PATH_GZIP, '-c', input_file, filename_out=output_file) + return output_file + + +def gunzip(input_file, output_file=None, **dummy): + """ + Uncompress a file. 
+ """ + from invenio.bibdocfile import decompose_file + input_ext = decompose_file(input_file, skip_version=True)[2] + if input_ext.endswith('.gz'): + input_ext = input_ext[:-len('.gz')] + else: + input_ext = None + input_file, output_file, dummy = prepare_io(input_file, output_file, input_ext, need_working_dir=False) + execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=output_file) + return output_file + + +def prepare_io(input_file, output_file=None, output_ext=None, need_working_dir=True): + """Clean input_file and the output_file.""" + from invenio.bibdocfile import decompose_file, normalize_format + output_ext = normalize_format(output_ext) + debug('Preparing IO for input=%s, output=%s, output_ext=%s' % (input_file, output_file, output_ext)) + if output_ext is None: + if output_file is None: + output_ext = '.tmp' + else: + output_ext = decompose_file(output_file, skip_version=True)[2] + if output_file is None: + try: + (fd, output_file) = tempfile.mkstemp(suffix=output_ext, dir=CFG_TMPDIR) + os.close(fd) + except IOError, err: + raise InvenioWebSubmitFileConverterError("It's impossible to create a temporary file: %s" % err) + else: + output_file = os.path.abspath(output_file) + if os.path.exists(output_file): + os.remove(output_file) + + if need_working_dir: + try: + working_dir = tempfile.mkdtemp(dir=CFG_TMPDIR, prefix='conversion') + except IOError, err: + raise InvenioWebSubmitFileConverterError("It's impossible to create a temporary directory: %s" % err) + + input_ext = decompose_file(input_file, skip_version=True)[2] + new_input_file = os.path.join(working_dir, 'input' + input_ext) + shutil.copy(input_file, new_input_file) + input_file = new_input_file + else: + working_dir = None + input_file = os.path.abspath(input_file) + + debug('IO prepared: input_file=%s, output_file=%s, working_dir=%s' % (input_file, output_file, working_dir)) + return (input_file, output_file, working_dir) + + +def clean_working_dir(working_dir): + """ + Remove the working_dir. + """ + debug('Cleaning working_dir: %s' % working_dir) + shutil.rmtree(working_dir) + + +def execute_command(*args, **argd): + """Wrapper to run_process_with_timeout.""" + debug("Executing: %s" % (args, )) + res, stdout, stderr = run_process_with_timeout(args, cwd=argd.get('cwd'), filename_out=argd.get('filename_out'), filename_err=argd.get('filename_err')) + if res != 0: + error("Error when executing %s" % (args, )) + raise InvenioWebSubmitFileConverterError("Error in running %s\n stdout:\n%s\nstderr:\n%s\n" % (args, stdout, stderr)) + return stdout + + +def execute_command_with_stderr(*args, **argd): + """Wrapper to run_process_with_timeout.""" + debug("Executing: %s" % (args, )) + res, stdout, stderr = run_process_with_timeout(args, cwd=argd.get('cwd'), filename_out=argd.get('filename_out')) + if res != 0: + error("Error when executing %s" % (args, )) + raise InvenioWebSubmitFileConverterError("Error in running %s\n stdout:\n%s\nstderr:\n%s\n" % (args, stdout, stderr)) + return stdout, stderr + +__CONVERSION_MAP = get_conversion_map() + + +def main_cli(): + """ + main function when the library behaves as a normal CLI tool. 
+ """ + from invenio.bibdocfile import normalize_format + parser = OptionParser() + parser.add_option("-c", "--convert", dest="input_name", + help="convert the specified FILE", metavar="FILE") + parser.add_option("-d", "--debug", dest="debug", action="store_true", help="Enable debug information") + parser.add_option("--special-pdf2hocr2pdf", dest="ocrize", help="convert the given scanned PDF into a PDF with OCRed text", metavar="FILE") + parser.add_option("-f", "--format", dest="output_format", help="the desired output format", metavar="FORMAT") + parser.add_option("-o", "--output", dest="output_name", help="the desired output FILE (if not specified a new file will be generated with the desired output format)") + parser.add_option("--without-pdfa", action="store_false", dest="pdf_a", default=True, help="don't force creation of PDF/A PDFs") + parser.add_option("--without-pdfopt", action="store_false", dest="pdfopt", default=True, help="don't force optimization of PDFs files") + parser.add_option("--without-ocr", action="store_false", dest="ocr", default=True, help="don't force OCR") + parser.add_option("--can-convert", dest="can_convert", help="display all the possible format that is possible to generate from the given format", metavar="FORMAT") + parser.add_option("--is-ocr-needed", dest="check_ocr_is_needed", help="check if OCR is needed for the FILE specified", metavar="FILE") + parser.add_option("-t", "--title", dest="title", help="specify the title (used when creating PDFs)", metavar="TITLE") + parser.add_option("-l", "--language", dest="ln", help="specify the language (used when performing OCR, e.g. en, it, fr...)", metavar="LN", default='en') + (options, dummy) = parser.parse_args() + if options.debug: + getLogger().setLevel(DEBUG) + if options.can_convert: + if options.can_convert: + input_format = normalize_format(options.can_convert) + if input_format == '.pdf': + if can_pdfopt(): + print "PDF linearization supported" + else: + print "No PDF linearization support" + if can_pdfa(): + print "PDF/A generation supported" + else: + print "No PDF/A generation support" + if can_perform_ocr(): + print "OCR supported" + else: + print "OCR not supported" + print 'Can convert from "%s" to:' % input_format[1:], + for output_format in __CONVERSION_MAP: + if can_convert(input_format, output_format): + print '"%s"' % output_format[1:], + print + elif options.check_ocr_is_needed: + print "Checking if OCR is needed on %s..." % options.check_ocr_is_needed, + sys.stdout.flush() + if guess_is_OCR_needed(options.check_ocr_is_needed): + print "needed." + else: + print "not needed." 
+    elif options.ocrize:
+        try:
+            output = pdf2hocr2pdf(options.ocrize, output_file=options.output_name, title=options.title, ln=options.ln)
+            print "Output stored in %s" % output
+        except InvenioWebSubmitFileConverterError, err:
+            print "ERROR: %s" % err
+            sys.exit(1)
+    else:
+        try:
+            if not options.output_name and not options.output_format:
+                parser.error("Either --format or --output should be specified")
+            if not options.input_name:
+                parser.error("An input should be specified!")
+            output = convert_file(options.input_name, output_file=options.output_name, output_format=options.output_format, pdfopt=options.pdfopt, pdfa=options.pdf_a, title=options.title, ln=options.ln)
+            print "Output stored in %s" % output
+        except InvenioWebSubmitFileConverterError, err:
+            print "ERROR: %s" % err
+            sys.exit(1)
+
+
+if __name__ == "__main__":
+    main_cli()
diff --git a/modules/websubmit/lib/websubmit_file_stamper.py b/modules/websubmit/lib/websubmit_file_stamper.py
index bdff87d69..1300566c8 100644
--- a/modules/websubmit/lib/websubmit_file_stamper.py
+++ b/modules/websubmit/lib/websubmit_file_stamper.py
@@ -1,1526 +1,1526 @@
 # -*- coding: utf-8 -*-
 ##
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

 """This is websubmit_file_stamper.py
    This tool is used to create a stamped version of a PDF file.

+   Python API:
+       Please see stamp_file().
+
+   CLI API:
+       $ python ~invenio/lib/python/invenio/websubmit_file_stamper.py \\
+                 --latex-template=demo-stamp-left.tex \\
+                 --latex-template-var='REPORTNUMBER=TEST-THESIS-2008-019' \\
+                 --latex-template-var='DATE=27/02/2008' \\
+                 --stamp='first' \\
+                 --output-file=testfile_stamped.pdf \\
+                 testfile.pdf
 """

 __revision__ = "$Id$"

 import getopt, sys, re, os, time, shutil, tempfile
-from invenio.config import CFG_PATH_DISTILLER, CFG_PATH_GFILE
+from invenio.config import CFG_PATH_PS2PDF, CFG_PATH_GFILE
 from invenio.errorlib import register_exception
 from invenio.config import CFG_TMPDIR
 from invenio.config import CFG_ETCDIR
 CFG_WEBSUBMIT_FILE_STAMPER_TEMPLATES_DIR = \
     "%s/websubmit/file_stamper_templates" % CFG_ETCDIR
 from invenio.config import CFG_PATH_PDFTK
 from invenio.config import CFG_PATH_PDF2PS
 from invenio.shellutils import escape_shell_arg
 from invenio.websubmit_config import InvenioWebSubmitFileStamperError

 ## ***** Functions related to the creation of the PDF Stamp file: *****
 def copy_template_files_to_stampdir(path_workingdir, latex_template):
     """In order to stamp a PDF fulltext file, LaTeX is used to create a
        "stamp" page that is then merged with the fulltext PDF.
        The stamp page is created in a temporary stamp "working directory".
        This means that the LaTeX file and its image files must be copied
        locally into this working directory. This function handles copying
        them into the working directory.
Note: Copying of the LaTeX template and its included image files is fairly naive and assumes that it is a very basic LaTeX file consisting of a main file and any included graphics. No other included file items will be copied. Also note that the order of searching for the LaTeX file and its associated graphics is as follows: + If the templatename provided has a path attached to it, look here first; + If there is no path, look in the current dir. + If there is no template in the current dir, look in ~invenio/etc/websubmit/latex + Images included within the LaTeX file are sought in the same way. Full path is used if provided; if not, current dir and failing that ~invenio/etc/websubmit/latex. @param path_workingdir: (string) - the working directory into which the latex templates should be copied. @param latex_template: (string) - the name of the LaTeX template to copy to the working dir. """ ## Get the "base name" of the latex template: (template_path, template_name) = os.path.split(latex_template) if template_path != "": ## A full path to the template was provided. We look for it there. ## Test to see whether the template is a real file and is readable: if os.access("%s/%s" % (template_path, template_name), os.R_OK): ## Template is readable. Copy it locally to the working directory: try: shutil.copyfile("%s/%s" % (template_path, template_name), \ "%s/%s" % (path_workingdir, template_name)) except IOError: ## Unable to copy the LaTeX template file to the ## working directory: msg = """Error: Unable to copy LaTeX file [%s/%s] to """ \ """working directory for stamping [%s].""" \ % (template_path, template_name, path_workingdir) raise InvenioWebSubmitFileStamperError(msg) else: ## Unable to read the template file: msg = """Error: Unable to copy LaTeX file [%s/%s] to """ \ """working directory for stamping [%s]. (File not """ \ """readable.)""" \ % (template_path, template_name, path_workingdir) raise InvenioWebSubmitFileStamperError(msg) else: ## There is no path to the template file. ## Look for it first in the current working directory, then in ## ~invenio/websubmit/latex; ## If not found in either, give up. if os.access("%s" % (template_name), os.F_OK): ## Template has been found in the current working directory. ## Copy it locally to the stamping working directory: try: shutil.copyfile("%s" % (template_name), \ "%s/%s" % (path_workingdir, template_name)) except IOError: ## Unable to copy the LaTeX template file to the ## working stamping directory: msg = """Error: Unable to copy LaTeX file [%s] to """ \ """working directory for stamping [%s].""" \ % (template_name, path_workingdir) raise InvenioWebSubmitFileStamperError(msg) elif os.access("%s/%s" % (CFG_WEBSUBMIT_FILE_STAMPER_TEMPLATES_DIR, \ template_name), os.F_OK): ## The template has been found in WebSubmit's latex templates ## directory. 
Copy it locally to the stamping working directory: try: shutil.copyfile("%s/%s" \ % (CFG_WEBSUBMIT_FILE_STAMPER_TEMPLATES_DIR, \ template_name), \ "%s/%s" % (path_workingdir, template_name)) except IOError: ## Unable to copy the LaTeX template file to the ## working stamping directory: msg = """Error: Unable to copy LaTeX file [%s/%s] to """ \ """working directory for stamping [%s].""" \ % (CFG_WEBSUBMIT_FILE_STAMPER_TEMPLATES_DIR, \ template_name, path_workingdir) raise InvenioWebSubmitFileStamperError(msg) else: ## Now that the template has been found, set the "template ## path" to the WebSubmit latex templates directory: template_path = CFG_WEBSUBMIT_FILE_STAMPER_TEMPLATES_DIR else: ## Unable to locate the latex template. msg = """Error: Unable to locate LaTeX file [%s].""" % template_name raise InvenioWebSubmitFileStamperError(msg) ## Now that the LaTeX template file has been copied locally, extract ## the names of graphics files to be included in the resulting ## document and attempt to copy them to the working "stamp" directory: cmd_findgraphics = \ """grep includegraphic %s | """ \ """sed -n 's/^[^{]*{\\([^}]\\{1,\\}\\)}.*$/\\1/p'""" \ % escape_shell_arg("%s/%s" % (path_workingdir, template_name)) fh_findgraphics = os.popen(cmd_findgraphics, "r") graphic_names = fh_findgraphics.readlines() findgraphics_errcode = fh_findgraphics.close() if findgraphics_errcode is not None: ## There was an error involving the grep/sed command. ## Unable to extract the details of any graphics to ## be included: msg = """Unable to stamp file. There was """ \ """a problem when trying to obtain details of images """ \ """included by the LaTeX template.""" raise InvenioWebSubmitFileStamperError(msg) ## Copy each include-graphic extracted from the template ## into the working stamp directory: for graphic in graphic_names: ## Remove any leading/trailing whitespace: graphic = graphic.strip() ## Get the path and "base name" of the included graphic: (graphic_path, graphic_name) = os.path.split(graphic) ## If there is a graphic_name to work with, try copy the file: if graphic_name != "": if graphic_path != "": ## The graphic is included from an absolute path: if os.access("%s/%s" % (graphic_path, graphic_name), os.F_OK): try: shutil.copyfile("%s/%s" % (graphic_path, \ graphic_name), \ "%s/%s" % (path_workingdir, \ graphic_name)) except IOError: ## Unable to copy the LaTeX template file to ## the current directory msg = """Unable to stamp file. There was """ \ """a problem when trying to copy an image """ \ """[%s/%s] included by the LaTeX template""" \ """ [%s].""" \ % (graphic_path, graphic_name, template_name) raise InvenioWebSubmitFileStamperError(msg) else: msg = """Unable to locate an image [%s/%s] included""" \ """ by the LaTeX template file [%s].""" \ % (graphic_path, graphic_name, template_name) raise InvenioWebSubmitFileStamperError(msg) else: ## The graphic is included from a relative path. Try to obtain ## it from the same directory that the latex template file was ## taken from: if template_path != "": ## Since template path is not empty, try to get the images ## from that location: if os.access("%s/%s" % (template_path, graphic_name), \ os.F_OK): try: shutil.copyfile("%s/%s" % (template_path, \ graphic_name), \ "%s/%s" % (path_workingdir, \ graphic_name)) except IOError: ## Unable to copy the LaTeX template file to ## the current directory msg = """Unable to stamp file. 
There was """ \ """a problem when trying to copy images """ \ """included by the LaTeX template.""" raise InvenioWebSubmitFileStamperError(msg) else: msg = """Unable to locate an image [%s] included""" \ """ by the LaTeX template file [%s].""" \ % (graphic_name, template_name) raise InvenioWebSubmitFileStamperError(msg) else: ## There is no template path. Try to get the images from ## current dir: if os.access("%s" % graphic_name, os.F_OK): try: shutil.copyfile("%s" % graphic_name, \ "%s/%s" % (path_workingdir, \ graphic_name)) except IOError: ## Unable to copy the LaTeX template file to ## the current directory msg = """Unable to stamp file. There was """ \ """a problem when trying to copy images """ \ """included by the LaTeX template.""" raise InvenioWebSubmitFileStamperError(msg) else: msg = """Unable to locate an image [%s] included""" \ """ by the LaTeX template file [%s].""" \ % (graphic_name, template_name) raise InvenioWebSubmitFileStamperError(msg) ## Return the basename of the template so that it can be used to create ## the PDF stamp file: return template_name def create_final_latex_template(working_dirname, \ latex_template, \ latex_template_var): """In the working directory, create a copy of the the orginal latex template with all the possible xxx--xxx in the template replaced with the values identified by the keywords in the latex_template_var dictionary. @param working_dirname: (string) the working directory used for the creation of the PDF stamp file. @latex_template: (string) name of the latex template before it has been parsed for replacements. @latex_template_var: (dict) dictionnary whose keys are the string to replace in latex_template and values are the replacement content @return: name of the final latex template (after replacements) """ ## Regexp used for finding a substitution line in the original template: re_replacement = re.compile("""XXX-(.+?)-XXX""") ## Now, read-in the local copy of the template and parse it line-by-line, ## replacing any ocurrences of "XXX-SEARCHWORD-XXX" with either: ## ## (a) The value from the "replacements" dictionary; ## (b) Nothing if there was no search-term in the dictionary; try: ## Open the original latex template for reading: fpread = open("%s/%s" \ % (working_dirname, latex_template), "r") ## Open a file to contain the "parsed" latex template: fpwrite = open("%s/create%s" \ % (working_dirname, latex_template), "w") for line in fpread.readlines(): ## For each line in the template file, test for ## substitution-markers: replacement_markers = re_replacement.finditer(line) ## For each replacement-pattern detected in this line, process it: for replacement_marker in replacement_markers: ## Found a match: search_term = replacement_marker.group(1) try: ## Get the replacement-term for this match ## from the dictionary replacement_term = latex_template_var[search_term] except KeyError: ## This search-term was not in the list of replacements ## to be made. It should be replaced with an empty string ## in the template: line = line[0:replacement_marker.start()] + \ line[replacement_marker.end():] else: ## Is the replacement term of the form date(XXXX)? 
If yes, ## take it literally and generate a pythonic date with it: if replacement_term.find("date(") == 0 \ and replacement_term[-1] == ")": ## Take the date format string, use it to ## generate today's date date_format = replacement_term[5:-1].strip('\'"') try: replacement = time.strftime(date_format, \ time.localtime()) except TypeError: ## Bad date format replacement = "" elif replacement_term.find("include(") == 0 \ and replacement_term[-1] == ")": ## Replacement term is a directive to include a file ## in the LaTeX template: replacement = replacement_term[8:-1].strip('\'"') else: ## Take replacement_term as a literal string ## to be inserted into the template at this point. replacement = replacement_term ## Now substitute replacement into the line of the template: line = line[0:replacement_marker.start()] + replacement \ + line[replacement_marker.end():] ## Write the modified line to the new template: fpwrite.write(line) fpwrite.flush() ## Close up the template files and unlink the original: fpread.close() fpwrite.close() except IOError: msg = "Unable to read LaTeX template [%s/%s]. Cannot Stamp File" \ % (working_dirname, latex_template) raise InvenioWebSubmitFileStamperError(msg) ## Return the name of the LaTeX template to be used: return "create%s" % latex_template def escape_latex_meta_characters(text): """The following are LaTeX meta characters that must be escaped with a backslash: # $ % & _ { } This function therefore takes a string as input and does a simple replace of these characters with escaped versions. @param text: (string) - the string to be escaped. @return: (string) - the string in which the LaTeX meta characters have been escaped. """ text = text.replace('#', '\#') text = text.replace('$', '\$') text = text.replace('%', '\%') text = text.replace('&', '\&') text = text.replace('_', '\_') text = text.replace('{', '\{') text = text.replace('}', '\}') return text def escape_latex_template_vars(template_vars, strict=False): """Take a dictionary of LaTeX template variables/values and escape LaTeX meta characters in some of them, or all of them depending upon whether a call is made in strict mode (if strict is set, ALL values are escaped.) Operating in non-strict mode, the rules for escaping are as follows: * If the string does not contain $ { or }, it must be escaped. * If the string contains $, then there must be an even number of these. If the count is even, do not escape. Else, escape. * If the string contains { or }, it must be balanced with a counterpart. That's to say that the count of "{" must match the count of "}". If it does, do not escape. Else, escape. @param template_vars: (dictionary) - the LaTeX template variables and their values. @param strict: (boolean) - a flag indicating whether or not to operate in strict mode. Strict mode means that all values are escaped regardless of whether or not they are considered to be "good" LaTeX. @return: (dictionary) - the LaTeX template variables with their values escaped. 
""" ## Make a copy of the LaTeX template variables so as not to corrupt ## the original: working_template_vars = template_vars.copy() ## ## For each of the variables, escape LaTeX meta characteras in the ## value according to the strict flag: varnames = working_template_vars.keys() for varname in varnames: escape_value = False varval = working_template_vars[varname] ## We don't want to escape values that are date or include directives ## so unfortunately, this if is needed here: if (varval.find("date(") == 0 or varval.find("include(") == 0) and \ varval[-1] == ")": ## This is a date or include directive: continue ## Count the number of "$", "{" and "}" in it. If any are present, ## they should be balanced. If so, we will assume that they are ## wanted and that the LaTeX in the string is good. ## If, however, they are not balanced, we will assume that they are ## not valid LaTeX commands and that the string should be escaped. ## If they are not present at all, we assume that the string should ## be escaped. if "$" in varval and varval.count("$") % 2 != 0: ## $ is present, but not in an even number. This string must ## be escaped: escape_value = True elif "{" in varval or "}" in varval: ## "{" and/or "}" is in the value string. Count each of them. ## If they are not matched one to one, consider the string to be ## in need of escaping: if varval.count("{") != varval.count("}"): escape_value = True elif "$" not in varval and "{" not in varval and "}" not in varval: ## Since none of $ { } are in the string, it should be escaped ## to be safe: escape_value = True ## if strict: ## If operating in strict mode, escape everything whatever the ## results of the above tests: escape_value = True ## If the value is to be escaped, go ahead and do so: if escape_value: escaped_varval = escape_latex_meta_characters(varval) working_template_vars[varname] = escaped_varval ## Return the "escaped" LaTeX template variables: return working_template_vars def create_pdf_stamp(path_workingdir, latex_template, latex_template_var): """Retrieve the LaTeX (and associated) files and use them to create a PDF "Stamp" file that can be merged with the main file. The PDF stamp is created in a temporary working directory. @param path_workingdir: (string) the path to the working directory that should be used for creating the PDF stamp file. @param latex_template: (string) - the name of the latex template to be used for the creation of the stamp. @param latex_template_var: (dictionary) - key-value pairs of strings to be sought and replaced within the latex template. @return: (string) - the name of the PDF stamp file. """ ## Copy the LaTeX (and helper) files should be copied into the working dir: template_name = copy_template_files_to_stampdir(path_workingdir, \ latex_template) ## #### ## Make a first attempt at the template PDF creation, escaping the variables ## in non-strict mode: escaped_latex_template_var = escape_latex_template_vars(latex_template_var) ## Now that the latex template and its helper files have been retrieved, ## the Stamp PDF can be created. 
    final_template = create_final_latex_template(path_workingdir, \
                                                 template_name, \
                                                 escaped_latex_template_var)
    ##
    ## The name that will be given to the PDF stamp file:
    pdf_stamp_name = "%s.pdf" % os.path.splitext(final_template)[0]
    ## Now, build the Stamp PDF from the LaTeX template:
    cmd_latex = """cd %(workingdir)s; /usr/bin/pdflatex """ \
                """-interaction=batchmode """ \
                """%(template-path)s > /dev/null 2>&1""" \
                % { 'template-path' : escape_shell_arg("%s/%s" \
                                       % (path_workingdir, final_template)),
                    'workingdir' : path_workingdir,
                  }
    ## Log the latex command
    os.system("""echo %s > %s""" % (escape_shell_arg(cmd_latex), \
                                    escape_shell_arg("%s/latex_cmd_first_try" \
                                                     % path_workingdir)))
    ## Run the latex command
    errcode_latex = os.system("%s" % cmd_latex)
    ## Was the PDF stamp file successfully created without error?
    if errcode_latex:
        ## No it wasn't. Perhaps there was a problem with some of the variable
        ## values that we substituted into the template?
        ## To be certain, try to create the PDF one more time - this time
        ## escaping all of the variable values.
        ##
        ## Unlink the PDF file if one was created on the previous attempt:
        if os.access("%s/%s" % (path_workingdir, pdf_stamp_name), os.F_OK):
            try:
                os.unlink("%s/%s" % (path_workingdir, pdf_stamp_name))
            except OSError:
                ## Unable to unlink the PDF file.
                err_msg = "Unable to unlink the PDF stamp file [%s]. " \
                          "Stamping has failed." \
                          % pdf_stamp_name
                register_exception(prefix=err_msg)
                raise InvenioWebSubmitFileStamperError(err_msg)
        ##
        ## Unlink the LaTeX template file that was created with the previously
        ## escaped variables:
        if os.access("%s/%s" % (path_workingdir, final_template), os.F_OK):
            try:
                os.unlink("%s/%s" % (path_workingdir, final_template))
            except OSError:
                ## Unable to unlink the LaTeX file.
                err_msg = "Unable to unlink the LaTeX stamp template file " \
                          "[%s]. Stamping has failed." \
                          % final_template
                register_exception(prefix=err_msg)
                raise InvenioWebSubmitFileStamperError(err_msg)
        ##
        ####
        ## Make another attempt at the template PDF creation, this time
        ## escaping the variables in strict mode:
        escaped_latex_template_var = \
                escape_latex_template_vars(latex_template_var, strict=True)
        ## Now that the latex template and its helper files have been
        ## retrieved, the Stamp PDF can be created.
        final_template = create_final_latex_template(path_workingdir, \
                                                     template_name, \
                                                     escaped_latex_template_var)
        ##
        ## The name that will be given to the PDF stamp file:
        pdf_stamp_name = "%s.pdf" % os.path.splitext(final_template)[0]
        ## Now, build the Stamp PDF from the LaTeX template:
        cmd_latex = """cd %(workingdir)s; /usr/bin/pdflatex """ \
                    """-interaction=batchmode """ \
                    """%(template-path)s > /dev/null 2>&1""" \
                    % { 'template-path' : escape_shell_arg("%s/%s" \
                                           % (path_workingdir, final_template)),
                        'workingdir' : path_workingdir,
                      }
        ## Log the latex command
        os.system("""echo %s > %s""" \
                  % (escape_shell_arg(cmd_latex), \
                     escape_shell_arg("%s/latex_cmd_second_try" \
                                      % path_workingdir)))
        ## Run the latex command
        errcode_latex = os.system("%s" % cmd_latex)
    ## Was the PDF stamp file successfully created?
    if errcode_latex or \
         not os.access("%s/%s" % (path_workingdir, pdf_stamp_name), os.F_OK):
        ## It was not possible to create the PDF stamp file. Fail.
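        ## (Both pdflatex invocations were logged to latex_cmd_first_try and
        ## latex_cmd_second_try in the working directory by the echo commands
        ## above, so a failed run can be replayed by hand with, e.g.,
        ## "sh <path_workingdir>/latex_cmd_second_try".)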
msg = """Error: Unable to create a PDF stamp file.""" raise InvenioWebSubmitFileStamperError(msg) ## Return the name of the PDF stamp file: return pdf_stamp_name ## ***** Functions related to the actual stamping of the file: ***** def apply_stamp_cover_page(path_workingdir, \ stamp_file_name, \ subject_file, \ output_file): """Carry out the stamping: This function adds a cover-page to the file. @param path_workingdir: (string) - the path to the working directory that contains all of the files needed for the stamping process to be carried out. @param stamp_file_name: (string) - the name of the PDF stamp file (i.e. the cover-page itself). @param subject_file: (string) - the name of the file to be stamped. @param output_file: (string) - the name of the final "stamped" file (i.e. that with the cover page added) that will be written in the working directory after the function has ended. """ ## Build the stamping command: cmd_add_cover_page = \ """%(pdftk)s %(cover-page-path)s """ \ """%(file-to-stamp-path)s """ \ """cat output %(stamped-file-path)s """ \ """2>/dev/null"""% \ { 'pdftk' : CFG_PATH_PDFTK, 'cover-page-path' : escape_shell_arg("%s/%s" \ % (path_workingdir, \ stamp_file_name)), 'file-to-stamp-path' : escape_shell_arg("%s/%s" \ % (path_workingdir, \ subject_file)), 'stamped-file-path' : escape_shell_arg("%s/%s" \ % (path_workingdir, \ output_file)), } ## Execute the stamping command: errcode_add_cover_page = os.system(cmd_add_cover_page) ## Was the PDF merged with the coverpage without error? if errcode_add_cover_page: ## There was a problem: msg = "Error: Unable to stamp file [%s/%s]. There was an error when " \ "trying to add the cover page [%s/%s] to the file. Stamping " \ "has failed." \ % (path_workingdir, \ subject_file, \ path_workingdir, \ stamp_file_name) raise InvenioWebSubmitFileStamperError(msg) def apply_stamp_first_page(path_workingdir, \ stamp_file_name, \ subject_file, \ output_file): """Carry out the stamping: This function adds a stamp to the first page of the file. @param path_workingdir: (string) - the path to the working directory that contains all of the files needed for the stamping process to be carried out. @param stamp_file_name: (string) - the name of the PDF stamp file (i.e. the stamp itself). @param subject_file: (string) - the name of the file to be stamped. @param output_file: (string) - the name of the final "stamped" file that will be written in the working directory after the function has ended. """ ## Since only the first page of the subject file is to be stamped, ## it's safest to separate this into its own temporary file, stamp ## it, then re-merge it with the remaining pages of the original ## document. In this way, the PDF to be stamped will probably be ## simpler (pages with complex figures and tables will probably be ## avoided) and the process will hopefully have a smaller chance of ## failure. 
## ## First of all, separate the first page of the subject file into a ## temporary document: ## ## Name to be given to the first page of the document: output_file_first_page = "p1-%s" % output_file ## Name to be given to the first page of the document once it has ## been stamped: stamped_output_file_first_page = "stamped-%s" % output_file_first_page ## Perform the separation: cmd_get_first_page = \ "%(pdftk)s A=%(file-to-stamp-path)s " \ "cat A1 output %(first-page-path)s " \ "2>/dev/null" \ % { 'pdftk' : CFG_PATH_PDFTK, 'file-to-stamp-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, subject_file)), 'first-page-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ output_file_first_page)), } errcode_get_first_page = os.system(cmd_get_first_page) ## Check that the separation was successful: if errcode_get_first_page or \ not os.access("%s/%s" % (path_workingdir, \ output_file_first_page), os.F_OK): ## Separation was unsuccessful. Fail. msg = "Error: Unable to stamp file [%s/%s] - it wasn't possible to " \ "separate the first page from the rest of the document. " \ "Stamping has failed." \ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) ## Now stamp the first page: cmd_stamp_first_page = \ "%(pdftk)s %(first-page-path)s background " \ "%(stamp-file-path)s output " \ "%(stamped-first-page-path)s 2>/dev/null" \ % { 'pdftk' : CFG_PATH_PDFTK, 'first-page-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ output_file_first_page)), 'stamp-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ stamp_file_name)), 'stamped-first-page-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ stamped_output_file_first_page)), } errcode_stamp_first_page = os.system(cmd_stamp_first_page) ## Check that the first page was stamped successfully: if errcode_stamp_first_page or \ not os.access("%s/%s" % (path_workingdir, \ stamped_output_file_first_page), os.F_OK): ## Unable to stamp the first page. Fail. msg = "Error: Unable to stamp the file [%s/%s] - it was not possible " \ "to add the stamp to the first page. Stamping has failed." \ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) ## Now that the first page has been stamped successfully, merge it with ## the remaining pages of the original file: cmd_merge_stamped_and_original_files = \ "%(pdftk)s A=%(stamped-first-page-path)s " \ "B=%(original-file-path)s cat A1 B2-end output " \ "%(stamped-file-path)s 2>/dev/null" \ % { 'pdftk' : CFG_PATH_PDFTK, 'stamped-first-page-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ stamped_output_file_first_page)), 'original-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ subject_file)), 'stamped-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ output_file)), } errcode_merge_stamped_and_original_files = \ os.system(cmd_merge_stamped_and_original_files) ## Check to see whether the command exited with an error: if errcode_merge_stamped_and_original_files: ## There was an error when trying to merge the stamped first-page ## with pages 2 onwards of the original file. One possible ## explanation for this could be that the original file only had ## one page (in which case trying to reference pages 2-end would ## cause an error because they don't exist. ## ## Try to get the number of pages in the original PDF. If it only ## has 1 page, the stamped first page file can become the final ## stamped PDF. 
If it has more than 1 page, there really was an ## error when merging the stamped first page with the rest of the ## pages and stamping can be considered to have failed. cmd_find_number_pages = \ """%(pdftk)s %(original-file-path)s dump_data | """ \ """grep NumberOfPages | """ \ """sed -n 's/^NumberOfPages: \\([0-9]\\{1,\\}\\)$/\\1/p'""" \ % { 'pdftk' : CFG_PATH_PDFTK, 'original-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ subject_file)), } fh_find_number_pages = os.popen(cmd_find_number_pages, "r") match_number_pages = fh_find_number_pages.read() errcode_find_number_pages = fh_find_number_pages.close() if errcode_find_number_pages is not None: ## There was an error while checking for the number of pages. ## Fail. msg = "Error: Unable to stamp file [%s/%s]. There was an error " \ "when attempting to merge the file containing the " \ "first page of the stamped file with the remaining " \ "pages of the original file and when an attempt was " \ "made to count the number of pages in the file, an " \ "error was also encountered. Stamping has failed." \ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) else: try: number_pages_in_subject_file = int(match_number_pages) except ValueError: ## Unable to get the number of pages in the original file. ## Fail. msg = "Error: Unable to stamp file [%s/%s]. There was an " \ "error when attempting to merge the file containing the" \ " first page of the stamped file with the remaining " \ "pages of the original file and when an attempt was " \ "made to count the number of pages in the file, an " \ "error was also encountered. Stamping has failed." \ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) else: ## Do we have just one page? if number_pages_in_subject_file == 1: ## There was only one page in the subject file. ## copy the version that was stamped on the first page to ## the output_file filename: try: shutil.copyfile("%s/%s" \ % (path_workingdir, \ stamped_output_file_first_page), \ "%s/%s" \ % (path_workingdir, output_file)) except IOError: ## Unable to copy the file that was stamped on page 1 ## Stamping has failed. msg = "Error: It was not possible to copy the " \ "temporary file that was stamped on the " \ "first page [%s/%s] to the final stamped " \ "file [%s/%s]. Stamping has failed." \ % (path_workingdir, \ stamped_output_file_first_page, \ path_workingdir, \ output_file) raise InvenioWebSubmitFileStamperError(msg) else: ## Despite the fact that there was NOT only one page ## in the original file, there was an error when trying ## to merge it with the file that was stamped on the ## first page. Fail. msg = "Error: Unable to stamp file [%s/%s]. There " \ "was an error when attempting to merge the " \ "file containing the first page of the " \ "stamped file with the remaining pages of the " \ "original file. Stamping has failed." \ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) elif not os.access("%s/%s" % (path_workingdir, output_file), os.F_OK): ## A final version of the stamped file was NOT created even though ## no error signal was encountered during the merging process. ## Fail. msg = "Error: Unable to stamp file [%s/%s]. When attempting to " \ "merge the file containing the first page of the stamped " \ "file with the remaining pages of the original file, no " \ "final file was created. Stamping has failed." 
\ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) def apply_stamp_all_pages(path_workingdir, \ stamp_file_name, \ subject_file, \ output_file): """Carry out the stamping: This function adds a stamp to all pages of the file. @param path_workingdir: (string) - the path to the working directory that contains all of the files needed for the stamping process to be carried out. @param stamp_file_name: (string) - the name of the PDF stamp file (i.e. the stamp itself). @param subject_file: (string) - the name of the file to be stamped. @param output_file: (string) - the name of the final "stamped" file that will be written in the working directory after the function has ended. """ cmd_stamp_all_pages = \ "%(pdftk)s %(file-to-stamp-path)s background " \ "%(stamp-file-path)s output " \ "%(stamped-file-all-pages-path)s 2>/dev/null" \ % { 'pdftk' : CFG_PATH_PDFTK, 'file-to-stamp-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ subject_file)), 'stamp-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ stamp_file_name)), 'stamped-file-all-pages-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ output_file)), } errcode_stamp_all_pages = os.system(cmd_stamp_all_pages) if errcode_stamp_all_pages or \ not os.access("%s/%s" % (path_workingdir, output_file), os.F_OK): ## There was a problem stamping the document. Fail. msg = "Error: Unable to stamp file [%s/%s]. Stamping has failed." \ % (path_workingdir, subject_file) raise InvenioWebSubmitFileStamperError(msg) def apply_stamp_to_file(path_workingdir, stamp_type, stamp_file_name, subject_file, output_file): """Given a stamp-file, the details of the type of stamp to apply, and the details of the file to be stamped, coordinate the process of having that stamp applied to the file. @param path_workingdir: (string) - the path to the working directory that contains all of the files needed for the stamping process to be carried out. @param stamp_type: (string) - the type of stamp to be applied to the file. @param stamp_file_name: (string) - the name of the PDF stamp file (i.e. the stamp itself). @param subject_file: (string) - the name of the file to be stamped. @param output_file: (string) - the name of the final "stamped" file that will be written in the working directory after the function has ended. @return: (string) - the name of the stamped file that has been created. It will be found in the stamping working directory. """ ## Stamping is performed on PDF files. We therefore need to test for the ## type of the subject file before attempting to stamp it: ## ## Initialize a variable to hold the "file type" of the subject file: subject_filetype = "" ## Using the file command, test for the file-type of "subject_file": cmd_gfile = "%(gfile)s %(file-to-stamp-path)s 2> /dev/null" \ % { 'gfile' : CFG_PATH_GFILE, 'file-to-stamp-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ subject_file)), } ## Execute the file command: fh_gfile = os.popen(cmd_gfile, "r") ## Read the results string output by gfile: output_gfile = fh_gfile.read() ## Close the pipe and capture its error code: errcode_gfile = fh_gfile.close() ## If a result was obtained from gfile, scan it for an acceptable file-type: if errcode_gfile is None and output_gfile != "": output_gfile = output_gfile.lower() if "pdf document" in output_gfile: ## This is a PDF file. subject_filetype = "pdf" elif "postscript" in output_gfile: ## This is a PostScript file. subject_filetype = "ps" ## Unable to determine the file type using gfile. 
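    ## (For reference, the file utility typically reports strings such as
    ## "PDF document, version 1.4" or "PostScript document text", which is
    ## what the two substring tests above rely upon.)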
## Try to determine the file type by examining its extension: if subject_filetype == "": ## split the name of the file to be stamped on "." and take the last ## part of it. This should be the "extension". tmp_file_extension = subject_file.split(".")[-1] if tmp_file_extension.lower() == "pdf": subject_filetype = "pdf" elif tmp_file_extension.lower() == "ps": subject_filetype = "ps" if subject_filetype not in ("ps", "pdf"): ## unable to process file. msg = """Error: Input file [%s] is not PDF or PS. - unable to """ \ """perform stamping.""" % subject_file raise InvenioWebSubmitFileStamperError(msg) if subject_filetype == "ps": ## Convert the subject file from PostScript to PDF: if subject_file[-3:].lower() == ".ps": ## The name of the file to be stamped has a PostScript extension. ## Strip it and give the name of the PDF file to be created a ## PDF extension: created_pdfname = "%s.pdf" % subject_file[:-3] elif len(subject_file.split(".")) > 1: ## The file name has an extension - strip it and add a PDF ## extension: raw_name = subject_file[:subject_file.rfind(".")] if raw_name != "": created_pdfname = "%s.pdf" % raw_name else: ## It would appear that the file had no extension and that its ## name started with a period. Just use the original name with ## a .pdf suffix: created_pdfname = "%s.pdf" % subject_file else: ## No extension - use the original name with a .pdf suffix: created_pdfname = "%s.pdf" % subject_file ## Build the distilling command: cmd_distill = """%(distiller)s %(ps-file-path)s """ \ """%(pdf-file-path)s 2>/dev/null""" % \ - { 'distiller' : CFG_PATH_DISTILLER, + { 'distiller' : CFG_PATH_PS2PDF, 'ps-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ subject_file)), 'pdf-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ created_pdfname)), } ## Distill the PS into a PDF: errcode_distill = os.system(cmd_distill) ## Test to see whether the PS was distilled into a PDF without error: if errcode_distill or \ not os.access("%s/%s" % (path_workingdir, created_pdfname), os.F_OK): ## The PDF file was not correctly created in the working directory. ## Unable to continue with the stamping process. msg = "Error: Unable to correctly convert PostScript file [%s] to" \ " PDF. Cannot stamp file." % subject_file raise InvenioWebSubmitFileStamperError(msg) ## Now assign the name of the created PDF file to subject_file: subject_file = created_pdfname ## Treat the name of "output_file": if output_file in (None, ""): ## there is no value for outputfile. outfile should be given the same ## name as subject_file, but with "stamped-" appended to the front. ## E.g.: subject_file: test.pdf; outfile: stamped-test.pdf output_file = "stamped-%s" % subject_file else: ## If output_file has an extension, strip it and add a PDF extension: if len(output_file.split(".")) > 1: ## The file name has an extension - strip it and add a PDF ## extension: raw_name = output_file[:output_file.rfind(".")] if raw_name != "": output_file = "%s.pdf" % raw_name else: ## It would appear that the file had no extension and that its ## name started with a period. Just use the original name with ## a .pdf suffix: output_file = "%s.pdf" % output_file else: ## No extension - use the original name with a .pdf suffix: output_file = "%s.pdf" % output_file if stamp_type == 'coverpage': ## The stamp to be applied to the document is in fact a "cover page". ## This means that the entire PDF "stamp" that was created from the ## LaTeX template is to be appended to the subject file as the first ## page (i.e. a cover-page). 
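        ## (The equivalent standalone pdftk call, with illustrative file
        ## names, would be: pdftk stamp.pdf doc.pdf cat output stamped-doc.pdf)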
        apply_stamp_cover_page(path_workingdir, \
                               stamp_file_name, \
                               subject_file, \
                               output_file)
    elif stamp_type == "first":
        apply_stamp_first_page(path_workingdir, \
                               stamp_file_name, \
                               subject_file, \
                               output_file)
    elif stamp_type == 'all':
        ## The stamp to be applied to the document is a simple stamp that
        ## should be applied to ALL pages of the document (i.e. merged onto
        ## each page.)
        apply_stamp_all_pages(path_workingdir, \
                              stamp_file_name, \
                              subject_file, \
                              output_file)
    else:
        ## Unexpected stamping mode.
        msg = """Error: Unexpected stamping mode [%s]. Stamping has failed.""" \
              % stamp_type
        raise InvenioWebSubmitFileStamperError(msg)

    ## Finally, if the original subject file was a PS, convert the stamped
    ## PDF back to PS:
    if subject_filetype == "ps":
        if output_file[-4:].lower() == ".pdf":
            ## The name of the stamped file has a PDF extension.
            ## Strip it and give the name of the PS file to be created a
            ## PS extension:
            stamped_psname = "%s.ps" % output_file[:-4]
        elif len(output_file.split(".")) > 1:
            ## The file name has an extension - strip it and add a PS
            ## extension:
            raw_name = output_file[:output_file.rfind(".")]
            if raw_name != "":
                stamped_psname = "%s.ps" % raw_name
            else:
                ## It would appear that the file had no extension and that its
                ## name started with a period. Just use the original name with
                ## a .ps suffix:
                stamped_psname = "%s.ps" % output_file
        else:
            ## No extension - use the original name with a .ps suffix:
            stamped_psname = "%s.ps" % output_file
        ## Build the conversion command:
        cmd_pdf2ps = "%s %s %s 2>/dev/null" % (CFG_PATH_PDF2PS,
                                               escape_shell_arg("%s/%s" % \
                                                   (path_workingdir, \
                                                    output_file)),
                                               escape_shell_arg("%s/%s" % \
                                                   (path_workingdir, \
                                                    stamped_psname)))
        errcode_pdf2ps = os.system(cmd_pdf2ps)
        ## Check to see that the command executed OK:
        if not errcode_pdf2ps and \
           os.access("%s/%s" % (path_workingdir, stamped_psname), os.F_OK):
            ## No problem converting the PDF to PS.
            output_file = stamped_psname
    ## Return the name of the "stamped" file:
    return output_file

def copy_subject_file_to_working_directory(path_workingdir, input_file):
    """Attempt to copy the subject file (that which is to be stamped) to the
       current working directory, returning the name of the subject file if
       successful.
       @param path_workingdir: (string) - the path to the working directory
        for the current stamping session.
       @param input_file: (string) - the path to the subject file (that which
        is to be stamped).
       @return: (string) - the name of the subject file, which has been copied
        to the current working directory.
       @Exceptions raised: (InvenioWebSubmitFileStamperError) - upon failure
        to successfully copy the subject file to the working directory.
    """
    ## Divide the input filename into path and basename:
    (dummy, name_input_file) = os.path.split(input_file)
    if name_input_file == "":
        ## The input file is just a path - not a valid filename. Fail.
        msg = """Error: unable to determine the name of the file to be """ \
              """stamped."""
        raise InvenioWebSubmitFileStamperError(msg)
    ## Test to see whether the stamping subject file is a real file and
    ## is readable:
    if os.access("%s" % input_file, os.R_OK):
        ## File is readable. Copy it locally to the working directory:
        try:
            shutil.copyfile("%s" % input_file, \
                            "%s/%s" % (path_workingdir, name_input_file))
        except IOError:
            ## Unable to copy the stamping subject file to the
            ## working directory. Fail.
msg = """Error: Unable to copy stamping file [%s] to """ \ """working directory for stamping [%s].""" \ % (input_file, path_workingdir) raise InvenioWebSubmitFileStamperError(msg) else: ## Unable to read the subject file. Fail. msg = """Error: Unable to copy stamping file [%s] to """ \ """working directory [%s]. (File not readable.)""" \ % (input_file, path_workingdir) raise InvenioWebSubmitFileStamperError(msg) ## Now that the stamping file has been successfully copied to the working ## directory, return its base name: return name_input_file def create_working_directory(): """Create a "working directory" in which the files related to the stamping process can be stored, and return the full path to it. The working directory will be created in ~invenio/var/tmp. If it cannot be created there, an exception (InvenioWebSubmitFileStamperError) will be raised. The working directory will have the prefix "websubmit_file_stamper_", and could be given a name something like: - websubmit_file_stamper_Tzs3St @return: (string) - the full path to the working directory. @Exceptions raised: InvenioWebSubmitFileStamperError. """ ## Create the temporary directory in which to place the LaTeX template ## and its helper files in ~invenio/var/tmp: path_workingdir = None try: path_workingdir = tempfile.mkdtemp(prefix="websubmit_file_stamper_", \ dir="%s" % CFG_TMPDIR) except OSError, err: ## Unable to create the temporary directory in ~invenio/var/tmp msg = "Error: Unable to create a temporary working directory in " \ "which to carry out the stamping process. An attempt was made " \ "to create the directory in [%s]; the error encountered was " \ "<%s>. Stamping has failed." % (CFG_TMPDIR, str(err)) raise InvenioWebSubmitFileStamperError(msg) ## return the path to the working-directory: return path_workingdir ## ***** Functions Specific to CLI calling of the program: ***** def usage(wmsg="", err_code=0): """Print a "usage" message (along with an optional additional warning/error message) to stderr and exit with a given error code. @param wmsg: (string) - some kind of warning message for the user. @param err_code: (integer) - an error code to be passed to sys.exit, which is called after the usage message has been printed. @return: None. """ ## Wash the warning message: if wmsg != "": wmsg = wmsg.strip() + "\n" ## The usage message: msg = """ Usage: python ~invenio/lib/python/invenio/websubmit_file_stamper.py \\ [options] input-file.pdf websubmit_file_stamper.py is used to add a "stamp" to a PDF file. A LaTeX template is used to create the stamp and this stamp is then concatenated with the original PDF file. The stamp can take the form of either a separate "cover page" that is appended to the document; or a "mark" that is applied somewhere either on the document's first page or on all of its pages. Options: -h, --help Print this help. -V, --version Print version information. -v, --verbose=LEVEL Verbose level (0=min, 1=default, 9=max). [NOT IMPLEMENTED] -t, --latex-template=PATH Path to the LaTeX template file that should be used for the creation of the PDF stamp. (Note, if it's just a basename, it will be sought first in the current working directory, and then in the invenio file-stamper templates directory; If there is a qualifying path to the template name, it will be sought only in that location); -c, --latex-template-var='VARNAME=VALUE' A variable that should be replaced in the LaTeX template file with its corresponding value. 
Of the following format: VARNAME=VALUE This option is repeatable - one for each template variable; -s, --stamp=STAMP-TYPE The type of stamp to be applied to the subject file. Must be one of 3 values: + "first" - stamp only the first page; + "all" - stamp all pages; + "coverpage" - add a cover page to the document; The default value is "first"; -o, --output-file=XYZ The optional name to be given to the finished (stamped) file. If this is omitted, the stamped file will be given the same name as the input file, but will be prefixed by "stamped-"; Example: python ~invenio/lib/python/invenio/websubmit_file_stamper.py \\ --latex-template=demo-stamp-left.tex \\ --latex-template-var='REPORTNUMBER=TEST-THESIS-2008-019' \\ --latex-template-var='DATE=27/02/2008' \\ --stamp='first' \\ --output-file=testfile_stamped.pdf \\ testfile.pdf """ sys.stderr.write(wmsg + msg) sys.exit(err_code) def get_cli_options(): """From the options and arguments supplied by the user via the CLI, build a dictionary of options to drive websubmit-file-stamper. For reference, the CLI options available to the user are as follows: -h, --help -> Display help/usage message and exit; -V, --version -> Display version information and exit; -v, --verbose= -> Set verbosity level (0=min, 1=default, 9=max). -t, --latex-template= -> Path to the LaTeX template file that should be used for the creation of the PDF stamp. (Note, if it's just a basename, it will be sought first in the current working directory, and then in the invenio file-stamper templates directory; If there is a qualifying path to the template name, it will be sought only in that location); -c, --latex-template-var= -> A variable that should be replaced in the LaTeX template file with its corresponding value. Of the following format: varname=value This option is repeatable - one for each template variable; -s, --stamp= The type of stamp to be applied to the subject file. Must be one of 3 values: + "first" - stamp only the first page; + "all" - stamp all pages; + "coverpage" - add a cover page to the document; The default value is "first"; -o, --output-file= -> The optional name to be given to the finished (stamped) file. If this is omitted, the stamped file will be given the same name as the input file, but will be prefixed by "stamped-"; @return: (dictionary) of input options and flags, set as appropriate. The dictionary has the following structure: + latex-template: (string) - the path to the LaTeX template to be used for the creation of the stamp itself; + latex-template-var: (dictionary) - This dictionary contains variables that should be sought in the LaTeX template file, and the values that should be substituted in their place. E.g.: { "TITLE" : "An Introduction to CDS Invenio" } + input-file: (string) - the path to the input file (i.e. that which is to be stamped; + output-file: (string) - the name of the stamped file that should be created by the program. This is optional - if not provided, a default name will be applied to a file instead; + stamp: (string) - the type of stamp that is to be applied to the input file. 
It must take one of 3 values: - "first": Stamp only the first page of the document; - "all": Apply the stamp to all pages of the document; - "coverpage": Add a "cover page" to the document; + verbosity: (integer) - the verbosity level under which the program is to run; So, an example of the returned dictionary would be something like: { 'latex-template' : "demo-stamp-left.tex", 'latex-template-var' : { "REPORTNUMBER" : "TEST-2008-001", "DATE" : "15/02/2008", }, 'input-file' : "test-doc.pdf", 'output-file' : "", 'stamp' : "first", 'verbosity' : 0, } """ ## dictionary of important values relating to cli call of program: options = { 'latex-template' : "", 'latex-template-var' : {}, 'input-file' : "", 'output-file' : "", 'stamp' : "first", 'verbosity' : 0, } ## Get the options and arguments provided by the user via the CLI: try: myoptions, myargs = getopt.getopt(sys.argv[1:], "hVv:t:c:s:o:", \ ["help", "version", "verbosity=", "latex-template=", "latex-template-var=", "stamp=", "output-file="]) except getopt.GetoptError, err: ## Invalid option provided - usage message usage(wmsg="Error: %(msg)s." % { 'msg' : str(err) }) ## Get the input file from the arguments list (it should be the ## first argument): if len(myargs) > 0: options["input-file"] = myargs[0] ## Extract the details of the options: for opt in myoptions: if opt[0] in ("-V","--version"): ## version message and exit sys.stdout.write("%s\n" % __revision__) sys.stdout.flush() sys.exit(0) elif opt[0] in ("-h","--help"): ## help message and exit usage() elif opt[0] in ("-v", "--verbosity"): ## Get verbosity level: if not opt[1].isdigit(): options['verbosity'] = 0 elif int(opt[1]) not in xrange(0, 10): options['verbosity'] = 0 else: options['verbosity'] = int(opt[1]) elif opt[0] in ("-o", "--output-file"): ## Get the name of the "output file" that is to be created after ## stamping (i.e. the "stamped file"): options["output-file"] = opt[1] elif opt[0] in ("-t", "--latex-template"): ## Get the path to the latex template to be used for the creation ## of the stamp file: options["latex-template"] = opt[1] elif opt[0] in ("-s", "--stamp"): ## The type of stamp that is to be applied to the document: ## Options are coverpage, first, all: if str(opt[1].lower()) in ("coverpage", "first", "all"): ## Valid stamp type, accept it; options["stamp"] = str(opt[1]).lower() else: ## Invalid stamp type. Print usage message and quit. usage() elif opt[0] in ("-c", "--latex-template-var"): ## This is a variable to be replaced in the LaTeX template. ## It should take the following form: ## varname=value ## We can therefore split it on the first "=" sign - anything to the ## left will be considered to be the name of the variable to search ## for; anything to the right will be considered as the value that ## should replace the variable in the LaTeX template. ## Note: If the user supplies the same variable name more than once, ## the last occurrence will be kept and the previous value will be ## overwritten. ## Note also that if the variable string does not take the ## expected format a=b, it will be ignored.
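## For example (an illustrative, hypothetical value rather than one
## taken from a real submission):
##     --latex-template-var='TITLE=Stamping=Made=Easy'
## is split on the first "=" sign only, so that
##     "TITLE=Stamping=Made=Easy".split("=", 1)
## yields ["TITLE", "Stamping=Made=Easy"]: the variable name becomes
## "TITLE" and the value keeps any embedded "=" signs intact, while a
## string containing no "=" at all yields a single-element list and is
## silently skipped by the length check below.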
## ## Get the complete string: varstring = str(opt[1]) ## Split into 2 strings based on the first "=": split_varstring = varstring.split("=", 1) if len(split_varstring) == 2: ## Split based on equals sign was successful: if split_varstring[0] != "": ## The variable name was not empty - keep it: options["latex-template-var"]["%s" % split_varstring[0]] = \ "%s" % split_varstring[1] ## Return the input options: return options def stamp_file(options): """The driver for the stamping process. This is effectively the function that is responsible for coordinating the stamping of a file. @param options: (dictionary) - a dictionary of options that are required by the function in order to carry out the stamping process. The dictionary must have the following structure: + latex-template: (string) - the path to the LaTeX template to be used for the creation of the stamp itself; + latex-template-var: (dictionary) - This dictionary contains variables that should be sought in the LaTeX template file, and the values that should be substituted in their place. E.g.: { "TITLE" : "An Introduction to CDS Invenio" } + input-file: (string) - the path to the input file (i.e. that which is to be stamped); + output-file: (string) - the name of the stamped file that should be created by the program. This is optional - if not provided, a default name will be applied to a file instead; + stamp: (string) - the type of stamp that is to be applied to the input file. It must take one of 3 values: - "first": Stamp only the first page of the document; - "all": Apply the stamp to all pages of the document; - "coverpage": Add a "cover page" to the document; + verbosity: (integer) - the verbosity level under which the program is to run; So, an example of the options dictionary would be something like: { 'latex-template' : "demo-stamp-left.tex", 'latex-template-var' : { "REPORTNUMBER" : "TEST-2008-001", "DATE" : "15/02/2008", }, 'input-file' : "test-doc.pdf", 'output-file' : "", 'stamp' : "first", 'verbosity' : 0, } @return: (tuple) - consisting of two strings: 1. the path to the working directory in which all stamping-related files are stored; 2. The name of the "stamped" file; @Exceptions raised: (InvenioWebSubmitFileStamperError) exceptions may be raised or propagated by this function when the stamping process fails for one reason or another. """ ## SANITY CHECKS: ## Does the options dictionary contain all expected keys? ## ## A list of the names of the expected options: expected_option_names = ["latex-template", \ "latex-template-var", \ "input-file", \ "output-file", \ "stamp", \ "verbosity"] expected_option_names.sort() ## A list of the option names that have been received: received_option_names = options.keys() received_option_names.sort() if expected_option_names != received_option_names: ## Error: the dictionary of options had an illegal structure: msg = """Error: Unexpected value received for "options" parameter.""" raise TypeError(msg) ## Do we have an input file to work on? if options["input-file"] in (None, ""): ## No input file - stop the stamping: msg = "Error: unable to determine the name of the file to be stamped." raise InvenioWebSubmitFileStamperError(msg) ## Do we have a LaTeX file for creation of the stamp? if options["latex-template"] in (None, ""): ## No latex stamp file - stop the stamping: msg = "Error: unable to determine the name of the LaTeX template " \ "file to be used for stamp creation."
raise InvenioWebSubmitFileStamperError(msg) ## OK - begin the document stamping process: ## ## Get the output file: (dummy, name_outfile) = os.path.split(options["output-file"]) if name_outfile != "": ## Take just the basename component of outfile: options["output-file"] = name_outfile ## Create a working directory (in which to store the various files used and ## created during the stamping process) and get the full path to it: path_workingdir = create_working_directory() ## Copy the file to be stamped into the working directory: basename_input_file = \ copy_subject_file_to_working_directory(path_workingdir, \ options["input-file"]) ## Now import the LaTeX (and associated) files into a temporary directory ## and use them to create the "stamp" PDF: pdf_stamp_name = create_pdf_stamp(path_workingdir, \ options["latex-template"], \ options["latex-template-var"]) ## Everything is now ready to merge the "stamping subject" file with the ## PDF "stamp" file that has been created: name_stamped_file = apply_stamp_to_file(path_workingdir, \ options["stamp"], \ pdf_stamp_name, \ basename_input_file, \ options["output-file"]) ## Return a tuple containing the working directory and the name of the ## stamped file to the caller: return (path_workingdir, name_stamped_file) def stamp_file_cli(): """The function responsible for triggering the stamping process when called via the CLI. This function will effectively get the CLI options, then pass them to the function that is responsible for coordinating the stamping process itself. Once stamping has been completed, an attempt will be made to copy the stamped file to the current working directory. """ ## Get CLI options and arguments: input_options = get_cli_options() ## Stamp the file and obtain the working directory in which the stamped file ## is situated and the name of the stamped file: try: (working_dir, stamped_file) = stamp_file(input_options) except InvenioWebSubmitFileStamperError, err: ## Something went wrong: sys.stderr.write("Stamping failed: [%s]\n" % str(err)) sys.stderr.flush() sys.exit(1) if not os.access("./%s" % stamped_file, os.F_OK): ## Copy the stamped file into the current directory: try: shutil.copyfile("%s/%s" % (working_dir, stamped_file), \ "./%s" % stamped_file) except IOError: ## Report that it wasn't possible to copy the stamped file locally ## and offer the user a path to it: msg = "It was not possible to copy the stamped file to the " \ "current working directory.\nYou can find it here: " \ "[%s/%s].\n" \ % (working_dir, stamped_file) sys.stderr.write(msg) sys.stderr.flush() else: ## A file exists in curdir with the same name as the final stamped file. ## Just print out a message stating this fact, along with the path to ## the stamped file in the temporary working directory: msg = "The stamped file [%s] has not been copied to the current " \ "working directory because a file with this name already " \ "existed there.\nYou can find the stamped file here: " \ "[%s/%s].\n" % (stamped_file, working_dir, stamped_file) sys.stderr.write(msg) sys.stderr.flush() ## Start proceedings for CLI calls: if __name__ == "__main__": stamp_file_cli() diff --git a/modules/websubmit/lib/websubmit_icon_creator.py b/modules/websubmit/lib/websubmit_icon_creator.py index e4ca6d072..34e9b6201 100644 --- a/modules/websubmit/lib/websubmit_icon_creator.py +++ b/modules/websubmit/lib/websubmit_icon_creator.py @@ -1,771 +1,771 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """This is websubmit_icon_creator.py This tool is used to create an icon of a picture file. + Python API: Please see create_icon(). + CLI API: ## $ python ~invenio/lib/python/invenio/websubmit_icon_creator.py \\ ## --icon-scale=200 \\ ## --icon-name=test-icon \\ ## --icon-file-format=jpg \\ ## test-image.jpg ## $ python ~invenio/lib/python/invenio/websubmit_icon_creator.py \\ ## --icon-scale=200 \\ ## --icon-name=test-icon2 \\ ## --icon-file-format=gif \\ ## --multipage-icon \\ ## --multipage-icon-delay=50 \\ ## test-image2.pdf """ __revision__ = "$Id$" import os.path, sys, getopt, shutil, tempfile, re from invenio.config import \ CFG_TMPDIR, \ - CFG_PATH_DISTILLER, \ + CFG_PATH_PS2PDF, \ CFG_PATH_PDFTK, \ CFG_PATH_CONVERT from invenio.shellutils import escape_shell_arg from invenio.websubmit_config import InvenioWebSubmitIconCreatorError CFG_ALLOWED_FILE_EXTENSIONS = ["pdf", "gif", "jpg", \ "jpeg", "ps", "png", "bmp", \ "eps", "epsi", "epsf"] ## ***** Functions related to the icon creation process: ***** # Accepted format for the ImageMagick 'scale' parameter (the trailing # character is an optional literal geometry flag): re_imagemagic_scale_parameter_format = re.compile(r'x?\d+(x\d*)?(\^|!|>|<|@|%)?$') def create_working_directory(): """Create a "working directory" in which the files related to the icon- creation process can be stored, and return the full path to it. The working directory will be created in ~invenio/var/tmp. If it cannot be created there, an exception (InvenioWebSubmitIconCreatorError) will be raised. The working directory will have the prefix "websubmit_icon_creator_", and could be given a name something like: - websubmit_icon_creator_Tzs3St @return: (string) - the full path to the working directory. @Exceptions raised: InvenioWebSubmitIconCreatorError. """ ## Create the temporary directory in which to place the files related to ## icon creation in ~invenio/var/tmp: path_workingdir = None try: path_workingdir = tempfile.mkdtemp(prefix="websubmit_icon_creator_", \ dir="%s" % CFG_TMPDIR) except OSError, err: ## Unable to create the temporary directory in ~invenio/var/tmp msg = "Error: Unable to create a temporary working directory in " \ "which to carry out the icon creation process. An attempt was " \ "made to create the directory in [%s]; the error encountered " \ "was <%s>. Icon creation has failed." % (CFG_TMPDIR, str(err)) raise InvenioWebSubmitIconCreatorError(msg) ## return the path to the working-directory: return path_workingdir def copy_file_to_directory(source_file, destination_dir): """Attempt to copy an ordinary file from one location to a destination directory, returning the name of the copied file if successful. @param source_file: (string) - the name of the file to be copied to the destination directory. @param destination_dir: (string) - the path of the directory into which the source file is to be copied.
@return: (string) - the name of the source file after it has been copied to the destination directory (i.e. no leading path information.) @Exceptions raised: (IOError) - upon failure to successfully copy the source file to the destination directory. """ ## Divide the input filename into path and basename: (dummy, name_source_file) = os.path.split(source_file) if name_source_file == "": ## The source file is just a path - not a valid filename. msg = """Error: the name of the file to be copied was invalid.""" raise IOError(msg) ## Test to see whether source file is a real file and is readable: if os.access("%s" % source_file, os.R_OK): ## File is readable. Copy it locally to the destination directory: try: shutil.copyfile("%s" % source_file, \ "%s/%s" % (destination_dir, name_source_file)) except IOError: ## Unable to copy the source file to the destination directory. msg = """Error: Unable to copy source file [%s] to """ \ """the destination directory [%s].""" \ % (source_file, destination_dir) raise IOError(msg) else: ## Unable to read the source file. msg = """Error: Unable to copy source file [%s] to """ \ """destination directory [%s]. (File not readable.)""" \ % (source_file, destination_dir) raise IOError(msg) ## Now that the source file has been successfully copied to the destination ## directory, return its base name: return name_source_file def build_icon(path_workingdir, source_filename, source_filetype, icon_name, icon_filetype, multipage_icon, multipage_icon_delay, icon_scale): """Whereas create_icon acts as the API for icon creation and therefore deals with argument washing, temporary working directory creation, etc, the build_icon function takes care of the actual creation of the icon file itself by calling various shell tools. To accomplish this, it relies upon the following parameters: @param path_workingdir: (string) - the path to the working directory in which all files related to the icon creation are stored. @param source_filename: (string) - the filename of the original image file. @param source_filetype: (string) - the file type of the original image file. @param icon_name: (string) - the name that is to be given to the icon. @param icon_filetype: (string) - the file type of the icon that is to be created. @param multipage_icon: (boolean) - a flag indicating whether or not an icon with multiple pages (i.e. an animated gif icon) should be created. @param multipage_icon_delay: (integer) - the delay to be used between frame changing for an icon with multiple pages (i.e. an animated gif.) @param icon_scale: (string) - the scaling information (an ImageMagick 'geometry' value) for the created icon. @return: (string) - the name of the created icon file (which will have been created in the working directory "path_workingdir".) @Exceptions raised: (InvenioWebSubmitIconCreatorError) - raised when the icon creation process fails. """ ## ## If the source file is a PS, convert it into a PDF: if source_filetype == "ps": ## Convert the subject file from PostScript to PDF: if source_filename[-3:].lower() == ".ps": ## The name of the source file has a PostScript extension.
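## For illustration, a few hypothetical source file names and the PDF
## names that the branches below would derive from them:
##     "diagram.ps"  -> "diagram.pdf"   (".ps" extension replaced)
##     "figure.eps"  -> "figure.pdf"    (other extension replaced)
##     ".hidden"     -> ".hidden.pdf"   (leading-period name kept whole)
##     "noextension" -> "noextension.pdf"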
## Strip it and give the name of the PDF file to be created a ## PDF extension: created_pdfname = "%s.pdf" % source_filename[:-3] elif len(source_filename.split(".")) > 1: ## The file name has an extension - strip it and add a PDF ## extension: raw_name = source_filename[:source_filename.rfind(".")] if raw_name != "": created_pdfname = "%s.pdf" % raw_name else: ## It would appear that the file had no extension and that its ## name started with a period. Just use the original name with ## a .pdf suffix: created_pdfname = "%s.pdf" % source_filename else: ## No extension - use the original name with a .pdf suffix: created_pdfname = "%s.pdf" % source_filename ## Build the distilling command: cmd_distill = """%(distiller)s %(ps-file-path)s """ \ """%(pdf-file-path)s 2>/dev/null""" % \ - { 'distiller' : CFG_PATH_DISTILLER, + { 'distiller' : CFG_PATH_PS2PDF, 'ps-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ source_filename)), 'pdf-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ created_pdfname)), } ## Distill the PS into a PDF: errcode_distill = os.system(cmd_distill) ## Test to see whether the PS was distilled into a PDF without error: if errcode_distill or \ not os.access("%s/%s" % (path_workingdir, created_pdfname), os.F_OK): ## The PDF file was not correctly created in the working directory. ## Unable to continue. msg = "Error: Unable to correctly convert PostScript file [%s] to" \ " PDF. Cannot create icon." % source_filename raise InvenioWebSubmitIconCreatorError(msg) ## Now assign the name of the created PDF file to source_filename: source_filename = created_pdfname ## ## Treat the name of the icon: if icon_name in (None, ""): ## Since no name has been provided for the icon, give it the same name ## as the source file, but with the prefix "icon-": icon_name = "icon-%s" % source_filename ## Now if the icon name has an extension, strip it and add that of the ## icon file type: if len(icon_name.split(".")) > 1: ## The icon file name has an extension - strip it and add the icon ## file type extension: raw_name = icon_name[:icon_name.rfind(".")] if raw_name != "": icon_name = "%s.%s" % (raw_name, icon_filetype) else: ## It would appear that the file had no extension and that its ## name started with a period. Just use the original name with ## the icon file type's suffix: icon_name = "%s.%s" % (icon_name, icon_filetype) else: ## The icon name had no extension. Use the original name with the ## icon file type's suffix: icon_name = "%s.%s" % (icon_name, icon_filetype) ## ## If the source file type is PS or PDF, it may be necessary to separate ## the first page from the rest of the document and keep it for use as ## the icon. Do this if necessary: if source_filetype in ("ps", "pdf") and \ (icon_filetype != "gif" or not multipage_icon): ## Either (a) the icon type isn't GIF (in which case it cannot ## be animated and must therefore be created _only_ from the ## document's first page); or (b) the icon type is GIF, but the ## icon is to be created from the first page of the document only.
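## For orientation, the two shell commands prepared below take roughly
## the following shapes (the paths and file names here are illustrative
## only, not taken from a real run):
##     pdftk A='<workingdir>/file.pdf' cat A1 \
##         output '<workingdir>/p1-file.pdf' 2>/dev/null
##     convert -colorspace rgb -scale 180 \
##         '<workingdir>/p1-file.pdf' '<workingdir>/icon-file.gif'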
## The first page of the PDF document must be separated and is to ## be used for icon creation: source_file_first_page = "p1-%s" % source_filename ## Perform the separation: cmd_get_first_page = \ "%(pdftk)s A=%(source-file-path)s " \ "cat A1 output %(first-page-path)s " \ "2>/dev/null" \ % { 'pdftk' : CFG_PATH_PDFTK, 'source-file-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, source_filename)), 'first-page-path' : escape_shell_arg("%s/%s" % \ (path_workingdir, \ source_file_first_page)), } errcode_get_first_page = os.system(cmd_get_first_page) ## Check that the separation was successful: if errcode_get_first_page or \ not os.access("%s/%s" % (path_workingdir, \ source_file_first_page), os.F_OK): ## Separation was unsuccessful. msg = "Error: Unable to create an icon for file [%s/%s] - it " \ "wasn't possible to separate the first page from the " \ "rest of the document (error code [%s].)" \ % (path_workingdir, source_filename, errcode_get_first_page) raise InvenioWebSubmitIconCreatorError(msg) else: ## Successfully extracted the first page. Treat it as the source ## file for icon creation from now on: source_filename = source_file_first_page ## ## Create the icon: ## If a delay is necessary for an animated gif icon, create the ## delay string: delay_info = "" if source_filetype in ("ps", "pdf") and \ icon_filetype == "gif" and multipage_icon: ## Include delay information: delay_info = "-delay %s" % escape_shell_arg(str(multipage_icon_delay)) ## Command for icon creation: cmd_create_icon = "%(convert)s -colorspace rgb -scale %(scale)s %(delay)s " \ "%(source-file-path)s %(icon-file-path)s 2>/dev/null" \ % { 'convert' : CFG_PATH_CONVERT, 'scale' : \ escape_shell_arg(icon_scale), 'delay' : delay_info, 'source-file-path' : \ escape_shell_arg("%s/%s" \ % (path_workingdir, \ source_filename)), 'icon-file-path' : \ escape_shell_arg("%s/%s" \ % (path_workingdir, \ icon_name)), } errcode_create_icon = os.system(cmd_create_icon) ## Check that the icon creation was successful: if errcode_create_icon or \ not os.access("%s/%s" % (path_workingdir, icon_name), os.F_OK): ## Icon creation was unsuccessful. msg = "Error: Unable to create an icon for file [%s/%s] (error " \ "code [%s].)" \ % (path_workingdir, source_filename, errcode_create_icon) raise InvenioWebSubmitIconCreatorError(msg) ## ## The icon was successfully created. Return its name: return icon_name def create_icon(options): """The driver for the icon creation process. This is effectively the function that is responsible for coordinating the icon creation. It is the API for the creation of an icon. @param options: (dictionary) - a dictionary of options that are required by the function in order to carry out the icon-creation process. The dictionary must have the following structure: + input-file: (string) - the path to the input file (i.e. the file from which the icon is to be created); + icon-name: (string) - the name of the icon that is to be created by the program. This is optional - if not provided, a default name will be applied to the icon file instead; + multipage-icon: (boolean) - used only when the original file is a PDF or PS file. If False, the created icon will feature ONLY the first page of the PDF. If True, ALL pages of the PDF will be included in the created icon. Note: If the icon type is not gif, this flag will be forced as False. + multipage-icon-delay: (integer) - used only when the original file is a PDF or PS AND use-first-page-only is False AND the icon type is gif.
This allows the user to specify the delay between "pages" of a multi-page (animated) icon. + icon-scale: ('geometry') - the scaling information to be used for the creation of the new icon. Type 'geometry' as defined in ImageMagick. (eg. 320 or 320x240 or 100> or 5%) + icon-file-format: (string) - the file format of the icon that is to be created. Legal values are: * pdf * gif * jpg * jpeg * ps * png * bmp + verbosity: (integer) - the verbosity level under which the program is to run; So, an example of the options dictionary could be something like: { 'input-file' : "demo-picture-file.jpg", 'icon-name' : "icon-demo-picture-file", 'icon-file-format' : "gif", 'multipage-icon' : True, 'multipage-icon-delay' : 100, 'icon-scale' : "180", 'verbosity' : 0, } @return: (tuple) - consisting of two strings: 1. the path to the working directory in which all files related to icon creation are stored; 2. The name of the "icon" file; @Exceptions raised: (InvenioWebSubmitIconCreatorError) exceptions may be raised or propagated by this function when the icon creation process fails for one reason or another. """ ## SANITY CHECKS: ## Does the options dictionary contain all expected keys? ## ## A list of the names of the expected options: expected_option_names = ['input-file', \ 'icon-name', \ 'icon-file-format', \ 'multipage-icon', \ 'multipage-icon-delay', \ 'icon-scale', \ 'verbosity'] expected_option_names.sort() ## A list of the option names that have been received: received_option_names = options.keys() received_option_names.sort() if expected_option_names != received_option_names: ## Error: the dictionary of options had an illegal structure: msg = """Error: Unexpected value received for "options" parameter.""" raise InvenioWebSubmitIconCreatorError(msg) ## Do we have an input file to work on? if options["input-file"] in (None, ""): ## No input file - stop the icon creation: msg = "Error: unable to determine the name of the file from which " \ "the icon is to be created." raise InvenioWebSubmitIconCreatorError(msg) else: ## Get the file type of the input file: tmp_file_extension = options["input-file"].split(".")[-1] if tmp_file_extension.lower() not in CFG_ALLOWED_FILE_EXTENSIONS: ## Illegal input file type. msg = "Error: icons can only be created from %s files, " \ "not [%s]." % (str(CFG_ALLOWED_FILE_EXTENSIONS), \ tmp_file_extension.lower()) raise InvenioWebSubmitIconCreatorError(msg) else: subject_filetype = tmp_file_extension.lower() ## Wash the requested icon name: if type(options["icon-name"]) is not str: options["icon-name"] = "" else: (dummy, name_iconfile) = os.path.split(options["icon-name"]) if name_iconfile != "": ## Take just the basename component of the icon file: options["icon-name"] = name_iconfile ## Do we have an icon file format? icon_format = options["icon-file-format"] if icon_format in (None, ""): ## gif by default: options["icon-file-format"] = "gif" elif str(icon_format).lower() not in CFG_ALLOWED_FILE_EXTENSIONS: ## gif if an invalid icon type was supplied: options["icon-file-format"] = "gif" else: ## Use the provided icon type: options["icon-file-format"] = icon_format.lower() ## Wash the use-first-page-only flag according to the type of the ## requested icon: if options["icon-file-format"] != "gif": ## Since the requested icon isn't a gif file, it can't be animated ## and should be created from the first "page" of the original file: options["multipage-icon"] = False else: ## The requested icon is a gif. Verify that the multipage-icon ## flag is a boolean value.
If not, set it to False by default: if type(options["multipage-icon"]) is not bool: ## Non-boolean value: default to False: options["multipage-icon"] = False ## Wash the delay time for frames in an animated gif icon: if type(options["multipage-icon-delay"]) is not int: ## Invalid value - set it to default: options["multipage-icon-delay"] = 100 elif options["multipage-icon-delay"] < 0: ## Can't have negative delays: options["multipage-icon-delay"] = 100 ## Wash the icon scaling information (coerce it to a string first, so ## that an integer value such as 180 is also accepted): options["icon-scale"] = str(options["icon-scale"]) if not re_imagemagic_scale_parameter_format.match(options["icon-scale"]): ## Invalid value - set it to default: options["icon-scale"] = "180" ## OK. Begin the icon creation process: ## ## Create a working directory for the icon creation process and get the ## full path to it: path_workingdir = create_working_directory() ## Copy the file from which the icon is to be created into the ## working directory: try: basename_source_file = \ copy_file_to_directory(options["input-file"], path_workingdir) except IOError, err: ## Unable to copy the source file to the working directory. msg = "Icon creation failed: unable to copy the source image file " \ "to the working directory. Got this error: [%s]" % str(err) raise InvenioWebSubmitIconCreatorError(msg) ## Create the icon and get its name: icon_name = build_icon(path_workingdir, \ basename_source_file, \ subject_filetype, \ options["icon-name"], \ options["icon-file-format"], \ options["multipage-icon"], \ options["multipage-icon-delay"], \ options["icon-scale"]) ## Return a tuple containing the working directory and the name of the ## icon file to the caller: return (path_workingdir, icon_name) ## ***** Functions Specific to CLI calling of the program: ***** def usage(wmsg="", err_code=0): """Print a "usage" message (along with an optional additional warning/error message) to stderr and exit with a given error code. @param wmsg: (string) - some kind of warning message for the user. @param err_code: (integer) - an error code to be passed to sys.exit, which is called after the usage message has been printed. @return: None. """ ## Wash the warning message: if wmsg != "": wmsg = wmsg.strip() + "\n" ## The usage message: msg = """ Usage: python ~invenio/lib/python/invenio/websubmit_icon_creator.py \\ [options] input-file.jpg websubmit_icon_creator.py is used to create an icon for an image. Options: -h, --help Print this help. -V, --version Print version information. -v, --verbose=LEVEL Verbose level (0=min, 1=default, 9=max). [NOT IMPLEMENTED] -s, --icon-scale Scaling information for the icon that is to be created. Must be an ImageMagick 'geometry' value (e.g. 180, 320x240 or 5%). Defaults to 180. -m, --multipage-icon A flag to indicate that the icon should consist of multiple pages. Will only be respected if the requested icon type is GIF and the input file is a PS or PDF consisting of several pages. -d, --multipage-icon-delay=VAL If the icon consists of several pages and is an animated GIF, a delay between frames can be specified. Must be an integer. Defaults to 100. -f, --icon-file-format=FORMAT The file format of the icon to be created. Must be one of: [pdf, gif, jpg, jpeg, ps, png, bmp] Defaults to gif. -o, --icon-name=XYZ The optional name to be given to the created icon file.
If this is omitted, the icon file will be given the same name as the input file, but will be prefixed by "icon-"; Examples: python ~invenio/lib/python/invenio/websubmit_icon_creator.py \\ --icon-scale=200 \\ --icon-name=test-icon \\ --icon-file-format=jpg \\ test-image.jpg python ~invenio/lib/python/invenio/websubmit_icon_creator.py \\ --icon-scale=200 \\ --icon-name=test-icon2 \\ --icon-file-format=gif \\ --multipage-icon \\ --multipage-icon-delay=50 \\ test-image2.pdf """ sys.stderr.write(wmsg + msg) sys.exit(err_code) def get_cli_options(): """From the options and arguments supplied by the user via the CLI, build a dictionary of options to drive websubmit-icon-creator. For reference, the CLI options available to the user are as follows: -h, --help -> Display help/usage message and exit; -V, --version -> Display version information and exit; -v, --verbose= -> Set verbosity level (0=min, 1=default, 9=max). -s, --icon-scale -> Scaling information for the icon that is to be created. Must be of type 'geometry', as understood by ImageMagick (Eg. 320 or 320x240 or 100>). Defaults to 180. -m, --multipage-icon -> A flag to indicate that the icon should consist of multiple pages. Will only be respected if the requested icon type is GIF and the input file is a PS or PDF consisting of several pages. -d, --multipage-icon-delay= -> If the icon consists of several pages and is an animated GIF, a delay between frames can be specified. Must be an integer. Defaults to 100. -f, --icon-file-format= -> The file format of the icon to be created. Must be one of: [pdf, gif, jpg, jpeg, ps, png, bmp] Defaults to gif. -o, --icon-name= -> The optional name to be given to the created icon file. If this is omitted, the icon file will be given the same name as the input file, but will be prefixed by "icon-"; @return: (dictionary) of input options and flags, set as appropriate. The dictionary has the following structure: + input-file: (string) - the path to the input file (i.e. the file from which the icon is to be created); + icon-name: (string) - the name of the icon that is to be created by the program. This is optional - if not provided, a default name will be applied to the icon file instead; + multipage-icon: (boolean) - used only when the original file is a PDF or PS file. If False, the created icon will feature ONLY the first page of the PDF. If True, ALL pages of the PDF will be included in the created icon. Note: If the icon type is not gif, this flag will be forced as False. + multipage-icon-delay: (integer) - used only when the original file is a PDF or PS AND use-first-page-only is False AND the icon type is gif. This allows the user to specify the delay between "pages" of a multi-page (animated) icon. + icon-scale: (string) - the scaling information ('geometry', as understood by ImageMagick) to be used for the creation of the new icon. + icon-file-format: (string) - the file format of the icon that is to be created.
Legal values are: [pdf, gif, jpg, jpeg, ps, png, bmp] + verbosity: (integer) - the verbosity level under which the program is to run; So, an example of the returned dictionary could be something like: { 'input-file' : "demo-picture-file.jpg", 'icon-name' : "icon-demo-picture-file", 'icon-file-format' : "gif", 'multipage-icon' : True, 'multipage-icon-delay' : 100, 'icon-scale' : "180", 'verbosity' : 0, } """ ## dictionary of important values relating to cli call of program: options = { 'input-file' : "", 'icon-name' : "", 'icon-file-format' : "", 'multipage-icon' : False, 'multipage-icon-delay' : 100, 'icon-scale' : "180", 'verbosity' : 0, } ## Get the options and arguments provided by the user via the CLI: try: myoptions, myargs = getopt.getopt(sys.argv[1:], "hVv:s:md:f:o:", \ ["help", "version", "verbosity=", "icon-scale=", "multipage-icon", "multipage-icon-delay=", "icon-file-format=", "icon-name="]) except getopt.GetoptError, err: ## Invalid option provided - usage message usage(wmsg="Error: %(msg)s." % { 'msg' : str(err) }) ## Get the input file from the arguments list (it should be the ## first argument): if len(myargs) > 0: options["input-file"] = myargs[0] ## Extract the details of the options: for opt in myoptions: if opt[0] in ("-V","--version"): ## version message and exit sys.stdout.write("%s\n" % __revision__) sys.stdout.flush() sys.exit(0) elif opt[0] in ("-h","--help"): ## help message and exit usage() elif opt[0] in ("-v", "--verbosity"): ## Get verbosity level: if not opt[1].isdigit(): options['verbosity'] = 0 elif int(opt[1]) not in xrange(0, 10): options['verbosity'] = 0 else: options['verbosity'] = int(opt[1]) elif opt[0] in ("-o", "--icon-name"): ## Get the name of the icon that is to be created: options["icon-name"] = opt[1] elif opt[0] in ("-f", "--icon-file-format"): ## The file format of the icon file: if str(opt[1]).lower() not in CFG_ALLOWED_FILE_EXTENSIONS: ## Illegal file format requested for icon: usage() else: ## Store the valid requested icon file format: options["icon-file-format"] = str(opt[1]).lower() elif opt[0] in ("-m","--multipage-icon"): ## The user would like a multipage (animated) icon: options['multipage-icon'] = True elif opt[0] in ("-d", "--multipage-icon-delay"): ## The delay to be used in the case of a multipage (animated) icon: try: frame_delay = int(opt[1]) except ValueError: ## Invalid value for delay supplied. Usage message. usage() else: if frame_delay >= 0: options['multipage-icon-delay'] = frame_delay elif opt[0] in ("-s", "--icon-scale"): ## The scaling information for the icon: if re_imagemagic_scale_parameter_format.match(opt[1]): options['icon-scale'] = opt[1] else: usage() ## ## Done. Return the dictionary of options: return options def create_icon_cli(): """The function responsible for triggering the icon creation process when called via the CLI. This function will effectively get the CLI options, then pass them to the function that is responsible for coordinating the icon creation process itself. Once icon creation has been completed, an attempt will be made to copy the icon file to the current working directory. If this can't be done, the path to the icon will be printed to stderr instead.
""" ## Get CLI options and arguments: input_options = get_cli_options() ## Create the icon file and obtain the name of the working directory in ## which the icon file is situated and the name of the icon file: try: (working_dir, icon_file) = create_icon(input_options) except InvenioWebSubmitIconCreatorError, err: ## Something went wrong: sys.stderr.write("Icon creation failed: [%s]\n" % str(err)) sys.stderr.flush() sys.exit(1) if not os.access("./%s" % icon_file, os.F_OK): ## Copy the icon file into the current directory: try: shutil.copyfile("%s/%s" % (working_dir, icon_file), \ "./%s" % icon_file) except IOError: ## Report that it wasn't possible to copy the icon file locally ## and offer the user a path to it: msg = "It was not possible to copy the icon file to the " \ "current working directory.\nYou can find it here: " \ "[%s/%s].\n" \ % (working_dir, icon_file) sys.stderr.write(msg) sys.stderr.flush() else: ## A file exists in curdir with the same name as the final icon file. ## Just print out a message stating this fact, along with the path to ## the icon file in the temporary working directory: msg = "The icon file [%s] has not been copied to the current " \ "working directory because a file with this name already " \ "existed there.\nYou can find the icon file here: " \ "[%s/%s].\n" % (icon_file, working_dir, icon_file) sys.stderr.write(msg) sys.stderr.flush() ## Start proceedings for CLI calls: if __name__ == "__main__": create_icon_cli() diff --git a/modules/websubmit/lib/websubmit_templates.py b/modules/websubmit/lib/websubmit_templates.py index d15389079..59fb60c16 100644 --- a/modules/websubmit/lib/websubmit_templates.py +++ b/modules/websubmit/lib/websubmit_templates.py @@ -1,3082 +1,3095 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
__revision__ = "$Id$" import urllib import time import cgi import gettext import string import locale import re import operator import os from invenio.config import \ CFG_SITE_URL, \ CFG_VERSION, \ CFG_SITE_URL, \ CFG_SITE_LANG from invenio.messages import gettext_set_language -from invenio.dateutils import convert_datetext_to_dategui +from invenio.dateutils import convert_datetext_to_dategui, convert_datestruct_to_dategui from invenio.urlutils import create_html_link from invenio.webmessage_mailutils import email_quoted_txt2html from invenio.htmlutils import escape_html from websubmit_config import \ CFG_WEBSUBMIT_CHECK_USER_LEAVES_SUBMISSION import invenio.template class Template: # Parameters allowed in the web interface for fetching files files_default_urlargd = { 'version': (str, ""), # version "" means "latest" 'docname': (str, ""), # the docname (optional) 'format' : (str, ""), # the format - 'verbose' : (int, 0) # the verbosity + 'verbose' : (int, 0), # the verbosity + 'subformat' : (str, ""), # the subformat } def tmpl_submit_home_page(self, ln, catalogues): """ The content of the home page of the submit engine Parameters: - 'ln' *string* - The language to display the interface in - 'catalogues' *string* - The HTML code for the catalogues list """ # load the right message language _ = gettext_set_language(ln) return """ """ % { 'document_types' : _("Document types available for submission"), 'please_select' : _("Please select the type of document you want to submit"), 'catalogues' : catalogues, 'ln' : ln, } def tmpl_submit_home_catalog_no_content(self, ln): """ The content of the home page of submit in case no doctypes are available Parameters: - 'ln' *string* - The language to display the interface in """ # load the right message language _ = gettext_set_language(ln) out = "

    " + _("No document types available.") + "

    \n" return out def tmpl_submit_home_catalogs(self, ln, catalogs): """ Produces the catalogs' list HTML code Parameters: - 'ln' *string* - The language to display the interface in - 'catalogs' *array* - The catalogs of documents, each one a hash with the properties: - 'id' - the internal id - 'name' - the name - 'sons' - sub-catalogs - 'docs' - the contained document types, in the form: - 'id' - the internal id - 'name' - the name There is at least one catalog """ # load the right message language _ = gettext_set_language(ln) # import pprint # out = "
    " + pprint.pformat(catalogs)
             out = ""
             for catalog in catalogs:
                 out += "\n
      " out += self.tmpl_submit_home_catalogs_sub(ln, catalog) out += "\n
    \n" return out def tmpl_print_warning(self, msg, type, prologue, epilogue): """Prints warning message and flushes output. Parameters: - 'msg' *string* - The message string - 'type' *string* - the warning type - 'prologue' *string* - HTML code to display before the warning - 'epilogue' *string* - HTML code to display after the warning """ out = '\n%s' % (prologue) if type: out += '%s: ' % type out += '%s%s' % (msg, epilogue) return out def tmpl_submit_home_catalogs_sub(self, ln, catalog): """ Recursive function that produces a catalog's HTML display Parameters: - 'ln' *string* - The language to display the interface in - 'catalog' *array* - A catalog of documents, with the properties: - 'id' - the internal id - 'name' - the name - 'sons' - sub-catalogs - 'docs' - the contained document types, in the form: - 'id' - the internal id - 'name' - the name """ # load the right message language _ = gettext_set_language(ln) if catalog['level'] == 1: out = "
  • %s\n" % catalog['name'] else: if catalog['level'] == 2: out = "
  • %s\n" % cgi.escape(catalog['name']) else: if catalog['level'] > 2: out = "
  • %s\n" % cgi.escape(catalog['name']) if len(catalog['docs']) or len(catalog['sons']): out += "
      \n" if len(catalog['docs']) != 0: for row in catalog['docs']: out += self.tmpl_submit_home_catalogs_doctype(ln, row) if len(catalog['sons']) != 0: for row in catalog['sons']: out += self.tmpl_submit_home_catalogs_sub(ln, row) if len(catalog['docs']) or len(catalog['sons']): out += "
  • " else: out += "
  • " return out def tmpl_submit_home_catalogs_doctype(self, ln, doc): """ Recursive function that produces a catalog's HTML display Parameters: - 'ln' *string* - The language to display the interface in - 'doc' *array* - A catalog of documents, with the properties: - 'id' - the internal id - 'name' - the name """ # load the right message language _ = gettext_set_language(ln) return """
  • %s
  • """ % create_html_link('%s/submit' % CFG_SITE_URL, {'doctype' : doc['id'], 'ln' : ln}, doc['name']) def tmpl_action_page(self, ln, uid, guest, pid, now, doctype, description, docfulldesc, snameCateg, lnameCateg, actionShortDesc, indir, statustext): """ Recursive function that produces a catalog's HTML display Parameters: - 'ln' *string* - The language to display the interface in - 'guest' *boolean* - If the user is logged in or not - 'pid' *string* - The current process id - 'now' *string* - The current time (security control features) - 'doctype' *string* - The selected doctype - 'description' *string* - The description of the doctype - 'docfulldesc' *string* - The title text of the page - 'snameCateg' *array* - The short names of all the categories of documents - 'lnameCateg' *array* - The long names of all the categories of documents - 'actionShortDesc' *array* - The short names (codes) for the different actions - 'indir' *array* - The directories for each of the actions - 'statustext' *array* - The names of the different action buttons """ # load the right message language _ = gettext_set_language(ln) out = "" out += """

    %(continue_explain)s
    Access Number:

    """ % { 'continue_explain' : _("To continue with a previously interrupted submission, enter an access number into the box below:"), 'doctype' : doctype, 'go' : _("GO"), 'ln' : ln, } return out def tmpl_warning_message(self, ln, msg): """ Produces a warning message for the specified text Parameters: - 'ln' *string* - The language to display the interface in - 'msg' *string* - The message to display """ # load the right message language _ = gettext_set_language(ln) return """
    %s
    """ % msg def tmpl_page_interface(self, ln, docname, actname, curpage, nbpages, nextPg, access, nbPg, doctype, act, fields, javascript, mainmenu): """ Produces a page with the specified fields (in the submit chain) Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The document type - 'docname' *string* - The document type name - 'actname' *string* - The action name - 'act' *string* - The action - 'curpage' *int* - The current page of submitting engine - 'nbpages' *int* - The total number of pages - 'nextPg' *int* - The next page - 'access' *string* - The submission number - 'nbPg' *string* - ?? - 'fields' *array* - the fields to display in the page, with each record having the structure: - 'fullDesc' *string* - the description of the field - 'text' *string* - the HTML code of the field - 'javascript' *string* - if the field has some associated javascript code - 'type' *string* - the type of field (T, F, I, H, D, S, R) - 'name' *string* - the name of the field - 'rows' *string* - the number of rows for textareas - 'cols' *string* - the number of columns for textareas - 'val' *string* - the default value of the field - 'size' *string* - the size for text fields - 'maxlength' *string* - the maximum length for text fields - 'htmlcode' *string* - the complete HTML code for user-defined fields - 'typename' *string* - the long name of the type - 'javascript' *string* - the javascript code to insert in the page - 'mainmenu' *string* - the url of the main menu """ # load the right message language _ = gettext_set_language(ln) # top menu out = """
    \n" # Display the navigation cell # Display "previous page" navigation arrows out += """
    %(docname)s   %(actname)s  """ % { 'docname' : docname, 'actname' : actname, } for i in range(1, nbpages+1): if i == int(curpage): out += """""" % curpage else: out += """""" % (i, i) out += """
       page: %s  %s   
     %(summary)s(2) 

    """ % { 'summary' : _("SUMMARY"), 'doctype' : cgi.escape(doctype), 'act' : cgi.escape(act), 'access' : cgi.escape(access), 'nextPg' : cgi.escape(nextPg), 'curpage' : cgi.escape(curpage), 'nbPg' : cgi.escape(nbPg), 'ln' : cgi.escape(ln), } for field in fields: if field['javascript']: out += """ """ % field['javascript'] # now displays the html form field(s) out += "%s\n%s\n" % (field['fullDesc'], field['text']) out += javascript out += "
     
     
    """ if int(curpage) != 1: out += """ """ % { 'prpage' : int(curpage) - 1, 'images' : CFG_SITE_URL + '/img', 'prevpage' : _("Previous page"), } else: out += """ """ # Display the submission number out += """ \n""" % { 'submission' : _("Submission number") + '(1)', 'access' : cgi.escape(access), } # Display the "next page" navigation arrow if int(curpage) != int(nbpages): out += """ """ % { 'nxpage' : int(curpage) + 1, 'images' : CFG_SITE_URL + '/img', 'nextpage' : _("Next page"), } else: out += """ """ out += """
      %(prevpage)s %(prevpage)s  %(submission)s: %(access)s %(nextpage)s %(nextpage)s  


    %(back)s


    %(take_note)s
    %(explain_summary)s
    """ % { 'surequit' : _("Are you sure you want to quit this submission?"), 'check_not_already_enabled': CFG_WEBSUBMIT_CHECK_USER_LEAVES_SUBMISSION and 'false' or 'true', 'back' : _("Back to main menu"), 'mainmenu' : cgi.escape(mainmenu), 'images' : CFG_SITE_URL + '/img', 'take_note' : '(1) ' + _("This is your submission access number. It can be used to continue with an interrupted submission in case of problems."), 'explain_summary' : '(2) ' + _("Mandatory fields appear in red in the SUMMARY window."), } return out def tmpl_submit_field(self, ln, field): """ Produces the HTML code for the specified field Parameters: - 'ln' *string* - The language to display the interface in - 'field' *array* - the field to display in the page, with the following structure: - 'javascript' *string* - if the field has some associated javascript code - 'type' *string* - the type of field (T, F, I, H, D, S, R) - 'name' *string* - the name of the field - 'rows' *string* - the number of rows for textareas - 'cols' *string* - the number of columns for textareas - 'val' *string* - the default value of the field - 'size' *string* - the size for text fields - 'maxlength' *string* - the maximum length for text fields - 'htmlcode' *string* - the complete HTML code for user-defined fields - 'typename' *string* - the long name of the type """ # load the right message language _ = gettext_set_language(ln) # If the field is a textarea if field['type'] == 'T': ## Field is a textarea: text = "" \ % (field['name'], field['rows'], field['cols'], cgi.escape(str(field['val']), 1)) # If the field is a file upload elif field['type'] == 'F': ## the field is a file input: text = """""" \ % (field['name'], field['size'], "%s" \ % ((field['maxlength'] in (0, None) and " ") or (""" maxlength="%s\"""" % field['maxlength'])) ) # If the field is a text input elif field['type'] == 'I': ## Field is a text input: text = """""" \ % (field['name'], field['size'], field['val'], "%s" \ % ((field['maxlength'] in (0, None) and " ") or (""" maxlength="%s\"""" % field['maxlength'])) ) # If the field is a hidden input elif field['type'] == 'H': text = "" % (field['name'], field['val']) # If the field is user-defined elif field['type'] == 'D': text = field['htmlcode'] # If the field is a select box elif field['type'] == 'S': text = field['htmlcode'] # If the field type is not recognized else: text = "%s: unknown field type" % field['typename'] return text def tmpl_page_interface_js(self, ln, upload, field, fieldhtml, txt, check, level, curdir, values, select, radio, curpage, nbpages, returnto): """ Produces the javascript for validation and value filling for a submit interface page Parameters: - 'ln' *string* - The language to display the interface in - 'upload' *array* - booleans if the field is a field - 'field' *array* - the fields' names - 'fieldhtml' *array* - the fields' HTML representation - 'txt' *array* - the fields' long name - 'check' *array* - if the fields should be checked (in javascript) - 'level' *array* - strings, if the fields should be filled (M) or not (O) - 'curdir' *array* - the current directory of the submission - 'values' *array* - the current values of the fields - 'select' *array* - booleans, if the controls are "select" controls - 'radio' *array* - booleans, if the controls are "radio" controls - 'curpage' *int* - the current page - 'nbpages' *int* - the total number of pages - 'returnto' *array* - a structure with 'field' and 'page', if a mandatory field on antoher page was not completed """ # load the right message 
language _ = gettext_set_language(ln) nbFields = len(upload) # if there is a file upload field, we change the encoding type out = """""" return out def tmpl_page_do_not_leave_submission_js(self, ln): """ Code to ask user confirmation when leaving the page, so that the submission is not interrupted by mistake. All submission functions should set the Javascript variable 'user_must_confirm_before_leaving_page' to 'false' before programmatically submitting the submission form. Parameters: - 'ln' *string* - The language to display the interface in """ # load the right message language _ = gettext_set_language(ln) out = ''' ''' % (CFG_WEBSUBMIT_CHECK_USER_LEAVES_SUBMISSION and 'true' or 'false', _('Your modifications will not be saved.').replace('"', '\\"')) return out def tmpl_page_endaction(self, ln, nextPg, startPg, access, curpage, nbPg, nbpages, doctype, act, docname, actname, mainmenu, finished, function_content, next_action): """ Produces the pages after all the fields have been submitted. Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The document type - 'act' *string* - The action - 'docname' *string* - The document type name - 'actname' *string* - The action name - 'curpage' *int* - The current page of submitting engine - 'startPg' *int* - The start page - 'nextPg' *int* - The next page - 'access' *string* - The submission number - 'nbPg' *string* - total number of pages - 'nbpages' *string* - number of pages (?) - 'mainmenu' *string* - the url of the main menu - 'finished' *bool* - if the submission is finished - 'function_content' *string* - HTML code produced by some function executed - 'next_action' *string* - if there is another action to be completed, the HTML code for linking to it """ # load the right message language _ = gettext_set_language(ln) out = """
    """ % { 'finished' : _("finished!"), } else: for i in range(1, nbpages + 1): out += """""" % (i,i) out += """
    %(docname)s   %(actname)s  """ % { 'nextPg' : cgi.escape(nextPg), 'startPg' : cgi.escape(startPg), 'access' : cgi.escape(access), 'curpage' : cgi.escape(curpage), 'nbPg' : cgi.escape(nbPg), 'doctype' : cgi.escape(doctype), 'act' : cgi.escape(act), 'docname' : docname, 'actname' : actname, 'mainmenu' : cgi.escape(mainmenu), 'ln' : cgi.escape(ln), } if finished == 1: out += """
      %(finished)s   
       %s %(end_action)s  
     %(summary)s(2) """ % { 'end_action' : _("end of action"), 'summary' : _("SUMMARY"), 'doctype' : cgi.escape(doctype), 'act' : cgi.escape(act), 'access' : cgi.escape(access), 'ln' : cgi.escape(ln), } out += """

    %(function_content)s %(next_action)s

    """ % { 'function_content' : function_content, 'next_action' : next_action, } if finished == 0: out += """%(submission)s²: %(access)s""" % { 'submission' : _("Submission no"), 'access' : cgi.escape(access), } else: out += " \n" out += """


    """ # Add the "back to main menu" button if finished == 0: out += """ %(back)s

    """ % { 'surequit' : _("Are you sure you want to quit this submission?"), 'back' : _("Back to main menu"), 'images' : CFG_SITE_URL + '/img', 'mainmenu' : cgi.escape(mainmenu), 'check_not_already_enabled': CFG_WEBSUBMIT_CHECK_USER_LEAVES_SUBMISSION and 'false' or 'true', } else: out += """ %(back)s

    """ % { 'back' : _("Back to main menu"), 'images' : CFG_SITE_URL + '/img', 'mainmenu' : cgi.escape(mainmenu), } return out def tmpl_function_output(self, ln, display_on, action, doctype, step, functions): """ Produces the output of the functions. Parameters: - 'ln' *string* - The language to display the interface in - 'display_on' *bool* - If debug information should be displayed - 'doctype' *string* - The document type - 'action' *string* - The action - 'step' *int* - The current step in submission - 'functions' *aray* - HTML code produced by functions executed and informations about the functions - 'name' *string* - the name of the function - 'score' *string* - the score of the function - 'error' *bool* - if the function execution produced errors - 'text' *string* - the HTML code produced by the function """ # load the right message language _ = gettext_set_language(ln) out = "" if display_on: out += """

    %(function_list)s

    """ % { 'function_list' : _("Here is the %(x_action)s function list for %(x_doctype)s documents at level %(x_step)s") % { 'x_action' : action, 'x_doctype' : doctype, 'x_step' : step, }, 'function' : _("Function"), 'score' : _("Score"), 'running' : _("Running function"), } for function in functions: out += """""" % { 'name' : function['name'], 'score' : function['score'], 'result' : function['error'] and (_("Function %s does not exist.") % function['name'] + "
    ") or function['text'] } out += "
    %(function)s%(score)s%(running)s
    %(name)s%(score)s%(result)s
    " else: for function in functions: if not function['error']: out += function['text'] return out def tmpl_next_action(self, ln, actions): """ Produces the output of the functions. Parameters: - 'ln' *string* - The language to display the interface in - 'actions' *array* - The actions to display, in the structure - 'page' *string* - the starting page - 'action' *string* - the action (in terms of submission) - 'doctype' *string* - the doctype - 'nextdir' *string* - the path to the submission data - 'access' *string* - the submission number - 'indir' *string* - ?? - 'name' *string* - the name of the action """ # load the right message language _ = gettext_set_language(ln) out = "

    %(haveto)s

      " % { 'haveto' : _("You must now"), } i = 0 for action in actions: if i > 0: out += " " + _("or") + " " i += 1 out += """
    • %(name)s
    • """ % action out += "
    " return out def tmpl_filelist(self, ln, filelist='', recid='', docname='', version=''): """ Displays the file list for a record. Parameters: - 'ln' *string* - The language to display the interface in - 'recid' *int* - The record id - 'docname' *string* - The document name - 'version' *int* - The version of the document - 'filelist' *string* - The HTML string of the filelist (produced by the BibDoc classes) """ # load the right message language _ = gettext_set_language(ln) title = _("record") + ' #' + '%s' % (CFG_SITE_URL, recid, recid) if docname != "": title += ' ' + _("document") + ' #' + str(docname) if version != "": title += ' ' + _("version") + ' #' + str(version) out = """
    %s
    """ % (filelist) return out def tmpl_bibrecdoc_filelist(self, ln, types, verbose_files=''): """ Displays the file list for a record. Parameters: - 'ln' *string* - The language to display the interface in - 'types' *array* - The different types to display, each record in the format: - 'name' *string* - The name of the format - 'content' *array of string* - The HTML code produced by tmpl_bibdoc_filelist, for the right files - 'verbose_files' - A string representing in a verbose way the file information. """ # load the right message language _ = gettext_set_language(ln) out = "" for mytype in types: out += "%s %s:" % (mytype['name'], _("file(s)")) out += "
      " for content in mytype['content']: out += content out += "
    " if verbose_files: out += "
    %s
    " % verbose_files return out def tmpl_bibdoc_filelist(self, ln, versions=[], imageurl='', recid='', docname=''): """ Displays the file list for a record. Parameters: - 'ln' *string* - The language to display the interface in - 'versions' *array* - The different versions to display, each record in the format: - 'version' *string* - The version - 'content' *string* - The HTML code produced by tmpl_bibdocfile_filelist, for the right file - 'previous' *bool* - If the file has previous versions - 'imageurl' *string* - The URL to the file image - 'recid' *int* - The record id - 'docname' *string* - The name of the document """ # load the right message language _ = gettext_set_language(ln) out = """""" % { 'imageurl' : imageurl, 'docname' : docname } for version in versions: if version['previous']: versiontext = """
    (%(see)s %(previous)s)""" % { 'see' : _("see"), 'siteurl' : CFG_SITE_URL, 'docname' : urllib.quote(docname), 'recID': recid, 'previous': _("previous"), 'ln_link': (ln != CFG_SITE_LANG and '&ln=' + ln) or '', } else: versiontext = "" out += """" out += "" return out - def tmpl_bibdocfile_filelist(self, ln, recid, name, version, format, size, description): + def tmpl_bibdocfile_filelist(self, ln, recid, name, version, md, superformat, subformat, nice_size, description): """ Displays a file in the file list. Parameters: - 'ln' *string* - The language to display the interface in - 'recid' *int* - The id of the record - 'name' *string* - The name of the file - 'version' *string* - The version - - 'format' *string* - The display format + - 'md' *datetime* - the modification date - - 'size' *string* - The size of the file + - 'superformat' *string* - The display superformat + + - 'subformat' *string* - The display subformat + + - 'nice_size' *string* - The nice_size of the file - 'description' *string* - The description that might have been associated to the particular file """ # load the right message language _ = gettext_set_language(ln) + urlbase = '%s/record/%s/files/%s' % ( + CFG_SITE_URL, + recid, + '%s%s' % (name, superformat)) + + urlargd = {'version' : version} + if subformat: + urlargd['subformat'] = subformat + + link_label = '%s%s' % (name, superformat) + if subformat: + link_label += ' (%s)' % subformat + + link = create_html_link(urlbase, urlargd, cgi.escape(link_label)) + return """ - - %(name)s%(format)s - + %(link)s - [%(size)s B] + [%(nice_size)s] + %(md)s %(description)s """ % { - 'siteurl' : CFG_SITE_URL, - 'recid' : recid, - 'quoted_name' : urllib.quote(name), - 'name' : cgi.escape(name), - 'version' : version, - 'name' : cgi.escape(name), - 'quoted_format' : urllib.quote(format), - 'format' : cgi.escape(format), - 'size' : size, + 'link' : link, + 'nice_size' : nice_size, + 'md' : convert_datestruct_to_dategui(md.timetuple(), ln), 'description' : cgi.escape(description), } def tmpl_submit_summary (self, ln, values): """ Displays the summary for the submit procedure. Parameters: - 'ln' *string* - The language to display the interface in - 'values' *array* - The values of submit. Each of the records contain the following fields: - 'name' *string* - The name of the field - 'mandatory' *bool* - If the field is mandatory or not - 'value' *string* - The inserted value - 'page' *int* - The submit page on which the field is entered """ # load the right message language _ = gettext_set_language(ln) out = """""" % \ { 'images' : CFG_SITE_URL + '/img' } for value in values: if value['mandatory']: color = "red" else: color = "" out += """""" % { 'color' : color, 'name' : value['name'], 'value' : value['value'], 'page' : value['page'], 'ln' : ln } out += "
    %(name)s %(value)s
    " return out def tmpl_yoursubmissions(self, ln, order, doctypes, submissions): """ Displays the list of the user's submissions. Parameters: - 'ln' *string* - The language to display the interface in - 'order' *string* - The ordering parameter - 'doctypes' *array* - All the available doctypes, in structures: - 'id' *string* - The doctype id - 'name' *string* - The display name of the doctype - 'selected' *bool* - If the doctype should be selected - 'submissions' *array* - The available submissions, in structures: - 'docname' *string* - The document name - 'actname' *string* - The action name - 'status' *string* - The status of the document - 'cdate' *string* - Creation date - 'mdate' *string* - Modification date - 'id' *string* - The id of the submission - 'reference' *string* - The display name of the doctype - 'pending' *bool* - If the submission is pending - 'act' *string* - The action code - 'doctype' *string* - The doctype code """ # load the right message language _ = gettext_set_language(ln) out = "" out += """
    " return out def tmpl_yourapprovals(self, ln, referees): """ Displays the doctypes and categories for which the user is referee Parameters: - 'ln' *string* - The language to display the interface in - 'referees' *array* - All the doctypes for which the user is referee: - 'doctype' *string* - The doctype - 'docname' *string* - The display name of the doctype - 'categories' *array* - The specific categories for which the user is referee: - 'id' *string* - The category id - 'name' *string* - The display name of the category """ # load the right message language _ = gettext_set_language(ln) out = """ " out += '''

    To see the status of documents for which approval has been requested, click here

    ''' % {'url' : CFG_SITE_URL} return out def tmpl_publiline_selectdoctype(self, ln, docs): """ Displays the doctypes that the user can select Parameters: - 'ln' *string* - The language to display the interface in - 'docs' *array* - All the doctypes that the user can select: - 'doctype' *string* - The doctype - 'docname' *string* - The display name of the doctype """ # load the right message language _ = gettext_set_language(ln) out = """ %s""" % (ln, _("Go to specific approval workflow")) return out def tmpl_publiline_selectcplxdoctype(self, ln, docs): """ Displays the doctypes that the user can select in a complex workflow Parameters: - 'ln' *string* - The language to display the interface in - 'docs' *array* - All the doctypes that the user can select: - 'doctype' *string* - The doctype - 'docname' *string* - The display name of the doctype """ # load the right message language _ = gettext_set_language(ln) out = """
    """ return out def tmpl_publiline_selectcateg(self, ln, doctype, title, categories): """ Displays the categories from a doctype that the user can select Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'title' *string* - The doctype name - 'categories' *array* - All the categories that the user can select: - 'id' *string* - The id of the category - 'waiting' *int* - The number of documents waiting - 'approved' *int* - The number of approved documents - 'rejected' *int* - The number of rejected documents """ # load the right message language _ = gettext_set_language(ln) out = """ """ % { 'key' : _("Key"), 'pending' : _("Pending"), 'images' : CFG_SITE_URL + '/img', 'waiting' : _("Waiting for approval"), 'approved' : _("Approved"), 'already_approved' : _("Already approved"), 'rejected' : _("Rejected"), 'rejected_text' : _("Rejected"), 'somepending' : _("Some documents are pending."), } return out def tmpl_publiline_selectcplxcateg(self, ln, doctype, title, types): """ Displays the categories from a doctype that the user can select Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'title' *string* - The doctype name - 'categories' *array* - All the categories that the user can select: - 'id' *string* - The id of the category - 'waiting' *int* - The number of documents waiting - 'approved' *int* - The number of approved documents - 'rejected' *int* - The number of rejected documents """ # load the right message language _ = gettext_set_language(ln) out = "" #out = """ # # # # #
    # # """ % { # 'title' : title, # 'list_type' : _("List of specific approvals"), # } columns = [] columns.append ({'apptype' : 'RRP', 'list_categ' : _("List of refereing categories"), 'id_form' : 0, }) #columns.append ({'apptype' : 'RPB', # 'list_categ' : _("List of publication categories"), # 'id_form' : 1, # }) #columns.append ({'apptype' : 'RDA', # 'list_categ' : _("List of direct approval categories"), # 'id_form' : 2, # }) for column in columns: out += """ """ # Key out += """ """ % { 'key' : _("Key"), 'pending' : _("Pending"), 'images' : CFG_SITE_URL + '/img', 'waiting' : _("Waiting for approval"), 'approved' : _("Approved"), 'already_approved' : _("Already approved"), 'rejected' : _("Rejected"), 'rejected_text' : _("Rejected"), 'cancelled' : _("Cancelled"), 'cancelled_text' : _("Cancelled"), 'somepending' : _("Some documents are pending."), } return out def tmpl_publiline_selectdocument(self, ln, doctype, title, categ, docs): """ Displays the documents that the user can select in the specified category Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'title' *string* - The doctype name - 'categ' *string* - the category - 'docs' *array* - All the categories that the user can select: - 'RN' *string* - The id of the document - 'status' *string* - The status of the document """ # load the right message language _ = gettext_set_language(ln) out = """ """ return out def tmpl_publiline_selectcplxdocument(self, ln, doctype, title, categ, categname, docs, apptype): """ Displays the documents that the user can select in the specified category Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'title' *string* - The doctype name - 'categ' *string* - the category - 'docs' *array* - All the categories that the user can select: - 'RN' *string* - The id of the document - 'status' *string* - The status of the document - 'apptype' *string* - the approval type """ # load the right message language _ = gettext_set_language(ln) listtype = "" if apptype == "RRP": listtype = _("List of refereed documents") elif apptype == "RPB": listtype = _("List of publication documents") elif apptype == "RDA": listtype = _("List of direct approval documents") out = """ """ return out def tmpl_publiline_displaydoc(self, ln, doctype, docname, categ, rn, status, dFirstReq, dLastReq, dAction, access, confirm_send, auth_code, auth_message, authors, title, sysno, newrn, note): """ Displays the categories from a doctype that the user can select Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'docname' *string* - The doctype name - 'categ' *string* - the category - 'rn' *string* - The document RN (id number) - 'status' *string* - The status of the document - 'dFirstReq' *string* - The date of the first approval request - 'dLastReq' *string* - The date of the last approval request - 'dAction' *string* - The date of the last action (approval or rejection) - 'confirm_send' *bool* - must display a confirmation message about sending approval email - 'auth_code' *bool* - authorised to referee this document - 'auth_message' *string* - ??? - 'authors' *string* - the authors of the submission - 'title' *string* - the title of the submission - 'sysno' *string* - the unique database id for the record - 'newrn' *string* - the record number assigned to the submission - 'note' *string* - Note about the approval request. 
""" # load the right message language _ = gettext_set_language(ln) if status == "waiting": image = """""" % (CFG_SITE_URL + '/img') elif status == "approved": image = """""" % (CFG_SITE_URL + '/img') elif status == "rejected": image = """""" % (CFG_SITE_URL + '/img') else: image = "" out = """ """ return out def tmpl_publiline_displaycplxdoc(self, ln, doctype, docname, categ, rn, apptype, status, dates, isPubCom, isEdBoard, isReferee, isProjectLeader, isAuthor, authors, title, sysno, newrn): # load the right message language _ = gettext_set_language(ln) if status == "waiting": image = """""" % (CFG_SITE_URL + '/img') elif status == "approved": image = """""" % (CFG_SITE_URL + '/img') elif status == "rejected": image = """""" % (CFG_SITE_URL + '/img') elif status == "cancelled": image = """""" % (CFG_SITE_URL + '/img') else: image = "" out = """ """ return out def tmpl_publiline_displaycplxdocitem(self, doctype, categ, rn, apptype, action, comments, (user_can_view_comments, user_can_add_comment, user_can_delete_comment), selected_category, selected_topic, selected_group_id, comment_subject, comment_body, ln): _ = gettext_set_language(ln) if comments and user_can_view_comments: comments_text = '' comments_overview = '
      ' for comment in comments: (cmt_uid, cmt_nickname, cmt_title, cmt_body, cmt_date, cmt_priority, cmtid) = comment comments_overview += '
    • %s - %s (%s)
    • ' % (cmtid, cmt_nickname, cmt_title, convert_datetext_to_dategui (cmt_date)) comments_text += """
      %s - %s (%s)ReplyTop
      %s
      """ % (cmtid, cmt_nickname, cmt_title, convert_datetext_to_dategui (cmt_date), CFG_SITE_URL, doctype, apptype, categ, rn, cmt_uid, ln, email_quoted_txt2html(cmt_body)) comments_overview += '
    ' else: comments_text = '' comments_overview = 'None.' body = '' if user_can_view_comments: body += """

    %(comments_label)s

    """ if user_can_view_comments: body += """%(comments)s""" if user_can_add_comment: validation = """ """ % {'button_label': _("Add Comment")} body += self.tmpl_publiline_displaywritecomment (doctype, categ, rn, apptype, action, _("Add Comment"), comment_subject, validation, comment_body, ln) body %= { 'comments_label': _("Comments"), 'action': action, 'button_label': _("Write a comment"), 'comments': comments_text} content = '
    ' out = """

    %(comments_overview_label)s

    %(comments_overview)s
    %(body)s
    """ % { 'comments_overview_label' : _('Comments overview'), 'comments_overview' : comments_overview, 'body' : body,} return out def tmpl_publiline_displaywritecomment(self, doctype, categ, rn, apptype, action, write_label, title, validation, reply_message, ln): _ = gettext_set_language(ln) return """

    %(write_label)s

    %(title_label)s:

    %(comment_label)s:


    %(validation)s
    """ % {'write_label': write_label, 'title_label': _("Title"), 'title': title, 'comment_label': _("Comment"), 'rn' : rn, 'categ' : categ, 'doctype' : doctype, 'apptype' : apptype, 'action' : action, 'validation' : validation, 'reply_message' : reply_message, 'ln' : ln, } def tmpl_publiline_displaydocplxaction(self, ln, doctype, categ, rn, apptype, action, status, authors, title, sysno, subtitle1, email_user_pattern, stopon1, users, extrausers, stopon2, subtitle2, usersremove, stopon3, validate_btn): # load the right message language _ = gettext_set_language(ln) if status == "waiting": image = """""" % (CFG_SITE_URL + '/img') elif status == "approved": image = """""" % (CFG_SITE_URL + '/img') elif status == "rejected": image = """""" % (CFG_SITE_URL + '/img') else: image = "" out = """ """ if ((apptype == "RRP") or (apptype == "RPB")) and ((action == "EdBoardSel") or (action == "RefereeSel")): out += """ """ if action == "EdBoardSel": out += """ """ if validate_btn != "": out += """
    """ % { 'rn' : rn, 'categ' : categ, 'doctype' : doctype, 'apptype' : apptype, 'action' : action, 'validate_btn' : validate_btn, 'ln': ln, } return out def tmpl_publiline_displaycplxrecom(self, ln, doctype, categ, rn, apptype, action, status, authors, title, sysno, msg_to, msg_to_group, msg_subject): # load the right message language _ = gettext_set_language(ln) if status == "waiting": image = """""" % (CFG_SITE_URL + '/img') elif status == "approved": image = """""" % (CFG_SITE_URL + '/img') elif status == "rejected": image = """""" % (CFG_SITE_URL + '/img') else: image = "" out = """ """ # escape forbidden character msg_to = escape_html(msg_to) msg_to_group = escape_html(msg_to_group) msg_subject = escape_html(msg_subject) write_box = """
    """ if msg_to != "": addr_box = """ """ % {'users_label': _("User"), 'to_users' : msg_to, } if msg_to_group != "": addr_box += """ """ % {'groups_label': _("Group"), 'to_groups': msg_to_group, } elif msg_to_group != "": addr_box = """ """ % {'groups_label': _("Group"), 'to_groups': msg_to_group, } else: addr_box = """ """ write_box += addr_box write_box += """
    %(to_label)s%(users_label)s %(to_users)s
      %(groups_label)s %(to_groups)s%(groups_label)s %(to_groups)s   
         
    %(subject_label)s
    %(message_label)s
    """ write_box = write_box % {'rn' : rn, 'categ' : categ, 'doctype' : doctype, 'apptype' : apptype, 'action' : action, 'subject' : msg_subject, 'to_label': _("To:"), 'subject_label': _("Subject:"), 'message_label': _("Message:"), 'send_label': _("SEND"), 'select' : _("Select:"), 'approve' : _("approve"), 'reject' : _("reject"), 'ln': ln, } out += write_box return out def displaycplxdoc_displayauthaction(action, linkText): return """ (%(linkText)s)""" % { "action" : action, "linkText" : linkText } diff --git a/modules/websubmit/lib/websubmit_webinterface.py b/modules/websubmit/lib/websubmit_webinterface.py index d89e46220..9cd43d8de 100644 --- a/modules/websubmit/lib/websubmit_webinterface.py +++ b/modules/websubmit/lib/websubmit_webinterface.py @@ -1,972 +1,952 @@ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. __lastupdated__ = """$Date$""" __revision__ = "$Id$" import os import time import cgi import sys from urllib import urlencode from invenio.config import \ CFG_ACCESS_CONTROL_LEVEL_SITE, \ CFG_SITE_LANG, \ CFG_SITE_NAME, \ CFG_SITE_NAME_INTL, \ CFG_SITE_URL, \ CFG_SITE_SECURE_URL, \ CFG_WEBSUBMIT_STORAGEDIR, \ CFG_PREFIX from invenio import webinterface_handler_wsgi_utils as apache from invenio.dbquery import run_sql from invenio.access_control_config import VIEWRESTRCOLL from invenio.access_control_mailcookie import mail_cookie_create_authorize_action from invenio.access_control_engine import acc_authorize_action from invenio.access_control_admin import acc_is_role from invenio.webpage import page, create_error_box, pageheaderonly, \ pagefooteronly -from invenio.webuser import getUid, page_not_authorized, collect_user_info, isGuestUser +from invenio.webuser import getUid, page_not_authorized, collect_user_info, isGuestUser, isUserSuperAdmin from invenio.websubmit_config import * from invenio.webinterface_handler import wash_urlargd, WebInterfaceDirectory from invenio.urlutils import make_canonical_urlargd, redirect_to_url from invenio.messages import gettext_set_language from invenio.search_engine import \ guess_primary_collection_of_a_record, get_colID, record_exists, \ create_navtrail_links, check_user_can_view_record, record_empty from invenio.bibdocfile import BibRecDocs, normalize_format, file_strip_ext, \ stream_restricted_icon, BibDoc, InvenioWebSubmitFileError, stream_file, \ - decompose_file, propose_next_docname + decompose_file, propose_next_docname, get_subformat_from_format from invenio.errorlib import register_exception from invenio.websubmit_icon_creator import create_icon, InvenioWebSubmitIconCreatorError import invenio.template websubmit_templates = invenio.template.load('websubmit') from invenio.websearchadminlib import get_detailed_page_tabs from invenio.session import get_session import invenio.template 
webstyle_templates = invenio.template.load('webstyle') websearch_templates = invenio.template.load('websearch') try: from invenio.fckeditor_invenio_connector import FCKeditorConnectorInvenio fckeditor_available = True except ImportError, e: fckeditor_available = False class WebInterfaceFilesPages(WebInterfaceDirectory): def __init__(self,recid): self.recid = recid def _lookup(self, component, path): # after /record/<recid>/files/ every part is used as the file # name filename = component def getfile(req, form): args = wash_urlargd(form, websubmit_templates.files_default_urlargd) ln = args['ln'] _ = gettext_set_language(ln) uid = getUid(req) user_info = collect_user_info(req) verbose = args['verbose'] - if verbose >= 1 and acc_authorize_action(user_info, 'fulltext')[0] != 0: + if verbose >= 1 and not isUserSuperAdmin(user_info): # Only SuperUser can see all the details! verbose = 0 if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE > 1: return page_not_authorized(req, "/record/%s" % self.recid, navmenuid='submit') if record_exists(self.recid) < 1: msg = "

    %s

    " % _("Requested record does not seem to exist.") return warningMsg(msg, req, CFG_SITE_NAME, ln) if record_empty(self.recid): msg = "

    %s

    " % _("Requested record does not seem to have been integrated.") return warningMsg(msg, req, CFG_SITE_NAME, ln) - (auth_code, auth_msg) = check_user_can_view_record(user_info, self.recid) + (auth_code, auth_message) = check_user_can_view_record(user_info, self.recid) if auth_code and user_info['email'] == 'guest' and not user_info['apache_user']: cookie = mail_cookie_create_authorize_action(VIEWRESTRCOLL, {'collection' : guess_primary_collection_of_a_record(self.recid)}) target = '/youraccount/login' + \ make_canonical_urlargd({'action': cookie, 'ln' : ln, 'referer' : \ CFG_SITE_URL + user_info['uri']}, {}) return redirect_to_url(req, target) elif auth_code: return page_not_authorized(req, "../", \ - text = auth_msg) + text = auth_message) readonly = CFG_ACCESS_CONTROL_LEVEL_SITE == 1 # From now on: either the user provided a specific file # name (and a possible version), or we return a list of # all the available files. In no case are the docids # visible. try: bibarchive = BibRecDocs(self.recid) except InvenioWebSubmitFileError, e: register_exception(req=req, alert_admin=True) msg = "

    %s

    %s

    " % ( _("The system has encountered an error in retrieving the list of files for this document."), _("The error has been logged and will be taken in consideration as soon as possible.")) return warningMsg(msg, req, CFG_SITE_NAME, ln) + if bibarchive.deleted_p(): + return print_warning(req, _("Requested record does not seem to exist.")) + docname = '' format = '' version = '' + warn = '' if filename: # We know the complete file name, guess which docid it # refers to ## TODO: Change the extension system according to ext.py from setlink ## and have a uniform extension mechanism... docname = file_strip_ext(filename) format = filename[len(docname):] if format and format[0] != '.': format = '.' + format + if args['subformat']: + format += ';%s' % args['subformat'] else: docname = args['docname'] if not format: format = args['format'] + if args['subformat']: + format += ';%s' % args['subformat'] if not version: version = args['version'] # version could be either empty, or all or an integer try: int(version) except ValueError: if version != 'all': version = '' - display_hidden = acc_authorize_action(user_info, 'fulltext')[0] == 0 + display_hidden = isUserSuperAdmin(user_info) if version != 'all': # search this filename in the complete list of files for doc in bibarchive.list_bibdocs(): if docname == doc.get_docname(): try: docfile = doc.get_file(format, version) - except InvenioWebSubmitFileError, msg: - register_exception(req=req, alert_admin=True) - - if docfile.get_status() == '': - # The file is not resticted, let's check for - # collection restriction then. - (auth_code, auth_message) = check_user_can_view_record(user_info, self.recid) - if auth_code: - return warningMsg(_("The collection to which this file belong is restricted: ") + auth_message, req, CFG_SITE_NAME, ln) - else: - # The file is probably restricted on its own. - # Let's check for proper authorization then (auth_code, auth_message) = docfile.is_restricted(req) if auth_code != 0: - return warningMsg(_("This file is restricted: ") + auth_message, req, CFG_SITE_NAME, ln) - - if display_hidden or not docfile.hidden_p(): - if not readonly: - ip = str(req.remote_ip) - res = doc.register_download(ip, version, format, uid) - try: - return docfile.stream(req) - except InvenioWebSubmitFileError, msg: - register_exception(req=req, alert_admin=True) - return warningMsg(_("An error has happened in trying to stream the request file."), req, CFG_SITE_NAME, ln) - else: - warn = print_warning(_("The requested file is hidden and you don't have the proper rights to access it.")) - - elif doc.get_icon() is not None and doc.get_icon().docname == file_strip_ext(filename): - icon = doc.get_icon() - try: - iconfile = icon.get_file(format, version) - except InvenioWebSubmitFileError, msg: - register_exception(req=req, alert_admin=True) - return warningMsg(_("An error has happened in trying to retrieve the corresponding icon."), req, CFG_SITE_NAME, ln) - - if iconfile.get_status() == '': - # The file is not resticted, let's check for - # collection restriction then. - (auth_code, auth_message) = check_user_can_view_record(user_info, self.recid) - if auth_code: - return stream_restricted_icon(req) - else: - # The file is probably restricted on its own. 
- # Let's check for proper authorization then - (auth_code, auth_message) = iconfile.is_restricted(req) - if auth_code != 0: - return stream_restricted_icon(req) + if get_subformat_from_format(format).startswith('icon'): + return stream_restricted_icon(req) + if user_info['email'] == 'guest' and not user_info['apache_user']: + cookie = mail_cookie_create_authorize_action('viewrestrdoc', {'status' : docfile.get_status()}) + target = '/youraccount/login' + \ + make_canonical_urlargd({'action': cookie, 'ln' : ln, 'referer' : \ + CFG_SITE_URL + user_info['uri']}, {}) + redirect_to_url(req, target) + else: + req.status = apache.HTTP_UNAUTHORIZED + warn += print_warning(_("This file is restricted: ") + auth_message) + break + + if display_hidden or not docfile.hidden_p(): + if not readonly: + ip = str(req.remote_ip) + res = doc.register_download(ip, version, format, uid) + try: + return docfile.stream(req) + except InvenioWebSubmitFileError, msg: + register_exception(req=req, alert_admin=True) + req.status = apache.HTTP_INTERNAL_SERVER_ERROR + return warningMsg(_("An error has happened in trying to stream the requested file."), req, CFG_SITE_NAME, ln) + else: + req.status = apache.HTTP_UNAUTHORIZED + warn = print_warning(_("The requested file is hidden and you don't have the proper rights to access it.")) - if not readonly: - ip = str(req.remote_ip) - res = doc.register_download(ip, version, format, uid) - try: - return iconfile.stream(req) except InvenioWebSubmitFileError, msg: register_exception(req=req, alert_admin=True) - return warningMsg(_("An error has happened in trying to stream the corresponding icon."), req, CFG_SITE_NAME, ln) - if docname and format and display_hidden: + if docname and format and not warn: req.status = apache.HTTP_NOT_FOUND - warn = print_warning(_("Requested file does not seem to exist.")) - else: - warn = '' + warn += print_warning(_("Requested file does not seem to exist.")) filelist = bibarchive.display("", version, ln=ln, verbose=verbose, display_hidden=display_hidden) t = warn + websubmit_templates.tmpl_filelist( ln=ln, recid=self.recid, docname=args['docname'], version=version, filelist=filelist) cc = guess_primary_collection_of_a_record(self.recid) unordered_tabs = get_detailed_page_tabs(get_colID(cc), self.recid, ln) ordered_tabs_id = [(tab_id, values['order']) for (tab_id, values) in unordered_tabs.iteritems()] ordered_tabs_id.sort(lambda x,y: cmp(x[1],y[1])) link_ln = '' if ln != CFG_SITE_LANG: link_ln = '?ln=%s' % ln tabs = [(unordered_tabs[tab_id]['label'], \ '%s/record/%s/%s%s' % (CFG_SITE_URL, self.recid, tab_id, link_ln), \ tab_id == 'files', unordered_tabs[tab_id]['enabled']) \ for (tab_id, order) in ordered_tabs_id if unordered_tabs[tab_id]['visible'] == True] top = webstyle_templates.detailed_record_container_top(self.recid, tabs, args['ln']) bottom = webstyle_templates.detailed_record_container_bottom(self.recid, tabs, args['ln']) title, description, keywords = websearch_templates.tmpl_record_page_header_content(req, self.recid, args['ln']) return pageheaderonly(title=title, navtrail=create_navtrail_links(cc=cc, aas=0, ln=ln) + \ ''' > %s > %s''' % \ (CFG_SITE_URL, self.recid, title, _("Access to Fulltext")), description="", keywords="keywords", uid=uid, language=ln, req=req, navmenuid='search', navtrail_append_title_p=0) + \ websearch_templates.tmpl_search_pagestart(ln) + \ top + t + bottom + \ websearch_templates.tmpl_search_pageend(ln) + \ pagefooteronly(lastupdated=__lastupdated__, language=ln, req=req) return getfile, [] def __call__(self, req, 
form): """Called in case of URLs like /record/123/files without trailing slash. """ args = wash_urlargd(form, websubmit_templates.files_default_urlargd) ln = args['ln'] link_ln = '' if ln != CFG_SITE_LANG: link_ln = '?ln=%s' % ln return redirect_to_url(req, '%s/record/%s/files/%s' % (CFG_SITE_URL, self.recid, link_ln)) def websubmit_legacy_getfile(req, form): """ Handle legacy /getfile.py URLs """ args = wash_urlargd(form, { 'recid': (int, 0), 'docid': (int, 0), 'version': (str, ''), 'name': (str, ''), 'format': (str, ''), 'ln' : (str, CFG_SITE_LANG) }) _ = gettext_set_language(args['ln']) def _getfile_py(req, recid=0, docid=0, version="", name="", format="", ln=CFG_SITE_LANG): if not recid: ## Let's obtain the recid from the docid if docid: try: bibdoc = BibDoc(docid=docid) recid = bibdoc.get_recid() except InvenioWebSubmitFileError, e: return warningMsg(_("An error has happened in trying to retrieve the requested file."), req, CFG_SITE_NAME, ln) else: return warningMsg(_('Not enough information to retrieve the document'), req, CFG_SITE_NAME, ln) else: if not name and docid: ## Let's obtain the name from the docid try: bibdoc = BibDoc(docid) name = bibdoc.get_docname() except InvenioWebSubmitFileError, e: return warningMsg(_("An error has happened in trying to retrieving the requested file."), req, CFG_SITE_NAME, ln) format = normalize_format(format) redirect_to_url(req, '%s/record/%s/files/%s%s?ln=%s%s' % (CFG_SITE_URL, recid, name, format, ln, version and 'version=%s' % version or ''), apache.HTTP_MOVED_PERMANENTLY) return _getfile_py(req, **args) # -------------------------------------------------- from invenio.websubmit_engine import home, action, interface, endaction class WebInterfaceSubmitPages(WebInterfaceDirectory): _exports = ['summary', 'sub', 'direct', '', 'attachfile', 'uploadfile', 'getuploadedfile'] def uploadfile(self, req, form): """ Similar to /submit, but only consider files. Nice for asynchronous Javascript uploads. Should be used to upload a single file. Also try to create an icon, and return URL to file(s) + icon(s) Authentication is performed based on session ID passed as parameter instead of cookie-based authentication, due to the use of this URL by the Flash plugin (to upload multiple files at once), which does not route cookies. FIXME: consider adding /deletefile and /modifyfile functions + parsing of additional parameters to rename files, add comments, restrictions, etc. """ if sys.hexversion < 0x2060000: try: import simplejson as json simplejson_available = True except ImportError: # Okay, no Ajax app will be possible, but continue anyway, # since this package is only recommended, not mandatory. simplejson_available = False else: import json simplejson_available = True argd = wash_urlargd(form, { 'doctype': (str, ''), 'access': (str, ''), 'indir': (str, ''), 'session_id': (str, ''), 'rename': (str, ''), }) curdir = None if not form.has_key("indir") or \ not form.has_key("doctype") or \ not form.has_key("access"): return apache.HTTP_BAD_REQUEST else: curdir = os.path.join(CFG_WEBSUBMIT_STORAGEDIR, argd['indir'], argd['doctype'], argd['access']) user_info = collect_user_info(req) if form.has_key("session_id"): # Are we uploading using Flash, which does not transmit # cookie? The expect to receive session_id as a form # parameter. First check that IP addresses do not # mismatch. 
A ValueError will be raised if there is # something wrong try: session = get_session(req=req, sid=argd['session_id']) except ValueError, e: return apache.HTTP_BAD_REQUEST # Retrieve user information. We cannot rely on the session here. res = run_sql("SELECT uid FROM session WHERE session_key=%s", (argd['session_id'],)) if len(res): uid = res[0][0] user_info = collect_user_info(uid) try: act_fd = file(os.path.join(curdir, 'act')) action = act_fd.read() act_fd.close() except: action = "" # Is user authorized to perform this action? - (auth_code, auth_msg) = acc_authorize_action(uid, "submit", + (auth_code, auth_message) = acc_authorize_action(uid, "submit", verbose=0, doctype=argd['doctype'], act=action) if acc_is_role("submit", doctype=argd['doctype'], act=action) and auth_code != 0: # User cannot submit return apache.HTTP_UNAUTHORIZED else: # Process the upload and get the response added_files = {} for key, formfields in form.items(): filename = key.replace("[]", "") file_to_open = os.path.join(curdir, filename) if hasattr(formfields, "filename") and formfields.filename: dir_to_open = os.path.abspath(os.path.join(curdir, 'files', str(user_info['uid']), key)) try: assert(dir_to_open.startswith(CFG_WEBSUBMIT_STORAGEDIR)) except AssertionError: register_exception(req=req, prefix='curdir="%s", key="%s"' % (curdir, key)) return apache.HTTP_FORBIDDEN if not os.path.exists(dir_to_open): try: os.makedirs(dir_to_open) except: register_exception(req=req, alert_admin=True) return apache.HTTP_FORBIDDEN filename = formfields.filename ## Before saving the file to disc, wash the filename (in particular ## washing away UNIX and Windows (e.g. DFS) paths): filename = os.path.basename(filename.split('\\')[-1]) filename = filename.strip() if filename != "": # Check that file does not already exist n = 1 while os.path.exists(os.path.join(dir_to_open, filename)): #dirname, basename, extension = decompose_file(new_destination_path) basedir, name, extension = decompose_file(filename) new_name = propose_next_docname(name) filename = new_name + extension # This may be dangerous if the file size is bigger than the available memory fp = open(os.path.join(dir_to_open, filename), "w") fp.write(formfields.file.read()) fp.close() fp = open(os.path.join(curdir, "lastuploadedfile"), "w") fp.write(filename) fp.close() fp = open(file_to_open, "w") fp.write(filename) fp.close() try: # Create icon (icon_path, icon_name) = create_icon( { 'input-file' : os.path.join(dir_to_open, filename), 'icon-name' : filename, # extension stripped automatically 'icon-file-format' : 'gif', 'multipage-icon' : False, 'multipage-icon-delay' : 100, 'icon-scale' : "300>", # Resize only if width > 300 'verbosity' : 0, }) icons_dir = os.path.join(curdir, 'icons', str(user_info['uid']), key) if not os.path.exists(icons_dir): # Create uid/icons dir if needed os.makedirs(icons_dir) os.rename(os.path.join(icon_path, icon_name), os.path.join(icons_dir, icon_name)) added_files[key] = {'name': filename, 'iconName': icon_name} except InvenioWebSubmitIconCreatorError, e: # We could not create the icon added_files[key] = {'name': filename} continue else: return apache.HTTP_BAD_REQUEST # Send our response if simplejson_available: return json.dumps(added_files) def getuploadedfile(self, req, form): """ Stream uploaded files. For the moment, restrict to files in ./curdir/files/uid or ./curdir/icons/uid directory, so that we are sure we stream files only to the user who uploaded them. 
""" argd = wash_urlargd(form, {'indir': (str, None), 'doctype': (str, None), 'access': (str, None), 'icon': (int, 0), 'key': (str, None), 'filename': (str, None)}) if None in argd.values(): return apache.HTTP_BAD_REQUEST uid = getUid(req) if argd['icon']: file_path = os.path.join(CFG_WEBSUBMIT_STORAGEDIR, argd['indir'], argd['doctype'], argd['access'], 'icons', str(uid), argd['key'], argd['filename'] ) else: file_path = os.path.join(CFG_WEBSUBMIT_STORAGEDIR, argd['indir'], argd['doctype'], argd['access'], 'files', str(uid), argd['key'], argd['filename'] ) abs_file_path = os.path.abspath(file_path) if abs_file_path.startswith(CFG_WEBSUBMIT_STORAGEDIR): # Check if file exist. Note that icon might not yet have # been created. for i in range(5): if os.path.exists(abs_file_path): return stream_file(req, abs_file_path) time.sleep(1) # Send error 404 in all other cases return apache.HTTP_NOT_FOUND def attachfile(self, req, form): """ Process requests received from FCKeditor to upload files. If the uploaded file is an image, create an icon version """ if not fckeditor_available: return apache.HTTP_NOT_FOUND if not form.has_key('type'): form['type'] = 'File' if not form.has_key('NewFile') or \ not form['type'] in \ ['File', 'Image', 'Flash', 'Media']: return apache.HTTP_NOT_FOUND uid = getUid(req) # URL where the file can be fetched after upload user_files_path = '%(CFG_SITE_URL)s/submit/getattachedfile/%(uid)s' % \ {'uid': uid, 'CFG_SITE_URL': CFG_SITE_URL} # Path to directory where uploaded files are saved user_files_absolute_path = '%(CFG_PREFIX)s/var/tmp/attachfile/%(uid)s' % \ {'uid': uid, 'CFG_PREFIX': CFG_PREFIX} try: os.makedirs(user_files_absolute_path) except: pass # Create a Connector instance to handle the request conn = FCKeditorConnectorInvenio(form, recid=-1, uid=uid, allowed_commands=['QuickUpload'], allowed_types = ['File', 'Image', 'Flash', 'Media'], user_files_path = user_files_path, user_files_absolute_path = user_files_absolute_path) user_info = collect_user_info(req) - (auth_code, auth_msg) = acc_authorize_action(user_info, 'attachsubmissionfile') + (auth_code, auth_message) = acc_authorize_action(user_info, 'attachsubmissionfile') if user_info['email'] == 'guest' and not user_info['apache_user']: # User is guest: must login prior to upload data = conn.sendUploadResults(1, '', '', 'Please login before uploading file.') elif auth_code: # User cannot submit data = conn.sendUploadResults(1, '', '', 'Sorry, you are not allowed to submit files.') else: # Process the upload and get the response data = conn.doResponse() # At this point, the file has been uploaded. The FCKeditor # submit the image in form['NewFile']. However, the image # might have been renamed in between by the FCK connector on # the server side, by appending (%04d) at the end of the base # name. 
Retrieve that file uploaded_file_path = os.path.join(user_files_absolute_path, form['type'].lower(), form['NewFile'].filename) uploaded_file_path = retrieve_most_recent_attached_file(uploaded_file_path) uploaded_file_name = os.path.basename(uploaded_file_path) # Create an icon if form.get('type','') == 'Image': try: (icon_path, icon_name) = create_icon( { 'input-file' : uploaded_file_path, 'icon-name' : os.path.splitext(uploaded_file_name)[0], 'icon-file-format' : os.path.splitext(uploaded_file_name)[1][1:] or 'gif', 'multipage-icon' : False, 'multipage-icon-delay' : 100, 'icon-scale' : "300>", # Resize only if width > 300 'verbosity' : 0, }) # Move original file to /original dir, and replace it with icon file original_user_files_absolute_path = os.path.join(user_files_absolute_path, 'image', 'original') if not os.path.exists(original_user_files_absolute_path): # Create /original dir if needed os.mkdir(original_user_files_absolute_path) os.rename(uploaded_file_path, original_user_files_absolute_path + os.sep + uploaded_file_name) os.rename(icon_path + os.sep + icon_name, uploaded_file_path) except InvenioWebSubmitIconCreatorError, e: pass # Transform the headers into something ok for mod_python for header in conn.headers: if not header is None: if header[0] == 'Content-Type': req.content_type = header[1] else: req.headers_out[header[0]] = header[1] # Send our response req.send_http_header() req.write(data) def _lookup(self, component, path): """ This handler is invoked for the dynamic URLs (for getting and putting attachments) Eg: /submit/getattachedfile/41336978/image/myfigure.png /submit/attachfile/41336978/image/myfigure.png """ if component == 'getattachedfile' and len(path) > 2: uid = path[0] # uid of the submitter file_type = path[1] # file, image, flash or media (as # defined by FCKeditor) if file_type in ['file', 'image', 'flash', 'media']: file_name = '/'.join(path[2:]) # the filename def answer_get(req, form): """Accessing files attached to submission.""" form['file'] = file_name form['type'] = file_type form['uid'] = uid return self.getattachedfile(req, form) return answer_get, [] # All other cases: file not found return None, [] def getattachedfile(self, req, form): """ Returns a file uploaded to the submission 'drop box' by the FCKeditor. """ argd = wash_urlargd(form, {'file': (str, None), 'type': (str, None), 'uid': (int, 0)}) # Can user view this record, i.e. can user access its # attachments? uid = getUid(req) user_info = collect_user_info(req) if not argd['file'] is None: # Prepare path to file on disk. Normalize the path so that # ../ and other dangerous components are removed. path = os.path.abspath(CFG_PREFIX + '/var/tmp/attachfile/' + \ '/' + str(argd['uid']) + \ '/' + argd['type'] + '/' + argd['file']) # Check that we are really accessing attachements # directory, for the declared record. 
if path.startswith(CFG_PREFIX + '/var/tmp/attachfile/') and os.path.exists(path): return stream_file(req, path) # Send error 404 in all other cases return apache.HTTP_NOT_FOUND def direct(self, req, form): """Directly redirect to an initialized submission.""" args = wash_urlargd(form, {'sub': (str, ''), 'access' : (str, '')}) sub = args['sub'] access = args['access'] ln = args['ln'] _ = gettext_set_language(ln) uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "direct", navmenuid='submit') myQuery = req.args if not sub: return warningMsg(_("Sorry, 'sub' parameter missing..."), req, ln=ln) res = run_sql("SELECT docname,actname FROM sbmIMPLEMENT WHERE subname=%s", (sub,)) if not res: return warningMsg(_("Sorry, cannot analyse the parameter."), req, ln=ln) else: # get document type doctype = res[0][0] # get action name action = res[0][1] # retrieve other parameter values params = dict(form) # find existing access number if not access: # create 'unique' access number pid = os.getpid() now = time.time() access = "%i_%s" % (now,pid) # retrieve 'dir' value res = run_sql ("SELECT dir FROM sbmACTION WHERE sactname=%s", (action,)) dir = res[0][0] mainmenu = req.headers_in.get('referer') params['access'] = access params['act'] = action params['doctype'] = doctype params['startPg'] = '1' params['mainmenu'] = mainmenu params['ln'] = ln params['indir'] = dir url = "%s/submit?%s" % (CFG_SITE_URL, urlencode(params)) redirect_to_url(req, url) def sub(self, req, form): """DEPRECATED: /submit/sub is deprecated now, so alert the admin by email (but allow the submission to continue anyway)""" args = wash_urlargd(form, {'password': (str, '')}) uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../sub/", navmenuid='submit') try: raise DeprecationWarning, 'submit/sub handler has been used. Please use submit/direct. e.g. 
"submit/sub?RN=123@SBIFOO" -> "submit/direct?RN=123&sub=SBIFOO"' except DeprecationWarning: register_exception(req=req, alert_admin=True) ln = args['ln'] _ = gettext_set_language(ln) #DEMOBOO_RN=DEMO-BOOK-2008-001&ln=en&password=1223993532.26572%40APPDEMOBOO params = dict(form) password = args['password'] if password: del params['password'] if "@" in password: params['access'], params['sub'] = password.split('@', 1) else: params['sub'] = password else: args = str(req.args).split('@') if len(args) > 1: params = {'sub' : args[-1]} args = '@'.join(args[:-1]) params.update(cgi.parse_qs(args)) else: return warningMsg(_("Sorry, invalid URL..."), req, ln=ln) url = "%s/submit/direct?%s" % (CFG_SITE_URL, urlencode(params, doseq=True)) redirect_to_url(req, url) def summary(self, req, form): args = wash_urlargd(form, { 'doctype': (str, ''), 'act': (str, ''), 'access': (str, ''), 'indir': (str, '')}) uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../summary", navmenuid='submit') t="" curdir = os.path.join(CFG_WEBSUBMIT_STORAGEDIR, args['indir'], args['doctype'], args['access']) try: assert(curdir == os.path.abspath(curdir)) except AssertionError: register_exception(req=req, alert_admin=True, prefix='Possible cracking tentative: indir="%s", doctype="%s", access="%s"' % (args['indir'], args['doctype'], args['access'])) return warningMsg("Invalid parameters") subname = "%s%s" % (args['act'], args['doctype']) res = run_sql("select sdesc,fidesc,pagenb,level from sbmFIELD where subname=%s " "order by pagenb,fieldnb", (subname,)) nbFields = 0 values = [] for arr in res: if arr[0] != "": val = { 'mandatory' : (arr[3] == 'M'), 'value' : '', 'page' : arr[2], 'name' : arr[0], } if os.path.exists(os.path.join(curdir, curdir,arr[1])): fd = open(os.path.join(curdir, arr[1]),"r") value = fd.read() fd.close() value = value.replace("\n"," ") value = value.replace("Select:","") else: value = "" val['value'] = value values.append(val) return websubmit_templates.tmpl_submit_summary( ln = args['ln'], values = values, ) def index(self, req, form): args = wash_urlargd(form, { 'c': (str, CFG_SITE_NAME), 'doctype': (str, ''), 'act': (str, ''), 'startPg': (str, "1"), 'access': (str, ''), 'mainmenu': (str, ''), 'fromdir': (str, ''), 'nextPg': (str, ''), 'nbPg': (str, ''), 'curpage': (str, '1'), 'step': (str, '0'), 'mode': (str, 'U'), }) ## Strip whitespace from beginning and end of doctype and action: args["doctype"] = args["doctype"].strip() args["act"] = args["act"].strip() def _index(req, c, ln, doctype, act, startPg, access, mainmenu, fromdir, nextPg, nbPg, curpage, step, mode): uid = getUid(req) if isGuestUser(uid): return redirect_to_url(req, "%s/youraccount/login%s" % ( CFG_SITE_SECURE_URL, make_canonical_urlargd({ 'referer' : CFG_SITE_URL + req.unparsed_uri, 'ln' : args['ln']}, {}))) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../submit", navmenuid='submit') if doctype=="": return home(req,c,ln) elif act=="": return action(req,c,ln,doctype) elif int(step)==0: return interface(req, c, ln, doctype, act, startPg, access, mainmenu, fromdir, nextPg, nbPg, curpage) else: return endaction(req, c, ln, doctype, act, startPg, access,mainmenu, fromdir, nextPg, nbPg, curpage, step, mode) return _index(req, **args) # Answer to both /submit/ and /submit __call__ = index def errorMsg(title, req, c=None, ln=CFG_SITE_LANG): # load the right message language _ = gettext_set_language(ln) if c is None: c = CFG_SITE_NAME_INTL.get(ln, 
CFG_SITE_NAME) return page(title = _("Error"), body = create_error_box(req, title=title, verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, Internal Error" % c, uid = getUid(req), language=ln, req=req, navmenuid='submit') def warningMsg(title, req, c=None, ln=CFG_SITE_LANG): # load the right message language _ = gettext_set_language(ln) if c is None: c = CFG_SITE_NAME_INTL.get(ln, CFG_SITE_NAME) return page(title = _("Warning"), body = title, description="%s - Internal Error" % c, keywords="%s, Internal Error" % c, uid = getUid(req), language=ln, req=req, navmenuid='submit') def print_warning(msg, type='', prologue='
    ', epilogue='
    '): """Prints warning message and flushes output.""" if msg: return websubmit_templates.tmpl_print_warning( msg = msg, type = type, prologue = prologue, epilogue = epilogue, ) else: return '' def retrieve_most_recent_attached_file(file_path): """ Retrieve the latest file that has been uploaded with the FCKeditor. This is the only way to retrieve files that the FCKeditor has renamed after the upload. Eg: 'prefix/image.jpg' was uploaded but did already exist. FCKeditor silently renamed it to 'prefix/image(1).jpg': >>> retrieve_most_recent_attached_file('prefix/image.jpg') 'prefix/image(1).jpg' """ (base_path, filename) = os.path.split(file_path) base_name = os.path.splitext(filename)[0] file_ext = os.path.splitext(filename)[1][1:] most_recent_filename = filename i = 0 while True: i += 1 possible_filename = "%s(%d).%s" % \ (base_name, i, file_ext) if os.path.exists(base_path + os.sep + possible_filename): most_recent_filename = possible_filename else: break return os.path.join(base_path, most_recent_filename) diff --git a/modules/websubmit/lib/websubmitadmin_dblayer.py b/modules/websubmit/lib/websubmitadmin_dblayer.py index fe9653c1f..48fb62261 100644 --- a/modules/websubmit/lib/websubmitadmin_dblayer.py +++ b/modules/websubmit/lib/websubmitadmin_dblayer.py @@ -1,3216 +1,3216 @@ # -*- coding: utf-8 -*- ## ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
__revision__ = "$Id$" from invenio.dbquery import run_sql from invenio.websubmitadmin_config import * from random import seed, randint ## Functions related to the organisation of catalogues: def insert_submission_collection(collection_name): qstr = """INSERT INTO sbmCOLLECTION (name) VALUES (%s)""" qres = run_sql(qstr, (collection_name,)) return int(qres) def update_score_of_collection_child_of_submission_collection_at_scorex(id_father, old_score, new_score): qstr = """UPDATE sbmCOLLECTION_sbmCOLLECTION """ \ """SET catalogue_order=%s WHERE id_father=%s AND catalogue_order=%s""" qres = run_sql(qstr, (new_score, id_father, old_score)) return 0 def update_score_of_collection_child_of_submission_collection_with_colid_and_scorex(id_father, id_son, old_score, new_score): qstr = """UPDATE sbmCOLLECTION_sbmCOLLECTION """ \ """SET catalogue_order=%s """ \ """WHERE id_father=%s AND id_son=%s AND catalogue_order=%s""" qres = run_sql(qstr, (new_score, id_father, id_son, old_score)) return 0 def update_score_of_doctype_child_of_submission_collection_at_scorex(id_father, old_score, new_score): qstr = """UPDATE sbmCOLLECTION_sbmDOCTYPE """ \ """SET catalogue_order=%s WHERE id_father=%s AND catalogue_order=%s""" qres = run_sql(qstr, (new_score, id_father, old_score)) return 0 def update_score_of_doctype_child_of_submission_collection_with_doctypeid_and_scorex(id_father, id_son, old_score, new_score): qstr = """UPDATE sbmCOLLECTION_sbmDOCTYPE """ \ """SET catalogue_order=%s """ \ """WHERE id_father=%s AND id_son=%s AND catalogue_order=%s""" qres = run_sql(qstr, (new_score, id_father, id_son, old_score)) return 0 def get_id_father_of_collection(collection_id): qstr = """SELECT id_father FROM sbmCOLLECTION_sbmCOLLECTION """ \ """WHERE id_son=%s """ \ """LIMIT 1""" qres = run_sql(qstr, (collection_id,)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_maximum_catalogue_score_of_collection_children_of_submission_collection(collection_id): qstr = """SELECT IFNULL(MAX(catalogue_order), 0) """ \ """FROM sbmCOLLECTION_sbmCOLLECTION """ \ """WHERE id_father=%s""" qres = int(run_sql(qstr, (collection_id,))[0][0]) return qres def get_score_of_collection_child_of_submission_collection(id_father, id_son): qstr = """SELECT catalogue_order FROM sbmCOLLECTION_sbmCOLLECTION """ \ """WHERE id_son=%s and id_father=%s """ \ """LIMIT 1""" qres = run_sql(qstr, (id_son, id_father)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_score_of_previous_collection_child_above(id_father, score): qstr = """SELECT MAX(catalogue_order) """ \ """FROM sbmCOLLECTION_sbmCOLLECTION """ \ """WHERE id_father=%s and catalogue_order < %s""" qres = run_sql(qstr, (id_father, score)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_score_of_next_collection_child_below(id_father, score): qstr = """SELECT MIN(catalogue_order) """ \ """FROM sbmCOLLECTION_sbmCOLLECTION """ \ """WHERE id_father=%s and catalogue_order > %s""" qres = run_sql(qstr, (id_father, score)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_catalogue_score_of_doctype_child_of_submission_collection(id_father, id_son): qstr = """SELECT catalogue_order FROM sbmCOLLECTION_sbmDOCTYPE """ \ """WHERE id_son=%s and id_father=%s """ \ """LIMIT 1""" qres = run_sql(qstr, (id_son, id_father)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_score_of_previous_doctype_child_above(id_father, score): qstr = """SELECT MAX(catalogue_order) """ \ 
"""FROM sbmCOLLECTION_sbmDOCTYPE """ \ """WHERE id_father=%s and catalogue_order < %s""" qres = run_sql(qstr, (id_father, score)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_score_of_next_doctype_child_below(id_father, score): qstr = """SELECT MIN(catalogue_order) """ \ """FROM sbmCOLLECTION_sbmDOCTYPE """ \ """WHERE id_father=%s and catalogue_order > %s""" qres = run_sql(qstr, (id_father, score)) try: return int(qres[0][0]) except (TypeError, IndexError): return None def get_maximum_catalogue_score_of_doctype_children_of_submission_collection(collection_id): qstr = """SELECT IFNULL(MAX(catalogue_order), 0) """ \ """FROM sbmCOLLECTION_sbmDOCTYPE """ \ """WHERE id_father=%s""" qres = int(run_sql(qstr, (collection_id,))[0][0]) return qres def insert_collection_child_for_submission_collection(id_father, id_son, score): qstr = """INSERT INTO sbmCOLLECTION_sbmCOLLECTION (id_father, id_son, catalogue_order) """ \ """VALUES (%s, %s, %s)""" qres = run_sql(qstr, (id_father, id_son, score)) def insert_doctype_child_for_submission_collection(id_father, id_son, score): qstr = """INSERT INTO sbmCOLLECTION_sbmDOCTYPE (id_father, id_son, catalogue_order) """ \ """VALUES (%s, %s, %s)""" qres = run_sql(qstr, (id_father, id_son, score)) def get_doctype_children_of_collection(id_father): """Get details of all 'doctype' children of a given collection. For each doctype, get: * doctype ID * doctype long-name * doctype catalogue-order The document type children retrieved are ordered in ascending order of 'catalogue order'. @param id_father: (integer) - the ID of the parent collection for which doctype children are to be retrieved. @return: (tuple) of tuples. Each tuple is a row giving the following details of a doctype: (doctype_id, doctype_longname, doctype_catalogue_order) """ ## query to retrieve details of doctypes attached to a given collection: qstr_doctype_children = """SELECT col_doctype.id_son, doctype.ldocname, col_doctype.catalogue_order """ \ """FROM sbmCOLLECTION_sbmDOCTYPE AS col_doctype """ \ """INNER JOIN sbmDOCTYPE AS doctype """ \ """ON col_doctype.id_son = doctype.sdocname """ \ """WHERE id_father=%s ORDER BY catalogue_order ASC""" res_doctype_children = run_sql(qstr_doctype_children, (id_father,)) ## return the result of this query: return res_doctype_children def get_collection_children_of_collection(id_father): """Get the collection ids of all 'collection' children of a given collection. @param id_father: (integer) the ID of the parent collection for which collection are to be retrieved. @return: (tuple) of tuples. Each tuple is a row containing the collection ID of a 'collection' child of the given parent collection. """ ## query to retrieve IDs of collections attached to a given collection: qstr_collection_children = """SELECT id_son FROM sbmCOLLECTION_sbmCOLLECTION WHERE id_father=%s ORDER BY catalogue_order ASC""" res_collection_children = run_sql(qstr_collection_children, (id_father,)) ## return the result of this query: return res_collection_children def get_id_and_score_of_collection_children_of_collection(id_father): """Get the collection ids and catalogue score positions of all 'collection' children of a given collection. @param id_father: (integer) the ID of the parent collection for which collection are to be retrieved. @return: (tuple) of tuples. 
Each tuple is a row containing the collection ID and the catalogue-score position of a 'collection' child of the given parent collection: (id, catalogue-score) """ ## query to retrieve IDs of collections attached to a given collection: qstr_collection_children = """SELECT id_son, catalogue_order """ \ """FROM sbmCOLLECTION_sbmCOLLECTION """ \ """WHERE id_father=%s ORDER BY catalogue_order ASC""" res_collection_children = run_sql(qstr_collection_children, (id_father,)) ## return the result of this query: return res_collection_children def get_number_of_rows_for_submission_collection_as_submission_tree_branch(collection_id): """Get the number of rows found for a submission-collection as a branch of the submission tree. @param collection_id: (integer) - the id of the submission-collection. @return: (integer) - number of rows found by the query. """ qstr = """SELECT COUNT(*) FROM sbmCOLLECTION_sbmCOLLECTION WHERE id_son=%s""" return int(run_sql(qstr, (collection_id,))[0][0]) def get_number_of_rows_for_submission_collection(collection_id): """Get the number of rows found for a submission-collection. @param collection_id: (integer) - the id of the submission-collection. @return: (integer) - number of rows found by the query. """ qstr = """SELECT COUNT(*) FROM sbmCOLLECTION WHERE id=%s""" return int(run_sql(qstr, (collection_id,))[0][0]) def delete_submission_collection_details(collection_id): """Delete the details of a submission-collection from the database. @param collection_id: (integer) - the ID of the submission-collection whose details are to be deleted from the WebSubmit database. @return: (integer) - error code: 0 on successful delete; 1 on failure to delete. """ qstr = """DELETE FROM sbmCOLLECTION WHERE id=%s""" run_sql(qstr, (collection_id,)) ## check to see if submission-collection details deleted: numrows_submission_collection = get_number_of_rows_for_submission_collection(collection_id) if numrows_submission_collection == 0: ## everything OK - the submission-collection's details were deleted return 0 else: ## everything NOT OK - still rows remaining for this submission-collection ## make a last attempt to delete them: run_sql(qstr, (collection_id,)) ## once more, check the number of rows remaining for this submission-collection: numrows_submission_collection = get_number_of_rows_for_submission_collection(collection_id) if numrows_submission_collection == 0: ## Everything OK - submission-collection deleted return 0 else: ## still could not delete the submission-collection return 1 def delete_submission_collection_from_submission_tree(collection_id): """Delete a submission-collection from the submission tree. @param collection_id: (integer) - the ID of the submission-collection whose details are to be deleted from the WebSubmit database. @return: (integer) - error code: 0 on successful delete; 1 on failure to delete.
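Note: the DELETE is attempted a second time if rows survive the first attempt. A hypothetical call, with the collection ID assumed for illustration: delete_submission_collection_from_submission_tree(5) returns 0 once no sbmCOLLECTION_sbmCOLLECTION row with id_son=5 remains.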
""" qstr = """DELETE FROM sbmCOLLECTION_sbmCOLLECTION WHERE id_son=%s""" run_sql(qstr, (collection_id,)) ## check to ensure that the submission-collection was deleted from the tree: numrows_collection = \ get_number_of_rows_for_submission_collection_as_submission_tree_branch(collection_id) if numrows_collection == 0: ## everything OK - this submission-collection does not exist as a branch on the submission tree return 0 else: ## submission-collection still exists as a branch of the submission tree ## try once more to delete it: run_sql(qstr, (collection_id,)) numrows_collection = \ get_number_of_rows_for_submission_collection_as_submission_tree_branch(collection_id) if numrows_collection == 0: ## deleted successfully this time: return 0 else: ## Still unable to delete return 1 def get_collection_name(collection_id): """Get the name of a given collection. @param collection_id: (integer) - the ID of the collection for which whose name is to be retrieved @return: (string or None) the name of the collection if it exists, None if no rows were returned """ collection_name = None ## query to retrieve the name of a given collection: qstr_collection_name = """SELECT name FROM sbmCOLLECTION WHERE id=%s""" ## get the name of this collection: res_collection_name = run_sql(qstr_collection_name, (collection_id,)) try: collection_name = res_collection_name[0][0] except IndexError: pass ## return the collection name: return collection_name def delete_doctype_children_from_submission_collection(collection_id): """Delete all doctype-children of a submission-collection. @param collection_id: (integer) - the ID of the submission-collection from which the doctype-children are to be deleted. @return: (integer) - error code: 0 on successful delete; 1 on failure to delete. """ qstr = """DELETE FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_father=%s""" run_sql(qstr, (collection_id,)) ## check to see if doctype-children still remain attached to submission-collection: num_doctype_children = get_number_of_doctype_children_of_submission_collection(collection_id) if num_doctype_children == 0: ## everything OK - no doctype-children remain for this submission-collection return 0 else: ## everything NOT OK - still doctype-children remaining for this submission-collection ## make a last attempt to delete them: run_sql(qstr, (collection_id,)) ## once more, check the number of doctype-children remaining num_doctype_children = get_number_of_doctype_children_of_submission_collection(collection_id) if num_doctype_children == 0: ## Everything OK - all doctype-children deleted this time return 0 else: ## still could not delete the doctype-children from this submission return 1 def get_details_of_all_submission_collections(): """Get the id and name of all submission-collections. @return: (tuple) of tuples - (collection-id, collection-name) """ qstr_collections = """SELECT id, name from sbmCOLLECTION order by id ASC""" res_collections = run_sql(qstr_collections) return res_collections def get_count_of_doctype_instances_at_score_for_collection(doctypeid, id_father, catalogue_score): """Get the number of rows found for a given doctype as attached to a given position on a query tree. @param doctypeid: (string) - the identifier for the given document type. @param id_father: (integer) - the id of the submission-collection to which the doctype is attached. @param catalogue_posn: (integer) - the score of the document type for that catalogue connection. @return: (integer) - number of rows found by the query. 
""" qstr = """SELECT COUNT(*) FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_father=%s AND id_son=%s AND catalogue_order=%s""" return int(run_sql(qstr, (id_father, doctypeid, catalogue_score))[0][0]) def get_number_of_doctype_children_of_submission_collection(collection_id): """Get the number of rows found for doctype-children as attached to a given submission-collection. @param collection_id: (integer) - the id of the submission-collection to which the doctype-children are attached. @return: (integer) - number of rows found by the query. """ qstr = """SELECT COUNT(*) FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_father=%s""" return int(run_sql(qstr, (collection_id,))[0][0]) def delete_doctype_from_position_on_submission_page(doctypeid, id_father, catalogue_score): """Delete a document type from a given score position of a given submission-collection. @param doctypeid: (string) - the ID of the document type that is to be deleted from the submission-collection. @param id_father: (integer) - the ID of the submission-collection from which the document type is to be deleted. @param catalogue_score: (integer) - the score of the submission-collection at which the document type to be deleted is connected. @return: (integer) - error code: 0 if delete was successful; 1 if delete failed; """ qstr = """DELETE FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_father=%s AND id_son=%s AND catalogue_order=%s""" run_sql(qstr, (id_father, doctypeid, catalogue_score)) ## check to see whether this doctype was deleted: numrows_doctype = get_count_of_doctype_instances_at_score_for_collection(doctypeid, id_father, catalogue_score) if numrows_doctype == 0: ## delete successful return 0 else: ## unsuccessful delete - try again run_sql(qstr, (id_father, doctypeid, catalogue_score)) numrows_doctype = get_count_of_doctype_instances_at_score_for_collection(doctypeid, id_father, catalogue_score) if numrows_doctype == 0: ## delete successful return 0 else: ## unable to delete return 1 def update_score_of_doctype_child_of_collection(id_father, id_son, old_catalogue_score, new_catalogue_score): """Update the score of a given doctype child of a submission-collection. @param id_father: (integer) - the ID of the submission-collection whose child's score is to be updated @param id_son: (string) - the ID of the document type to be updated @param old_catalogue_score: (integer) - the score of the submission-collection that the doctype is found at before update @param new_catalogue_score: (integer) - the new value of the doctype's score for the submission-collection @return: (integer) - 0 """ qstr = """UPDATE sbmCOLLECTION_sbmDOCTYPE SET catalogue_order=%s """ \ """WHERE id_father=%s AND id_son=%s AND catalogue_order=%s""" run_sql(qstr, (new_catalogue_score, id_father, id_son, old_catalogue_score)) return 0 def update_score_of_collection_child_of_collection(id_father, id_son, old_catalogue_score, new_catalogue_score): """Update the score of a given collection child ofa submission-collection. 
@param id_father: (integer) - the ID of the submission-collection whose child's score is to be updated @param id_son: (integer) - the ID of the collection child to be updated @param old_catalogue_score: (integer) - the score of the submission-collection that the collection is found at before update @param new_catalogue_score: (integer) - the new value of the collection's score for the submission-collection @return: (integer) - 0 """ qstr = """UPDATE sbmCOLLECTION_sbmCOLLECTION SET catalogue_order=%s """ \ """WHERE id_father=%s AND id_son=%s AND catalogue_order=%s""" run_sql(qstr, (new_catalogue_score, id_father, id_son, old_catalogue_score)) return 0 def normalize_scores_of_doctype_children_for_submission_collection(collection_id): """Normalize the scores of the doctype-children of a given submission-collection. I.e. set them into the format (1, 2, 3, 4, 5, [...]). @param collection_id: (integer) - the ID of the submission-collection whose doctype-children's scores are to be normalized. @return: None """ ## Get all document types attached to the collection, ordered by score: doctypes = get_doctype_children_of_collection(collection_id) num_doctypes = len(doctypes) normal_score = 1 ## for each document type, if score does not fit with counter, update it: for idx in xrange(0, num_doctypes): this_doctype_id = doctypes[idx][0] this_doctype_score = int(doctypes[idx][2]) if this_doctype_score != normal_score: ## Score of doctype is not good - correct it: update_score_of_doctype_child_of_collection(collection_id, this_doctype_id, \ this_doctype_score, normal_score) normal_score += 1 return def normalize_scores_of_collection_children_of_collection(collection_id): """Normalize the scores of the collection-children of a given submission-collection. I.e. set them into the format (1, 2, 3, 4, 5, [...]). @param collection_id: (integer) - the ID of the submission-collection whose collection-children's scores are to be normalized. @return: None """ ## Get all collection-children attached to the collection, ordered by score: collections = get_id_and_score_of_collection_children_of_collection(collection_id) num_collections = len(collections) normal_score = 1 ## for each collection, if score does not fit with counter, update it: for idx in xrange(0, num_collections): this_collection_id = collections[idx][0] this_collection_score = int(collections[idx][1]) if this_collection_score != normal_score: ## Score of collection is not good - correct it: update_score_of_collection_child_of_collection(collection_id, this_collection_id, \ this_collection_score, normal_score) normal_score += 1 return ## Functions relating to WebSubmit ACTIONS, their addition, and their modification: def update_action_details(actid, actname, working_dir, status_text): """Update the details of an action in the websubmit database IF there was only one action with that actid (sactname). @param actid: unique action id (sactname) @param actname: action name (lactname) @param working_dir: directory action works from (dir) @param status_text: text string indicating action status (statustext) @return: 0 (ZERO) if update is performed; 1 (ONE) if update not performed (either no rows, or more than one row, exist for the given action).
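A hypothetical call, with all values assumed for illustration: update_action_details('SBI', 'Submit New Record', 'running', 'submission in progress') returns 0 if exactly one action 'SBI' existed and its details were updated.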
""" # Check record with code 'actid' does not already exist: numrows_actid = get_number_actions_with_actid(actid) if numrows_actid == 1: q ="""UPDATE sbmACTION SET lactname=%s, dir=%s, statustext=%s, md=CURDATE() WHERE sactname=%s""" run_sql(q, (actname, working_dir, status_text, actid)) return 0 # Everything is OK else: return 1 # Everything not OK: Either no rows or more than one row for action "actid" def get_action_details(actid): """Get and return a tuple of tuples for all actions with the sactname "actid". @param actid: Action Identifier Code (sactname). @return: tuple of tuples (one tuple per action row): (sactname,lactname,dir,statustext,cd,md). """ q = """SELECT act.sactname, act.lactname, act.dir, act.statustext, act.cd, act.md FROM sbmACTION AS act WHERE act.sactname=%s""" return run_sql(q, (actid,)) def get_actid_actname_allactions(): """Get and return a tuple of tuples containing the "action id" and "action name" for each action in the WebSubmit database. @return: tuple of tuples: (actid,actname) """ q = """SELECT sactname,lactname FROM sbmACTION ORDER BY sactname ASC""" return run_sql(q) def get_number_actions_with_actid(actid): """Return the number of actions found for a given action id. @param actid: action id (sactname) to query for @return: an integer count of the number of actions in the websubmit database for this actid. """ q = """SELECT COUNT(sactname) FROM sbmACTION WHERE sactname=%s""" return int(run_sql(q, (actid,))[0][0]) def insert_action_details(actid, actname, working_dir, status_text): """Insert details of a new action into the websubmit database IF there are not already actions with the same actid (sactname). @param actid: unique action id (sactname) @param actname: action name (lactname) @param working_dir: directory action works from (dir) @param status_text: text string indicating action status (statustext) @return: 0 (ZERO) if insert is performed; 1 (ONE) if insert not performed due to rows existing for given action name. """ # Check record with code 'actid' does not already exist: numrows_actid = get_number_actions_with_actid(actid) if numrows_actid == 0: # insert new action: q = """INSERT INTO sbmACTION (lactname,sactname,dir,cd,md,actionbutton,statustext) VALUES (%s,%s,%s,CURDATE(),CURDATE(),NULL,%s)""" run_sql(q, (actname, actid, working_dir, status_text)) return 0 # Everything is OK else: return 1 # Everything not OK: rows may already exist for action with 'actid' ## Functions relating to WebSubmit Form Element JavaScript CHECKING FUNCTIONS, their addition, and their ## modification: def get_number_jschecks_with_chname(chname): """Return the number of Checks found for a given check name/id. @param chname: Check name/id (chname) to query for @return: an integer count of the number of Checks in the WebSubmit database for this chname. """ q = """SELECT COUNT(chname) FROM sbmCHECKS where chname=%s""" return int(run_sql(q, (chname,))[0][0]) def get_all_jscheck_names(): """Return a list of the names of all WebSubmit JSChecks""" q = """SELECT DISTINCT(chname) FROM sbmCHECKS ORDER BY chname ASC""" res = run_sql(q) return map(lambda x: str(x[0]), res) def get_chname_alljschecks(): """Get and return a tuple of tuples containing the "check name" (chname) for each JavaScript Check in the WebSubmit database. @return: tuple of tuples: (chname) """ q = """SELECT chname FROM sbmCHECKS ORDER BY chname ASC""" return run_sql(q) def get_jscheck_details(chname): """Get and return a tuple of tuples for all Checks with the check id/name "chname". 
@param chname: Check name/Identifier Code (chname). @return: tuple of tuples (one tuple per check row): (chname,chdesc,cd,md). """ q = """SELECT ch.chname, ch.chdesc, ch.cd, ch.md FROM sbmCHECKS AS ch WHERE ch.chname=%s""" return run_sql(q, (chname,)) def insert_jscheck_details(chname, chdesc): """Insert details of a new JavaScript Check into the WebSubmit database IF there are not already Checks with the same Check-name (chname). @param chname: unique check id/name (chname) @param chdesc: Check description (the JavaScript code body that is the Check) (chdesc) @return: 0 (ZERO) if insert is performed; 1 (ONE) if insert not performed due to rows existing for given Check name/id. """ # Check record with code 'chname' does not already exist: numrows_chname = get_number_jschecks_with_chname(chname) if numrows_chname == 0: # insert new Check: q = """INSERT INTO sbmCHECKS (chname,chdesc,cd,md,chefi1,chefi2) VALUES (%s,%s,CURDATE(),CURDATE(),NULL,NULL)""" run_sql(q, (chname, chdesc)) return 0 # Everything is OK else: return 1 # Everything not OK: rows may already exist for Check with 'chname' def update_jscheck_details(chname, chdesc): """Update the details of a Check in the WebSubmit database IF there was only one Check with that check id/name (chname). @param chname: unique Check id/name (chname) @param chdesc: Check description (the JavaScript code body that is the Check) (chdesc) @return: 0 (ZERO) if update is performed; 1 (ONE) if update not performed (either no rows, or more than one row, exist for the given Check). """ # Check that exactly one Check with id/name 'chname' exists: numrows_chname = get_number_jschecks_with_chname(chname) if numrows_chname == 1: q = """UPDATE sbmCHECKS SET chdesc=%s, md=CURDATE() WHERE chname=%s""" run_sql(q, (chdesc, chname)) return 0 # Everything is OK else: return 1 # Everything not OK: Either no rows or more than one row for check "chname" ## Functions relating to WebSubmit FUNCTIONS, their addition, and their modification: def get_function_description(function): """Get and return a tuple containing the function description (description) for the function with the name held in the "function" parameter. @return: tuple of tuple (for one function): ((description,)) """ q = """SELECT description FROM sbmALLFUNCDESCR where function=%s""" return run_sql(q, (function,)) def get_function_parameter_vals_doctype(doctype, paramlist): """Get (name, value) pairs for the given parameter names of a document type; a parameter with no stored value is returned with an empty string value.""" res = [] q = """SELECT name, value FROM sbmPARAMETERS WHERE doctype=%s AND name=%s""" for par in paramlist: r = run_sql(q, (doctype, par)) if len(r) > 0: res.append(r[0]) else: res.append((par, "")) return res def get_function_parameters(function): """Get the list of parameters for a given function @param function: the function name @return: tuple of tuple ((param,)) """ q = """SELECT param FROM sbmFUNDESC WHERE function=%s ORDER BY param ASC""" return run_sql(q, (function,)) def get_number_parameters_with_paramname_funcname(funcname, paramname): """Return the number of parameters found for a given function name and parameter name. I.e. count the number of times a given parameter appears for a given function. @param funcname: Function name (function) to query for. @param paramname: name of the parameter whose instances for the given function are to be counted. @return: an integer count of the number of parameters matching the criteria. """ q = """SELECT COUNT(param) FROM sbmFUNDESC WHERE function=%s AND param=%s""" return int(run_sql(q, (funcname, paramname))[0][0]) def get_distinct_paramname_all_function_parameters(): """Get the names of all function parameters.
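Parameter names are declared per function in the sbmFUNDESC table; each distinct name is returned once, in ascending order.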
@return: tuple of tuples: (param,) """ q = """SELECT DISTINCT(param) FROM sbmFUNDESC ORDER BY param ASC""" return run_sql(q) def get_distinct_paramname_all_websubmit_parameters(): """Get the names of all WEBSUBMIT parameters (i.e. parameters that are used somewhere by WebSubmit actions). @return: tuple of tuples (param,) """ q = """SELECT DISTINCT(name) FROM sbmPARAMETERS ORDER BY name ASC""" return run_sql(q) def get_distinct_paramname_all_websubmit_function_parameters(): """Get and return a tuple of tuples containing the names of all parameters in the WebSubmit system. @return: tuple of tuples: ((param,),(param,)) """ param_names = {} all_params_list = [] all_function_params = get_distinct_paramname_all_function_parameters() all_websubmit_params = get_distinct_paramname_all_websubmit_parameters() for func_param in all_function_params: param_names[func_param[0]] = None for websubmit_param in all_websubmit_params: param_names[websubmit_param[0]] = None all_params_names = param_names.keys() all_params_names.sort() for param in all_params_names: all_params_list.append((param,)) return all_params_list def regulate_score_of_all_functions_in_step_to_ascending_multiples_of_10_for_submission(doctype, action, step): """Within a step of a submission, regulate the scores of all functions to multiples of 10. For example, for the following: Submission Func Step Score SBITEST Print 2 10 SBITEST Run 2 11 SBITEST Alert 2 20 SBITEST End 2 50 ...regulate the scores like this: Submission Func Step Score SBITEST Print 2 10 SBITEST Run 2 20 SBITEST Alert 2 30 SBITEST End 2 40 @param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param step: (integer) the number of the step in which function scores are to be regulated @return: None @Exceptions raised: InvenioWebSubmitAdminWarningDeleteFailed - in the case that it wasn't possible to delete functions """ functnres = get_name_step_score_of_all_functions_in_step_of_submission(doctype=doctype, action=action, step=step) i = 1 score_order_broken = 0 for functn in functnres: cur_functn_score = int(functn[2]) if cur_functn_score != i * 10: ## this score is not a correct multiple of 10 for its place in the order score_order_broken = 1 i += 1 if score_order_broken == 1: ## the function scores were not good.
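## Strategy used below: rather than computing a minimal set of updates, delete every function ## from this step and re-insert each one at scores 10, 20, 30, ... - so that, as in the ## docstring example, scores (10, 11, 20, 50) end up as (10, 20, 30, 40).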
## delete the functions within this step try: delete_all_functions_in_step_of_submission(doctype=doctype, action=action, step=step) except InvenioWebSubmitAdminWarningDeleteFailed, e: ## unable to delete some or all functions ## pass the exception back up to the caller - raise e + raise ## re-insert them with the correct scores i = 10 for functn in functnres: insert_functn_name = functn[0] try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=insert_functn_name, step=step, score=i) except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: ## tried to insert a function that doesn't exist in WebSubmit DB ## TODO : LOG ERROR ## continue onto next loop iteration - don't increment value of i continue i += 10 return def get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype, action, function, step, score): """Get the number of rows for a particular function at a given step and score of a doctype submission""" q = """SELECT COUNT(doctype) FROM sbmFUNCTIONS where doctype=%s AND action=%s AND function=%s AND step=%s AND score=%s""" return int(run_sql(q, (doctype, action, function, step, score))[0][0]) def get_number_functions_doctypesubmission_step_score(doctype, action, step, score): """Get the number of rows for a particular function at a given step and score of a doctype submission""" q = """SELECT COUNT(doctype) FROM sbmFUNCTIONS where doctype=%s AND action=%s AND step=%s AND score=%s""" return int(run_sql(q, (doctype, action, step, score))[0][0]) def update_step_score_doctypesubmission_function(doctype, action, function, oldstep, oldscore, newstep, newscore): numrows_function = get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action, function=function, step=oldstep, score=oldscore) if numrows_function == 1: q = """UPDATE sbmFUNCTIONS SET step=%s, score=%s WHERE doctype=%s AND action=%s AND function=%s AND step=%s AND score=%s""" run_sql(q, (newstep, newscore, doctype, action, function, oldstep, oldscore)) return 0 ## Everything OK else: ## Everything NOT OK - perhaps this function doesn't exist at this posn - cannot update return 1 def move_position_submissionfunction_up(doctype, action, function, funccurstep, funccurscore): functions_above = get_functionname_step_score_allfunctions_beforereference_doctypesubmission(doctype=doctype, action=action, step=funccurstep, score=funccurscore) numrows_functions_above = len(functions_above) if numrows_functions_above < 1: ## there are no functions above this - nothing to do return 0 ## Everything OK ## get the details of the function above this one: name_function_above = functions_above[numrows_functions_above-1][0] step_function_above = int(functions_above[numrows_functions_above-1][1]) score_function_above = int(functions_above[numrows_functions_above-1][2]) if step_function_above < int(funccurstep): ## the function above the function to be moved is in a lower step. Put the function to be moved in the same step ## as the one above, but set its score to be greater by 10 than the one above error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=function, oldstep=funccurstep, oldscore=funccurscore, newstep=step_function_above, newscore=int(score_function_above)+10) return error_code else: ## the function above is in the same step as the function to be moved.
just switch them around (scores) ## first, delete the function above: error_code = delete_function_doctypesubmission_step_score(doctype=doctype, action=action, function=name_function_above, step=step_function_above, score=score_function_above) if error_code == 0: ## now update the function to be moved with the step and score of the function that was above it error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=function, oldstep=funccurstep, oldscore=funccurscore, newstep=step_function_above, newscore=score_function_above) if error_code == 0: ## now insert the function that *was* above, into the position of the function that we have just moved try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=name_function_above, step=funccurstep, score=funccurscore) return 0 except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: return 1 else: ## could not update the function that was to be moved! Try to re-insert that which was deleted try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=name_function_above, step=step_function_above, score=score_function_above) except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: pass return 1 ## Returning an ERROR code to signal that the move did not work else: ## Unable to delete the function above that which we want to move. Cannot move the function then. ## Return an error code to signal that things went wrong return 1 def add_10_to_score_of_all_functions_in_step_of_submission(doctype, action, step): """Add 10 to the score of all functions within a particular step of a submission. @param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param step: (integer) the step in which all function scores are to be incremented by 10 @return: None """ q = """UPDATE sbmFUNCTIONS SET score=score+10 WHERE doctype=%s AND action=%s AND step=%s""" run_sql(q, (doctype, action, step)) return def update_score_of_allfunctions_from_score_within_step_in_submission_reduce_by_val(doctype, action, step, fromscore, val): q = """UPDATE sbmFUNCTIONS SET score=score-%s WHERE doctype=%s AND action=%s AND step=%s AND score >= %s""" run_sql(q, (val, doctype, action, step, fromscore)) return def add_10_to_score_of_all_functions_in_step_of_submission_and_with_score_equalto_or_above_val(doctype, action, step, fromscore): """Add 10 to the score of all functions within a particular step of a submission, but with a score equal-to, or higher than a given value (fromscore). @param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param step: (integer) the step in which all function scores are to be incremented by 10 @param fromscore: (integer) the score from which all scores are incremented by 10 @return: None """ q = """UPDATE sbmFUNCTIONS SET score=score+10 WHERE doctype=%s AND action=%s AND step=%s AND score >= %s""" run_sql(q, (doctype, action, step, fromscore)) return def get_number_of_submission_functions_in_step_between_two_scores(doctype, action, step, score1, score2): """Return the number of submission functions found within a particular step of a submission, and between two scores. 
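The two score boundaries may be given in either order: the query below passes the smaller value first, so that the SQL BETWEEN clause always receives (low, high).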
@param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param step: (integer) the number of the step @param score1: (integer) the first score boundary @param score2: (integer) the second score boundary @return: (integer) the number of functions found """ q = """SELECT COUNT(doctype) FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s AND step=%s AND (score BETWEEN %s AND %s)""" return int(run_sql(q, (doctype, action, step, ((score1 <= score2 and score1) or (score2)), ((score1 <= score2 and score2) or (score1))))[0][0]) def move_submission_function_from_one_position_to_another_position(doctype, action, movefuncname, movefuncfromstep, movefuncfromscore, movefunctostep, movefunctoscore): """Move a submission function from one score/step to another position. @param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param movefuncname: (string) the name of the function to be moved @param movefuncfromstep: (integer) the step in which the function to be moved is located @param movefuncfromscore: (integer) the score at which the function to be moved is located @param movefunctostep: (integer) the step to which the function is to be moved @param movefunctoscore: (integer) the score to which the function is to be moved @return: None @exceptions raised: InvenioWebSubmitAdminWarningDeleteFailed - when unable to delete functions when regulating their scores InvenioWebSubmitAdminWarningNoRowsFound - when the function to be moved is not found InvenioWebSubmitAdminWarningInsertFailed - when regulating the scores of functions, and unable to insert a function InvenioWebSubmitAdminWarningReferentialIntegrityViolation - when the function to be inserted does not exist in WebSubmit InvenioWebSubmitAdminWarningNoUpdate - when the function was not moved because there would have been no change in its position, or because the function could not be moved for some reason """ ## first check that there is a function "movefuncname"->"movefuncfromstep";"movefuncfromscore" numrows_movefunc = \ get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action, function=movefuncname, step=movefuncfromstep, score=movefuncfromscore) if numrows_movefunc < 1: ## the function to move doesn't exist msg = """Could not move function [%s] at step [%s], score [%s] in submission [%s] to another position.
"""\ """This function does not exist at this position."""\ % (movefuncname, movefuncfromstep, movefuncfromscore, "%s%s" % (action, doctype)) raise InvenioWebSubmitAdminWarningNoRowsFound(msg) ## check that the function is not being moved to the same position: if movefuncfromstep == movefunctostep: num_functs_between_old_and_new_posn =\ get_number_of_submission_functions_in_step_between_two_scores(doctype=doctype, action=action, step=movefuncfromstep, score1=movefuncfromscore, score2=movefunctoscore) if num_functs_between_old_and_new_posn < 3 and (movefuncfromscore <= movefunctoscore): ## moving the function to the same position - no point msg = """The function [%s] of the submission [%s] was not moved from step [%s], score [%s] to """\ """step [%s], score [%s] as there would have been no change in position."""\ % (movefuncname, "%s%s" % (action, doctype), movefuncfromstep, movefuncfromscore, movefunctostep, movefunctoscore) raise InvenioWebSubmitAdminWarningNoUpdate(msg) ## delete the function that is being moved: try: delete_the_function_at_step_and_score_from_a_submission(doctype=doctype, action=action, function=movefuncname, step=movefuncfromstep, score=movefuncfromscore) except InvenioWebSubmitAdminWarningDeleteFailed, e: ## unable to delete the function - cannot perform the move. msg = """Unable to move function [%s] at step [%s], score [%s] of submission [%s] - couldn't """\ """delete the function from its current position."""\ % (movefuncname, movefuncfromstep, movefuncfromscore, "%s%s" % (action, doctype)) raise InvenioWebSubmitAdminWarningNoUpdate(msg) ## now insert the function into its new position and correct the order of all functions within that step: insert_function_into_submission_at_step_and_score_then_regulate_scores_of_functions_in_step(doctype=doctype, action=action, function=movefuncname, step=movefunctostep, score=movefunctoscore) ## regulate the scores of the functions in the step from which the function was moved try: regulate_score_of_all_functions_in_step_to_ascending_multiples_of_10_for_submission(doctype=doctype, action=action, step=movefuncfromstep) except InvenioWebSubmitAdminWarningDeleteFailed, e: ## couldn't delete some or all functions msg = """Moved function [%s] to step [%s], score [%s] of submission [%s]. However, when trying to regulate"""\ """ scores of functions in step [%s], failed to delete some functions. Check that they have not been lost."""\ % (movefuncname, movefuncfromstep, movefuncfromscore, "%s%s" % (action, doctype), movefuncfromstep) raise InvenioWebSubmitAdminWarningDeleteFailed(msg) ## finished return def move_position_submissionfunction_fromposn_toposn(doctype, action, movefuncname, movefuncfromstep, movefuncfromscore, movefunctoname, movefunctostep, movefunctoscore): ## first check that there is a function "movefuncname"->"movefuncfromstep";"movefuncfromscore" numrows_movefunc = get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action, function=movefuncname, step=movefuncfromstep, score=movefuncfromscore) if numrows_movefunc < 1: ## the function to move does not exist! return 1 ## now check that there is a function "movefunctoname"->"movefunctostep";"movefunctoscore" numrows_movefunctoposn = get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action, function=movefunctoname, step=movefunctostep, score=movefunctoscore) if numrows_movefunctoposn < 1: ## the function in the position to move to does not exist! 
return 1 ## functions_above = get_functionname_step_score_allfunctions_beforereference_doctypesubmission(doctype=doctype, action=action, step=movefunctostep, score=movefunctoscore) numrows_functions_above = len(functions_above) if numrows_functions_above >= 1: function_above_name = functions_above[numrows_functions_above-1][0] function_above_step = int(functions_above[numrows_functions_above-1][1]) function_above_score = int(functions_above[numrows_functions_above-1][2]) ## Determine whether any functions sit above the destination position and whether they need to be ## taken into account for the move: if (numrows_functions_above < 1) or (int(functions_above[numrows_functions_above-1][1]) < int(movefunctostep)): ### NICK SEPARATE THESE 2 OUT ## EITHER: there are no functions above the destination position; -OR- the function immediately above the ## destination position function is in a lower step. ## So, it is not important to care about any functions above for the move if ((numrows_functions_above < 1) and (int(movefunctoscore) > 10)): ## There are no functions above the destination position and its score is greater than 10, so there ## is room to slot the moved function in below it. Set the new score for the moved function as the ## score of the function whose place it is taking in the order - 10 error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=movefuncname, oldstep=movefuncfromstep, oldscore=movefuncfromscore, newstep=movefunctostep, newscore=int(movefunctoscore)-10) return error_code elif (numrows_functions_above >= 1) and (int(movefunctoscore) - 10 > function_above_score): ## (the guard on numrows_functions_above ensures function_above_score is defined before it is compared) ## There is a space of 10 or more between the score of the function into whose place we are moving ## a function, and the one above it. Set the new function score for the moved function as the ## score of the function whose place it is taking in the order - 10 error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=movefuncname, oldstep=movefuncfromstep, oldscore=movefuncfromscore, newstep=movefunctostep, newscore=int(movefunctoscore)-10) return error_code else: ## There is not a space of 10 or more in the scores of the function into whose position we are moving ## a function and the function above it. It is necessary to augment the score of all functions ## within the step of the one into whose position our function will be moved, from that position onwards, ## by 10; then the function to be moved can be inserted into the newly created space ## First, delete the function to be moved so that it is not changed during any augmentation: error_code = delete_function_doctypesubmission_step_score(doctype=doctype, action=action, function=movefuncname, step=movefuncfromstep, score=movefuncfromscore) if error_code == 0: ## deletion successful ## now augment the relevant scores: add_10_to_score_of_all_functions_in_step_of_submission_and_with_score_equalto_or_above_val(doctype=doctype, action=action, step=movefunctostep, fromscore=movefunctoscore) try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=movefuncname, step=movefunctostep, score=movefunctoscore) except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: return 1 return 0 else: ## could not delete it - cannot continue: return 1 else: ## there are functions above the destination position function and they are in the same step as it.
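## Two sub-cases are handled below: if a gap of more than 10 separates the destination score from the ## score of the function above, the moved function simply takes (destination score - 10); otherwise ## all scores from the destination position onwards are first shifted up by 10 to open a gap for the insertion.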
if int(movefunctoscore) - 10 > function_above_score: ## the function above has a score that is more than 10 below that into whose position we are moving ## a function. It is therefore possible to set the new score as movefunctoscore - 10: error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=movefuncname, oldstep=movefuncfromstep, oldscore=movefuncfromscore, newstep=movefunctostep, newscore=int(movefunctoscore)-10) return error_code else: ## there is not a space of 10 or more in the scores of the function into whose position our function ## is to be moved and the function above it. It is necessary to augment the score of all functions ## within the step of the one into whose position our function will be moved, from that position onwards, ## by 10; then the function to be moved can be inserted into the newly created space ## First, delete the function to be moved so that it is not changed during any augmentation: error_code = delete_function_doctypesubmission_step_score(doctype=doctype, action=action, function=movefuncname, step=movefuncfromstep, score=movefuncfromscore) if error_code == 0: ## deletion successful ## now augment the relevant scores: add_10_to_score_of_all_functions_in_step_of_submission_and_with_score_equalto_or_above_val(doctype=doctype, action=action, step=movefunctostep, fromscore=movefunctoscore) try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=movefuncname, step=movefunctostep, score=movefunctoscore) except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: return 1 return 0 else: ## could not delete it - cannot continue: return 1 def move_position_submissionfunction_down(doctype, action, function, funccurstep, funccurscore): functions_below = get_functionname_step_score_allfunctions_afterreference_doctypesubmission(doctype=doctype, action=action, step=funccurstep, score=funccurscore) numrows_functions_below = len(functions_below) if numrows_functions_below < 1: ## there are no functions below this - nothing to do return 0 ## Everything OK ## get the details of the function below this one: name_function_below = functions_below[0][0] step_function_below = int(functions_below[0][1]) score_function_below = int(functions_below[0][2]) if step_function_below > int(funccurstep): ## the function below is in a higher step: update all functions in that step with their score += 10, ## then place the function to be moved into that step with a score of that which the function below had if score_function_below <= 10: ## the score of the function below is 10 or less: add 10 to the score of all functions in that step add_10_to_score_of_all_functions_in_step_of_submission(doctype=doctype, action=action, step=step_function_below) numrows_function_stepscore_moveto = get_number_functions_doctypesubmission_step_score(doctype=doctype, action=action, step=step_function_below, score=score_function_below) if numrows_function_stepscore_moveto == 0: ## the score of the step that the function will be moved to is empty - it's safe to move the function there: error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=function, oldstep=funccurstep, oldscore=funccurscore, newstep=step_function_below, newscore=score_function_below) return error_code else: ## could not move the functions below? 
Cannot move this function then return 1 else: ## the function below is already on a score higher than 10 - just move the function into score 10 in that step error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=function, oldstep=funccurstep, oldscore=funccurscore, newstep=step_function_below, newscore=10) return error_code else: ## the function below is in the same step. Switch it with this function ## first, delete the function below: error_code = delete_function_doctypesubmission_step_score(doctype=doctype, action=action, function=name_function_below, step=step_function_below, score=score_function_below) if error_code == 0: ## now update the function to be moved with the step and score of the function that was below it error_code = update_step_score_doctypesubmission_function(doctype=doctype, action=action, function=function, oldstep=funccurstep, oldscore=funccurscore, newstep=step_function_below, newscore=score_function_below) if error_code == 0: ## now insert the function that *was* below, into the position of the function that has just been moved try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=name_function_below, step=funccurstep, score=funccurscore) except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: return 1 return 0 else: ## could not update the function that was to be moved! Try to re-insert that which was deleted try: insert_function_into_submission_at_step_and_score(doctype=doctype, action=action, function=name_function_below, step=step_function_below, score=score_function_below) except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e: pass return 1 ## Returning an ERROR code to signal that the move did not work else: ## Unable to delete the function below that which we want to move. Cannot move the function then. ## Return an error code to signal that things went wrong return 1 def get_names_of_all_functions(): """Return a list of the names of all WebSubmit functions (as strings). The function names will be sorted in ascending alphabetical order. @return: a list of strings """ q = """SELECT function FROM sbmALLFUNCDESCR ORDER BY function ASC""" res = run_sql(q) return map(lambda x: str(x[0]), res) def get_funcname_funcdesc_allfunctions(): """Get and return a tuple of tuples containing the "function name" (function) and function textual description (description) for each WebSubmit function in the WebSubmit database. @return: tuple of tuples: ((function,description),(function,description)[,...]) """ q = """SELECT function, description FROM sbmALLFUNCDESCR ORDER BY function ASC""" return run_sql(q) def get_function_usage_details(function): """Get the details of a function's usage in WebSubmit. This means get the following usage details: - doctype: the unique ID of the document type with which the usage is associated - docname: the long-name of the document type - action id: the unique ID of the action of the doctype, with which the usage is associated - action name: the long name of this action - function step: the step in which the instance of function usage occurs - function score: the score (of the above-mentioned step) at which the function is called @param function: (string) the name of the function whose WebSubmit usage is to be examined. 
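Usage rows are assembled by joining sbmFUNCTIONS with sbmDOCTYPE, sbmIMPLEMENT and sbmACTION (see the LEFT JOINs in the query below).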
@return: tuple of tuples whereby each tuple represents one instance of the function's usage: (doctype, docname, action id, action name, function-step, function-score) """ q = """SELECT fun.doctype, dt.ldocname, fun.action, actn.lactname, fun.step, fun.score """ +\ """FROM sbmDOCTYPE AS dt LEFT JOIN sbmFUNCTIONS AS fun ON (fun.doctype=dt.sdocname) """ +\ """LEFT JOIN sbmIMPLEMENT as imp ON (fun.action=imp.actname AND fun.doctype=imp.docname) """ +\ """LEFT JOIN sbmACTION AS actn ON (actn.sactname=imp.actname) WHERE fun.function=%s """ +\ """ORDER BY dt.sdocname ASC, fun.action ASC, fun.step ASC, fun.score ASC""" return run_sql(q, (function,)) def get_number_of_functions_with_funcname(funcname): """Return the number of Functions found in the WebSubmit DB for a given function name. @param funcname: (string) the name of the function @return: an integer count of the number of Functions in the WebSubmit database for this function name. """ q = """SELECT COUNT(function) FROM sbmALLFUNCDESCR where function=%s""" return int(run_sql(q, (funcname,))[0][0]) def insert_function_details(function, fundescr): """Insert the name and description of a new function into the WebSubmit database, IF no function with the same name already exists. @param function: (string) the unique name of the function @param fundescr: (string) the function's textual description @return: 0 (ZERO) if the insert is performed; 1 (ONE) if not (a function with this name may already exist). """ numrows_function = get_number_of_functions_with_funcname(function) if numrows_function == 0: ## Insert new function q = """INSERT INTO sbmALLFUNCDESCR (function, description) VALUES (%s, %s)""" run_sql(q, (function, fundescr)) return 0 # Everything is OK else: return 1 # Everything not OK: rows may already exist for function with name 'function' def update_function_description(funcname, funcdescr): """Update the description of function "funcname", with string contained in "funcdescr". Function description will be updated only if one row was found for the function in the DB. @param funcname: the unique function name of the function whose description is to be updated @param funcdescr: the new, updated description of the function @return: error code (0 is OK, 1 is BAD update) """ numrows_function = get_number_of_functions_with_funcname(funcname) if numrows_function == 1: ## perform update of description q = """UPDATE sbmALLFUNCDESCR SET description=%s WHERE function=%s""" run_sql(q, ( (funcdescr != "" and funcdescr) or (None), funcname ) ) return 0 ## Everything OK else: return 1 ## Everything not OK: either no rows, or more than 1 row for function "funcname" def delete_function_parameter(function, parameter_name): """Delete a given parameter from a given function. @param function: name of the function from which the parameter is to be deleted. @param parameter_name: name of the parameter to be deleted from the function. @return: error-code. 0 means successful deletion of the parameter; 1 means deletion failed because the parameter did not exist for the given function. """ numrows_function_parameter = get_number_parameters_with_paramname_funcname(funcname=function, paramname=parameter_name) if numrows_function_parameter >= 1: ## perform deletion of parameter(s) q = """DELETE FROM sbmFUNDESC WHERE function=%s AND param=%s""" run_sql(q, (function, parameter_name)) return 0 ## Everything OK else: return 1 ## Everything not OK: no rows - this parameter doesn't exist for this function def add_function_parameter(function, parameter_name): """Add a parameter (parameter_name) to a given function. @param function: name of the function to which the parameter is to be added. @param parameter_name: name of the parameter to be added to the function. @return: error-code.
0 means successful addition of the parameter; 1 means addition failed because the parameter already existed for the given function. """ numrows_function_parameter = get_number_parameters_with_paramname_funcname(funcname=function, paramname=parameter_name) if numrows_function_parameter == 0: ## perform addition of parameter q = """INSERT INTO sbmFUNDESC (function, param) VALUES (%s, %s)""" run_sql(q, (function, parameter_name)) return 0 ## Everything OK else: return 1 ## Everything NOT OK: parameter already exists for function ## Functions relating to WebSubmit ELEMENTS, their addition, and their modification: def get_number_elements_with_elname(elname): """Return the number of Elements found for a given element name/id. @param elname: Element name/id (name) to query for @return: an integer count of the number of Elements in the WebSubmit database for this elname. """ q = """SELECT COUNT(name) FROM sbmFIELDDESC where name=%s""" return int(run_sql(q, (elname,))[0][0]) def get_doctype_action_pagenb_for_submissions_using_element(elname): """Get and return a tuple of tuples containing the doctype, the action, and the page number (pagenb) for the instances of use of the element identified by "elname". I.e. get the information about which submission pages the element is used on. @param elname: The unique identifier for an element ("name" in "sbmFIELDDESC", "fidesc" in "sbmFIELD"). @return: tuple of tuples (doctype, action, pagenb) """ q = """SELECT subm.docname, subm.actname, sf.pagenb FROM sbmIMPLEMENT AS subm LEFT JOIN sbmFIELD AS sf ON sf.subname=CONCAT(subm.actname, subm.docname) WHERE sf.fidesc=%s ORDER BY sf.subname ASC, sf.pagenb ASC""" return run_sql(q, (elname,)) def get_subname_pagenb_element_use(elname): """Get and return a tuple of tuples containing the "submission name" (subname) and the page number (pagenb) for the instances of use of the element identified by "elname". I.e. get the information about which submission pages the element is used on. @param elname: The unique identifier for an element ("name" in "sbmFIELDDESC", "fidesc" in "sbmFIELD"). @return: tuple of tuples (subname, pagenb) """ q = """SELECT sf.subname, sf.pagenb FROM sbmFIELD AS sf WHERE sf.fidesc=%s ORDER BY sf.subname ASC, sf.pagenb ASC""" return run_sql(q, (elname,)) def get_elename_allelements(): """Get and return a tuple of tuples containing the "element name" (name) for each WebSubmit element in the WebSubmit database. @return: tuple of tuples: (name) """ q = """SELECT name FROM sbmFIELDDESC ORDER BY name""" return run_sql(q) def get_all_element_names(): """Return a list of the names of all "elements" in the WebSubmit DB. @return: a list of strings, where each string is a WebSubmit element """ q = """SELECT DISTINCT(name) FROM sbmFIELDDESC ORDER BY name""" res = run_sql(q) return map(lambda x: str(x[0]), res) def get_element_details(elname): """Get and return a tuple of tuples for all ELEMENTS with the element name "elname". @param elname: ELEMENT name (elname). 
@return: tuple of tuples (one tuple per element): (marccode,type,size,rows,cols,maxlength, val,fidesc,cd,md,modifytext) """ q = "SELECT el.marccode, el.type, el.size, el.rows, el.cols, el.maxlength, " + \ "el.val, el.fidesc, el.cd, el.md, el.modifytext FROM sbmFIELDDESC AS el WHERE el.name=%s" return run_sql(q, (elname,)) def update_element_details(elname, elmarccode, eltype, elsize, elrows, elcols, elmaxlength, \ elval, elfidesc, elmodifytext): """Update the details of an ELEMENT in the WebSubmit database IF there was only one Element with that element id/name (name). @param elname: unique Element id/name (name) @param elmarccode: element's MARC code @param eltype: type of element @param elsize: size of element @param elrows: number of rows in element @param elcols: number of columns in element @param elmaxlength: element maximum length @param elval: element default value @param elfidesc: element description @param elmodifytext: element's modification text @return: 0 (ZERO) if update is performed; 1 (ONE) if update not performed (either no rows, or more than one row, exist for the given Element). """ # Check that exactly one element with name 'elname' exists: numrows_elname = get_number_elements_with_elname(elname) if numrows_elname == 1: q = """UPDATE sbmFIELDDESC SET marccode=%s, type=%s, size=%s, rows=%s, cols=%s, maxlength=%s, """ +\ """val=%s, fidesc=%s, modifytext=%s, md=CURDATE() WHERE name=%s""" run_sql(q, ( elmarccode, (eltype != "" and eltype) or (None), (elsize != "" and elsize) or (None), (elrows != "" and elrows) or (None), (elcols != "" and elcols) or (None), (elmaxlength != "" and elmaxlength) or (None), (elval != "" and elval) or (None), (elfidesc != "" and elfidesc) or (None), (elmodifytext != "" and elmodifytext) or (None), elname ) ) return 0 # Everything is OK else: return 1 # Everything not OK: Either no rows or more than one row for element "elname" def insert_element_details(elname, elmarccode, eltype, elsize, elrows, elcols, \ elmaxlength, elval, elfidesc, elmodifytext): """Insert details of a new Element into the WebSubmit database IF there are not already elements with the same element name (name). @param elname: unique Element id/name (name) @param elmarccode: element's MARC code @param eltype: type of element @param elsize: size of element @param elrows: number of rows in element @param elcols: number of columns in element @param elmaxlength: element maximum length @param elval: element default value @param elfidesc: element description @param elmodifytext: element's modification text @return: 0 (ZERO) if insert is performed; 1 (ONE) if insert not performed due to rows existing for given Element.
""" # Check element record with code 'elname' does not already exist: numrows_elname = get_number_elements_with_elname(elname) if numrows_elname == 0: # insert new Check: q = """INSERT INTO sbmFIELDDESC (name, alephcode, marccode, type, size, rows, cols, """ +\ """maxlength, val, fidesc, cd, md, modifytext, fddfi2) VALUES(%s, NULL, """ +\ """%s, %s, %s, %s, %s, %s, %s, %s, CURDATE(), CURDATE(), %s, NULL)""" run_sql(q, ( elname, elmarccode, (eltype != "" and eltype) or (None), (elsize != "" and elsize) or (None), (elrows != "" and elrows) or (None), (elcols != "" and elcols) or (None), (elmaxlength != "" and elmaxlength) or (None), (elval != "" and elval) or (None), (elfidesc != "" and elfidesc) or (None), (elmodifytext != "" and elmodifytext) or (None) ) ) return 0 # Everything is OK else: return 1 # Everything not OK: rows may already exist for Element with 'elname' # Functions relating to WebSubmit DOCUMENT TYPES: def get_docid_docname_alldoctypes(): """Get and return a tuple of tuples containing the "doctype id" (sdocname) and "doctype name" (ldocname) for each action in the WebSubmit database. @return: tuple of tuples: (docid,docname) """ q = """SELECT sdocname, ldocname FROM sbmDOCTYPE ORDER BY ldocname ASC""" return run_sql(q) def get_docid_docname_and_docid_alldoctypes(): """Get and return a tuple of tuples containing the "doctype id" (sdocname) and "doctype name" (ldocname) for each action in the WebSubmit database. @return: tuple of tuples: (docid,docname) """ q = """SELECT sdocname, CONCAT(ldocname, " [", sdocname, "]") FROM sbmDOCTYPE ORDER BY ldocname ASC""" return run_sql(q) def get_number_doctypes_docid(docid): """Return the number of DOCUMENT TYPES found for a given document type id (sdocname). @param docid: unique ID of document type whose instances are to be counted. @return: an integer count of the number of document types in the WebSubmit database for this doctype id. """ q = """SELECT COUNT(sdocname) FROM sbmDOCTYPE where sdocname=%s""" return int(run_sql(q, (docid,))[0][0]) def get_number_functions_doctype(doctype): """Return the number of FUNCTIONS found for a given DOCUMENT TYPE. @param doctype: unique ID of doctype for which the number of functions are to be counted @return: an integer count of the number of functions in the WebSubmit database for this doctype. """ q = """SELECT COUNT(doctype) FROM sbmFUNCTIONS where doctype=%s""" return int(run_sql(q, (doctype,))[0][0]) def get_number_functions_action_doctype(doctype, action): """Return the number of FUNCTIONS found for a given ACTION of a given DOCUMENT TYPE. @param doctype: unique ID of doctype for which the number of functions are to be counted @param action: the action (of the document type "doctype") that owns the functions to be counted @return: an integer count of the number of functions in the WebSubmit database for this doctype/action. """ q = """SELECT COUNT(doctype) FROM sbmFUNCTIONS where doctype=%s AND action=%s""" return int(run_sql(q, (doctype, action))[0][0]) def get_number_of_functions_in_step_of_submission(doctype, action, step): """Return the number of FUNCTIONS within a step of a submission. 
@param doctype: (string) unique ID of a doctype @param action: (string) unique ID of an action @param step: (integer) the number of the step in which the functions to be counted are situated @return: an integer count of the number of functions found within the step of the submission """ q = """SELECT COUNT(doctype) FROM sbmFUNCTIONS where doctype=%s AND action=%s AND step=%s""" return int(run_sql(q, (doctype, action, step))[0][0]) def get_number_categories_doctype(doctype): """Return the number of CATEGORIES (used to distinguish between submissions) found for a given DOCUMENT TYPE. @param doctype: unique ID of doctype for which submission categories are to be counted @return: an integer count of the number of categories in the WebSubmit database for this doctype. """ q = """SELECT COUNT(doctype) FROM sbmCATEGORIES where doctype=%s""" return int(run_sql(q, (doctype,))[0][0]) def get_number_categories_doctype_category(doctype, categ): """Return the number of CATEGORIES (used to distinguish between submissions) found for a given DOCUMENT TYPE/CATEGORY NAME. Basically, test to see whether a given category already exists for a given document type. @param doctype: unique ID of doctype for which the submission category is to be tested @param categ: the category ID of the category to be tested for @return: an integer count of the number of categories in the WebSubmit database for this doctype. """ q = """SELECT COUNT(sname) FROM sbmCATEGORIES where doctype=%s and sname=%s""" return int(run_sql(q, (doctype, categ))[0][0]) def get_number_parameters_doctype(doctype): """Return the number of PARAMETERS (used by functions) found for a given DOCUMENT TYPE. @param doctype: unique ID of doctype whose parameters are to be counted @return: an integer count of the number of parameters in the WebSubmit database for this doctype. """ q = """SELECT COUNT(name) FROM sbmPARAMETERS where doctype=%s""" return int(run_sql(q, (doctype,))[0][0]) def get_number_submissionfields_submissionnames(submission_names): """Return the number of SUBMISSION FIELDS found for a given list of submissions. A doctype can have several submissions, and each submission can have many fields making up its interface. Using this function, the fields owned by several submissions can be counted. If the submissions in the list are all owned by one doctype, then it is possible to count the submission fields owned by one doctype. @param submission_names: unique IDs of all submissions whose fields are to be counted. If this value is a string, it will be classed as a single submission name. Otherwise, a list/tuple of strings must be passed - where each string is a submission name. 
       @return: an integer count of the number of fields in the WebSubmit database for these submission(s)
    """
    q = """SELECT COUNT(subname) FROM sbmFIELD WHERE subname=%s"""
    if type(submission_names) in (str, unicode):
        submission_names = (submission_names,)
    number_submissionnames = len(submission_names)
    if number_submissionnames == 0:
        return 0
    if number_submissionnames > 1:
        for i in range(1, number_submissionnames):
            ## Add one OR clause per additional submission name, so that the fields of all of
            ## the named submissions are counted together:
            q += """ OR subname=%s"""
    return int(run_sql(q, map(lambda x: str(x), submission_names))[0][0])

def get_doctypeid_doctypes_implementing_action(action):
    """Return the IDs and display names of all document types that implement a given action.
       @param action: the unique ID of the action
       @return: a tuple of tuples: (docid, "[docid] docname")
    """
    q = """SELECT doc.sdocname, CONCAT("[", doc.sdocname, "] ", doc.ldocname) FROM sbmDOCTYPE AS doc """\
        """LEFT JOIN sbmIMPLEMENT AS subm ON """\
        """subm.docname = doc.sdocname """\
        """WHERE subm.actname=%s """\
        """ORDER BY doc.sdocname ASC"""
    return run_sql(q, (action,))

def get_number_submissions_doctype(doctype):
    """Return the number of SUBMISSIONS found for a given document type
       @param doctype: the unique ID of the document type for which submissions are to be counted
       @return: an integer count of the number of submissions owned by this doctype
    """
    q = """SELECT COUNT(subname) FROM sbmIMPLEMENT WHERE docname=%s"""
    return int(run_sql(q, (doctype,))[0][0])

def get_number_submissions_doctype_action(doctype, action):
    """Return the number of SUBMISSIONS found for a given document type/action
       @param doctype: the unique ID of the document type for which submissions are to be counted
       @param action: the unique ID of the action that the submission implements, that is to be counted
       @return: an integer count of the number of submissions found for this doctype/action ID
    """
    q = """SELECT COUNT(subname) FROM sbmIMPLEMENT WHERE docname=%s and actname=%s"""
    return int(run_sql(q, (doctype, action))[0][0])

def get_number_collection_doctype_entries_doctype(doctype):
    """Return the number of collection_doctype entries found for a given doctype
       @param doctype: the document type for which the collection-doctypes are to be counted
       @return: an integer count of the number of collection-doctype entries found for the
        given document type
    """
    q = """SELECT COUNT(id_father) FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_son=%s"""
    return int(run_sql(q, (doctype,))[0][0])

def get_all_category_details_for_doctype(doctype):
    """Return all details (short-name, long-name, position number) of all CATEGORIES found for
       a given document type. If the position number is NULL, it will be assigned a value of
       zero. Categories will be ordered primarily by ascending position number and then by
       ascending alphabetical order of short-name.
       @param doctype: (string) The document type for which categories are to be retrieved.
       @return: (tuple) of tuples whereby each tuple is a row containing 3 items:
        (short-name, long-name, position)
    """
    q = """SELECT sname, lname, score FROM sbmCATEGORIES where doctype=%s ORDER BY score ASC,""" \
        """ lname ASC"""
    return run_sql(q, (doctype,))
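## Illustrative sketch (not part of the original module): counting the fields of
## one or of several submissions with get_number_submissionfields_submissionnames().
## The submission names used here ("SBIDEMO", "MBIDEMO") are made up.

def _example_count_submission_fields():
    """Hypothetical usage sketch: a single submission name may be passed as a
       plain string, or several names may be passed as a list/tuple (one OR
       clause is added to the query per extra name)."""
    num_one = get_number_submissionfields_submissionnames("SBIDEMO")
    num_many = get_number_submissionfields_submissionnames(["SBIDEMO", "MBIDEMO"])
    return (num_one, num_many)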
def get_all_categories_sname_lname_for_doctype_categsname(doctype, categsname):
    """Return the short and long names of a given CATEGORY (identified by its short name) of a
       given DOCUMENT TYPE.
       @param doctype: unique ID of the doctype for which the category is to be retrieved
       @param categsname: the short name of the category to be retrieved
       @return: a tuple of tuples: (sname, lname)
    """
    q = """SELECT sname, lname FROM sbmCATEGORIES where doctype=%s AND sname=%s"""
    return run_sql(q, (doctype, categsname) )

def get_all_submissionnames_doctype(doctype):
    """Get and return a tuple of tuples containing the "submission name" (subname) of all
       submissions for the document type identified by "doctype".
       In other words, get a list of the submissions that document type "doctype" has.
       @param doctype: unique ID of the document type whose submissions are to be retrieved
       @return: tuple of tuples (subname,)
    """
    q = """SELECT subname FROM sbmIMPLEMENT WHERE docname=%s ORDER BY subname ASC"""
    return run_sql(q, (doctype,))

def get_actname_all_submissions_doctype(doctype):
    """Get and return a tuple of tuples containing the "action name" (actname) of all
       submissions for the document type identified by "doctype".
       In other words, get a list of the action IDs of the submissions implemented by document
       type "doctype".
       @param doctype: unique ID of the document type whose actions are to be retrieved
       @return: tuple of tuples (actname,)
    """
    q = """SELECT actname FROM sbmIMPLEMENT WHERE docname=%s ORDER BY actname ASC"""
    return run_sql(q, (doctype,))

def get_submissiondetails_doctype_action(doctype, action):
    """Get the details of the submission of a given document type that implements a given action.
       @param doctype: the ID of the document type for which the submission details are to be retrieved.
       @param action: the ID of the action implemented by the submission.
       @return: a tuple containing the details of a submission:
        (subname, docname, actname, displayed, nbpg, cd, md, buttonorder, statustext, level,
         score, stpage, endtxt)
    """
    q = """SELECT subname, docname, actname, displayed, nbpg, cd, md, buttonorder, statustext, level, """ \
        """score, stpage, endtxt FROM sbmIMPLEMENT WHERE docname=%s AND actname=%s"""
    return run_sql(q, (doctype, action))

def get_all_categories_of_doctype_ordered_by_score_lname(doctype):
    """Return a tuple containing all categories of a given document type, ordered by ascending
       order of score, and ascending order of category long-name.
       @param doctype: (string) the document type ID.
       @return: (tuple) of tuples, whereby each tuple is a row representing a category, with the
        following structure: (sname, lname, score)
    """
    qstr = """SELECT sname, lname, score FROM sbmCATEGORIES WHERE doctype=%s ORDER BY score ASC, lname ASC"""
    res = run_sql(qstr, (doctype,))
    return res

def update_score_of_doctype_category(doctype, categid, newscore):
    """Update the score of a given category of a given document type.
       @param doctype: (string) the document type id
       @param categid: (string) the category id
       @param newscore: (integer) the score that the category is to be given
       @return: (integer) - 0 on update of row; 1 on failure to update.
    """
    qstr = """UPDATE sbmCATEGORIES SET score=%s WHERE doctype=%s AND sname=%s"""
    res = run_sql(qstr, (newscore, doctype, categid))
    if int(res) > 0:
        ## row(s) were updated
        return 0
    else:
        ## no rows were updated
        return 1
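## Illustrative sketch (not part of the original module): after manually changing
## a category's score, the scores of a doctype's categories can be re-sequenced
## with normalize_doctype_category_scores() (defined below). The doctype and
## category IDs used here ("DEMO", "ARTICLE") are made up.

def _example_reposition_category():
    """Hypothetical usage sketch: push the "ARTICLE" category of the "DEMO"
       doctype to the end of the list, then normalize the scores back to the
       sequential values 1, 2, 3, ..."""
    if update_score_of_doctype_category("DEMO", "ARTICLE", 999) == 0:
        normalize_doctype_category_scores("DEMO")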
def normalize_doctype_category_scores(doctype):
    """Get details of all categories of a given document type, ordered by score and long name;
       Loop through each category and check its score vs a counter in the result-set; if the
       score does not match the counter number, update the score of that category to match that
       of the counter. In this way, the category scores will be normalized sequentially.
       E.g.: categories with scores [1,4,6,8,9] will be allocated the normalized scores
       [1,2,3,4,5]. I.e. the order won't change, but the scores will be corrected.
       @param doctype: (string) the document type id
       @return: (None)
    """
    all_categs = get_all_categories_of_doctype_ordered_by_score_lname(doctype)
    num_categs = len(all_categs)
    for row_idx in xrange(0, num_categs):
        ## Get the details of the current category:
        cur_row_score = row_idx + 1
        cur_categ_id = all_categs[row_idx][0]
        cur_categ_lname = all_categs[row_idx][1]
        cur_categ_score = int(all_categs[row_idx][2])
        ## Check the score of the categ vs its position in the list:
        if cur_categ_score != cur_row_score:
            ## update this score:
            update_score_of_doctype_category(doctype=doctype,
                                             categid=cur_categ_id, newscore=cur_row_score)

def move_category_to_new_score(doctype, sourcecateg, destinationcatg):
    """Move a category of a document type from one score, to another.
       @param doctype: (string) - the ID of the document type whose categories are to be moved.
       @param sourcecateg: (string) - the category ID of the category to be moved.
       @param destinationcatg: (string) - the category ID of the category to whose position
        sourcecateg is to be moved.
       @return: (integer) 0 - successfully moved category; 1 - failed to correctly move category.
    """
    qstr_increment_scores_from_scorex = """UPDATE sbmCATEGORIES SET score=score+1 WHERE doctype=%s AND score >= %s"""
    move_categ_from_score = move_categ_to_score = -1
    ## get the (categid, lname, score) of all categories for this document type:
    res_all_categs = get_all_categories_of_doctype_ordered_by_score_lname(doctype=doctype)
    num_categs = len(res_all_categs)
    ## if the category scores are not ordered properly (1,2,3,4,...), correct them.
    ## Also, get the row-count (therefore score-position) of the categ to be moved, and the destination score:
    for row_idx in xrange(0, num_categs):
        current_row_score = row_idx + 1
        current_categid = res_all_categs[row_idx][0]
        current_categ_score = int(res_all_categs[row_idx][2])
        ## Check the score of the categ vs its position in the list:
        if current_categ_score != current_row_score:
            ## bad score - fix it:
            update_score_of_doctype_category(doctype=doctype,
                                             categid=current_categid, newscore=current_row_score)
        if current_categid == sourcecateg:
            ## this is the place from which the category is being jumped-out:
            move_categ_from_score = current_row_score
        elif current_categid == destinationcatg:
            ## this is the place into which the categ is being jumped:
            move_categ_to_score = current_row_score
    ## If couldn't find the scores of both 'sourcecateg' and 'destinationcatg', return error:
    if -1 in (move_categ_from_score, move_categ_to_score) or \
       move_categ_from_score == move_categ_to_score:
        ## either trying to move a categ to the same place or can't find both the source and destination categs:
        return 1
    ## add 1 to score of all categories from the score position into which the sourcecateg is to be moved:
    qres = run_sql(qstr_increment_scores_from_scorex, (doctype, move_categ_to_score))
    ## update the score of the category to be moved:
    update_score_of_doctype_category(doctype=doctype, categid=sourcecateg, newscore=move_categ_to_score)
    ## now re-order all category scores correctly:
    normalize_doctype_category_scores(doctype)
    return 0  ## return success

def move_category_by_one_place_in_score(doctype, categsname, direction):
    """Move a category up or down in score by one place.
       @param doctype: (string) - the ID of the document type to which the category belongs.
       @param categsname: (string) - the ID of the category to be moved.
       @param direction: (string) - the direction in which to move the category ('up' or 'down').
@return: (integer) - 0 on successful move of category; 1 on failure to properly move category. """ qstr_update_score = """UPDATE sbmCATEGORIES SET score=%s WHERE doctype=%s AND score=%s""" move_categ_score = -1 ## get the (categid, lname, score) of all categories for this document type: res_all_categs = get_all_categories_of_doctype_ordered_by_score_lname(doctype=doctype) num_categs = len(res_all_categs) ## if the category scores are not ordered properly (1,2,3,4,...), correct them ## Also, get the row-count (therefore score-position) of the categ to be moved for row_idx in xrange(0, num_categs): current_row_score = row_idx + 1 current_categid = res_all_categs[row_idx][0] current_categ_score = int(res_all_categs[row_idx][2]) ## Check the score of the categ vs its position in the list: if current_categ_score != current_row_score: ## bad score - fix it: update_score_of_doctype_category(doctype=doctype, categid=current_categid, newscore=current_row_score) if current_categid == categsname: ## this is the category to be moved: move_categ_score = current_row_score ## move the category: if direction.lower() == "up": ## Moving the category upwards (reducing its score): if num_categs > 1 and move_categ_score > 1: ## move the category above down by one place: run_sql(qstr_update_score, (move_categ_score, doctype, (move_categ_score - 1))) ## move the chosen category up: update_score_of_doctype_category(doctype=doctype, categid=categsname, newscore=(move_categ_score - 1)) ## return success return 0 else: ## return error - not enough categs, or categ already in first posn return 1 elif direction.lower() == "down": ## move the category downwards (increasing its score): if num_categs > 1 and move_categ_score < num_categs: ## move category below, up by one place: run_sql(qstr_update_score, (move_categ_score, doctype, (move_categ_score + 1))) ## move the chosen category down: update_score_of_doctype_category(doctype=doctype, categid=categsname, newscore=(move_categ_score + 1)) ## return success return 0 else: ## return error - not enough categs, or categ already in last posn return 1 else: ## invalid move direction - no action return 1 def update_submissiondetails_doctype_action(doctype, action, displayed, buttonorder, statustext, level, score, stpage, endtxt): """Update the details of a submission. @param doctype: the document type for which the submission details are to be updated @param action: the action ID of the submission to be modified @param displayed: displayed on main submission page? (Y/N) @param buttonorder: button order @param statustext: statustext @param level: level @param score: score @param stpage: stpage @param endtxt: endtxt @return: an integer error code: 0 for successful update; 1 for update failure. 
""" numrows_submission = get_number_submissions_doctype_action(doctype, action) if numrows_submission == 1: ## there is only one row for this submission - can update q = """UPDATE sbmIMPLEMENT SET md=CURDATE(), displayed=%s, buttonorder=%s, statustext=%s, level=%s, """\ """score=%s, stpage=%s, endtxt=%s WHERE docname=%s AND actname=%s""" run_sql(q, (displayed, ((str(buttonorder).isdigit() and int(buttonorder) >= 0) and buttonorder) or (None), statustext, level, ((str(score).isdigit() and int(score) >= 0) and score) or (""), ((str(stpage).isdigit() and int(stpage) >= 0) and stpage) or (""), endtxt, doctype, action ) ) return 0 ## Everything OK else: ## Everything NOT OK - either multiple rows exist for submission, or submission doesn't exist return 1 def update_doctype_details(doctype, doctypename, doctypedescr): """Update a document type's details. In effect the document type name (ldocname) and the description are updated, as is the last modification date (md). @param doctype: the ID of the document type to be updated @param doctypename: the new/updated name of the document type @param doctypedescr: the new/updated description of the document type @return: Integer error code: 0 = update successful; 1 = update failed """ numrows_doctype = get_number_doctypes_docid(docid=doctype) if numrows_doctype == 1: ## doctype exists - perform update q = """UPDATE sbmDOCTYPE SET ldocname=%s, description=%s, md=CURDATE() WHERE sdocname=%s""" run_sql(q, (doctypename, doctypedescr, doctype)) return 0 ## Everything OK else: ## Everything NOT OK - either doctype does not exists, or key is duplicated return 1 def get_submissiondetails_all_submissions_doctype(doctype): """Get the details of all submissions for a given document type, ordered by the action name. @param doctype: details of the document type for which the details of all submissions are to be retrieved. @return: a tuple of tuples, each tuple containing the details of a submission: (subname, docname, actname, displayed, nbpg, cd, md, buttonorder, statustext, level, score, stpage, endtext) """ q = """SELECT subname, docname, actname, displayed, nbpg, cd, md, buttonorder, statustext, level, """ \ """score, stpage, endtxt FROM sbmIMPLEMENT WHERE docname=%s ORDER BY actname ASC""" return run_sql(q, (doctype,)) def delete_doctype(doctype): """Delete a document type's details from the document types table (sbmDOCTYPE). Effectively, this means that the document type has been deleted, but this function should be called after other functions that delete all of the other components of a document type (such as "delete_all_submissions_doctype" to delete the doctype's submissions, "delete_all_functions_doctype" to delete its functions, etc. @param doctype: the unique ID of the document type to be deleted. @return: 0 (ZERO) if doctype was deleted successfully; 1 (ONE) if doctype remains after the deletion attempt. 
""" q = """DELETE FROM sbmDOCTYPE WHERE sdocname=%s""" run_sql(q, (doctype,)) numrows_doctype = get_number_doctypes_docid(doctype) if numrows_doctype == 0: ## everything OK - deleted this doctype return 0 else: ## everything NOT OK - could not delete all entries for this doctype ## make a last attempt: run_sql(q, (doctype,)) if get_number_doctypes_docid(doctype) == 0: ## everything OK this time - could delete doctype return 0 else: ## everything still NOT OK - could not delete the doctype return 1 def delete_collection_doctype_entry_doctype(doctype): """Delete a document type's entry from the collection-doctype list @param doctype: the unique ID of the document type to be deleted from the collection-doctypes list @return: 0 (ZERO) if doctype was deleted successfully from collection-doctypes list; 1 (ONE) if doctype remains in the collection-doctypes list after the deletion attempt """ q = """DELETE FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_son=%s""" run_sql(q, (doctype,)) numrows_coll_doctype_doctype = get_number_collection_doctype_entries_doctype(doctype) if numrows_coll_doctype_doctype == 0: ## everything OK - deleted the document type from the collection-doctype list return 0 else: ## everything NOT OK - could not delete the doctype from the collection-doctype list ## try once more run_sql(q, (doctype,)) if get_number_collection_doctype_entries_doctype(doctype) == 0: ## everything now OK - could delete this time return 0 else: ## everything still NOT OK - could not delete return 1 def delete_all_submissions_doctype(doctype): """Delete all SUBMISSIONS (actions) for a given document type @param doctype: the doument type from which the submissions are to be deleted @return: 0 (ZERO) if all submissions are deleted successfully; 1 (ONE) if submissions remain after the delete has been performed (i.e. all submissions could not be deleted for some reason) """ q = """DELETE FROM sbmIMPLEMENT WHERE docname=%s""" run_sql(q, (doctype,)) numrows_submissionsdoctype = get_number_submissions_doctype(doctype) if numrows_submissionsdoctype == 0: ## everything OK - no submissions remain for this doctype return 0 else: ## everything NOT OK - still submissions remaining for this doctype ## make a last attempt to delete them: run_sql(q, (doctype,)) ## last check to see whether submissions remain: if get_number_submissions_doctype(doctype) == 0: ## Everything OK - all submissions deleted this time return 0 else: ## Everything NOT OK - still could not delete the submissions return 1 def delete_all_parameters_doctype(doctype): """Delete all PARAMETERS (as used by functions) for a given document type @param doctype: the doctype for which all function-parameters are to be deleted @return: 0 (ZERO) if all parameters are deleted successfully; 1 (ONE) if parameters remain after the delete has been performed (i.e. 
def delete_all_parameters_doctype(doctype):
    """Delete all PARAMETERS (as used by functions) for a given document type
       @param doctype: the doctype for which all function-parameters are to be deleted
       @return: 0 (ZERO) if all parameters are deleted successfully; 1 (ONE) if parameters
        remain after the delete has been performed (i.e. all parameters could not be deleted
        for some reason)
    """
    q = """DELETE FROM sbmPARAMETERS WHERE doctype=%s"""
    run_sql(q, (doctype,))
    numrows_paramsdoctype = get_number_parameters_doctype(doctype)
    if numrows_paramsdoctype == 0:
        ## Everything OK - no parameters remain for this doctype
        return 0
    else:
        ## Everything NOT OK - still some parameters remaining for doctype
        ## make a last attempt to delete them:
        run_sql(q, (doctype,))
        ## check once more to see if parameters remain:
        if get_number_parameters_doctype(doctype) == 0:
            ## Everything OK - all parameters were deleted successfully this time
            return 0
        else:
            ## still unable to recover - could not delete all parameters
            return 1

def get_functionname_step_score_allfunctions_afterreference_doctypesubmission(doctype, action, step, score):
    """Return the details (function name, step, score) of all functions of a submission that
       are positioned AFTER the given step/score reference point.
       @param doctype: unique ID of the doctype to which the submission belongs
       @param action: the action ID of the submission
       @param step: the step of the reference point
       @param score: the score of the reference point
       @return: a tuple of tuples: ((function, step, score), [...])
    """
    q = """SELECT function, step, score FROM sbmFUNCTIONS WHERE (doctype=%s AND action=%s) AND ((step=%s AND score > %s)""" \
        """ OR (step > %s)) ORDER BY step ASC, score ASC"""
    return run_sql(q, (doctype, action, step, score, step))

def get_functionname_step_score_allfunctions_beforereference_doctypesubmission(doctype, action, step, score):
    """Return the details (function name, step, score) of all functions of a submission that
       are positioned BEFORE the given step/score reference point.
       @param doctype: unique ID of the doctype to which the submission belongs
       @param action: the action ID of the submission
       @param step: the step of the reference point
       @param score: the score of the reference point
       @return: a tuple of tuples: ((function, step, score), [...])
    """
    q = """SELECT function, step, score FROM sbmFUNCTIONS WHERE (doctype=%s AND action=%s) AND ((step=%s AND score < %s)"""
    if step > 1:
        q += """ OR (step < %s)"""
    q += """) ORDER BY step ASC, score ASC"""
    if step > 1:
        return run_sql(q, (doctype, action, step, score, step))
    else:
        return run_sql(q, (doctype, action, step, score))

def get_functionname_step_score_allfunctions_doctypesubmission(doctype, action):
    """Return the details (function name, step, score) of all functions belonging to the
       submission (action) of doctype.
       @param doctype: unique ID of doctype for which the details of the functions of the given
        submission are to be retrieved
       @param action: the action ID of the submission whose function details are to be retrieved
       @return: a tuple of tuples: ((function, step, score),(function, step, score),[...])
    """
    q = """SELECT function, step, score FROM sbmFUNCTIONS where doctype=%s AND action=%s ORDER BY step ASC, score ASC"""
    return run_sql(q, (doctype, action))

def get_name_step_score_of_all_functions_in_step_of_submission(doctype, action, step):
    """Return a list of the details of all functions within a given step of a submission.
       The functions will be ordered in ascending order of score.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique ID of an action
       @param step: (integer) the step in which the functions are located
       @return: a tuple of tuples (function-name, step, score)
    """
    q = """SELECT function, step, score FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s AND step=%s ORDER BY score ASC"""
    res = run_sql(q, (doctype, action, step))
    return res

def delete_function_doctypesubmission_step_score(doctype, action, function, step, score):
    """Delete a given function at a particular step/score for a given doctype submission"""
    q = """DELETE FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s AND function=%s AND step=%s AND score=%s"""
    run_sql(q, (doctype, action, function, step, score))
    numrows_function_doctypesubmission_step_score = \
        get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action,
                                                                                  function=function, step=step, score=score)
    if numrows_function_doctypesubmission_step_score == 0:
        ## Everything OK - function deleted
        return 0
    else:
        ## Everything NOT OK - still some functions remaining for doctype/action
        ## make a last attempt to delete them:
        run_sql(q, (doctype, action, function, step, score))
        ## check once more to see if functions remain:
        if get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action,
                                                                                     function=function, step=step,
                                                                                     score=score) == 0:
            ## Everything OK - all functions for this doctype/action were deleted successfully this time
            return 0
        else:
            ## still unable to recover - could not delete all functions for this doctype/action
            return 1

def delete_the_function_at_step_and_score_from_a_submission(doctype, action, function, step, score):
    ## THIS SHOULD REPLACE "delete_function_doctypesubmission_step_score(doctype, action, function, step, score)"
    """Delete a given function at a particular step/score for a given submission"""
    q = """DELETE FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s AND function=%s AND step=%s AND score=%s"""
    run_sql(q, (doctype, action, function, step, score))
    numrows_deletedfunc = \
        get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action,
                                                                                  function=function, step=step, score=score)
    if numrows_deletedfunc == 0:
        ## Everything OK - function deleted
        return
    else:
        ## Everything NOT OK - still some functions remaining for doctype/action
        ## make a last attempt to delete them:
        run_sql(q, (doctype, action, function, step, score))
        ## check once more to see if functions remain:
        numrows_deletedfunc = \
            get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action,
                                                                                      function=function, step=step, score=score)
        if numrows_deletedfunc == 0:
            ## Everything OK - all functions for this doctype/action were deleted successfully this time
            return
        else:
            ## still unable to recover - could not delete all functions for this doctype/action
            msg = """Failed to delete the function [%s] at score [%s] of step [%s], from submission [%s]"""\
                  % (function, score, step, "%s%s" % (action, doctype))
            raise InvenioWebSubmitAdminWarningDeleteFailed(msg)
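## Illustrative sketch (not part of the original module): callers of the
## exception-raising delete helpers are expected to trap
## InvenioWebSubmitAdminWarningDeleteFailed. The doctype/action/function values
## used below ("DEMO", "SBI", "Move_to_Done") are made up.

def _example_delete_function_from_submission():
    """Hypothetical usage sketch: delete one function instance and report
       failure as a string instead of an exception."""
    try:
        delete_the_function_at_step_and_score_from_a_submission("DEMO", "SBI",
                                                                "Move_to_Done",
                                                                step=1, score=30)
        return ""
    except InvenioWebSubmitAdminWarningDeleteFailed, warn:
        return str(warn)

def delete_function_at_step_and_score_from_submission(doctype, action, function, step, score):
    """Delete the function at a particular step/score from a submission.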
@param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param function: (string) the name of the function to be deleted @param step: (integer) the step in which the function to be deleted is found @param score: (integer) the score at which the function to be deleted is found @return: None @Exceptions raised: InvenioWebSubmitAdminWarningDeleteFailed - when unable to delete the function """ q = """DELETE FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s AND function=%s AND step=%s AND score=%s""" run_sql(q, (doctype, action, function, step, score)) numrows_function_at_stepscore = \ get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action, function=function, step=step, score=score) if numrows_function_at_stepscore == 0: ## Everything OK - function deleted return else: ## Everything NOT OK - still some functions remaining for doctype/action ## make a last attempt to delete them: run_sql(q, (doctype, action, function, step, score)) ## check once more to see if functions remain: numrows_function_at_stepscore = \ get_number_of_functions_with_functionname_in_submission_at_step_and_score(doctype=doctype, action=action, function=function, step=step, score=score) if numrows_function_at_stepscore == 0: ## Everything OK - all functions for this doctype/action were deleted successfully this time return else: ## still unable to recover - could not delete all functions for this doctype/action msg = """Failed to delete function [%s] from step [%s] and score [%s] from submission [%s]""" \ % (function, step, score, "%s%s" % (action, doctype)) raise InvenioWebSubmitAdminWarningDeleteFailed(msg) def delete_all_functions_in_step_of_submission(doctype, action, step): """Delete all functions from a given step of a submission. @param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param step: (integer) the number of the step in which the functions are to be deleted @return: None @Exceptions raised: InvenioWebSubmitAdminWarningDeleteFailed - when unable to delete some or all of the functions """ q = """DELETE FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s AND step=%s""" run_sql(q, (doctype, action, step)) numrows_functions_in_step = get_number_of_functions_in_step_of_submission(doctype=doctype, action=action, step=step) if numrows_functions_in_step == 0: ## all functions in step of submission deleted return else: ## couldn't delete all of the functions - try again run_sql(q, (doctype, action, step)) numrows_functions_in_step = get_number_of_functions_in_step_of_submission(doctype=doctype, action=action, step=step) if numrows_functions_in_step == 0: ## success this time return else: msg = """Failed to delete all functions in step [%s] of submission [%s]""" % (step, "%s%s" % (action, doctype)) raise InvenioWebSubmitAdminWarningDeleteFailed(msg) def delete_all_functions_foraction_doctype(doctype, action): """Delete all FUNCTIONS for a given action, belonging to a given doctype. @param doctype: the document type for which the functions are to be deleted @param action: the action that owns the functions to be deleted @return: 0 (ZERO) if all functions for the doctype/action are deleted successfully; 1 (ONE) if functions for the doctype/action remain after the delete has been performed (i.e. 
       the functions could not be deleted for some reason)
    """
    q = """DELETE FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s"""
    run_sql(q, (doctype, action))
    numrows_functions_actiondoctype = get_number_functions_action_doctype(doctype=doctype, action=action)
    if numrows_functions_actiondoctype == 0:
        ## Everything OK - no functions remain for this doctype/action
        return 0
    else:
        ## Everything NOT OK - still some functions remaining for doctype/action
        ## make a last attempt to delete them:
        run_sql(q, (doctype, action))
        ## check once more to see if functions remain:
        if get_number_functions_action_doctype(doctype=doctype, action=action) == 0:
            ## Everything OK - all functions for this doctype/action were deleted successfully this time
            return 0
        else:
            ## still unable to recover - could not delete all functions for this doctype/action
            return 1

def delete_all_functions_doctype(doctype):
    """Delete all FUNCTIONS for a given document type.
       @param doctype: the document type for which all functions are to be deleted
       @return: 0 (ZERO) if all functions are deleted successfully; 1 (ONE) if functions remain
        after the delete has been performed (i.e. all functions could not be deleted for some
        reason)
    """
    q = """DELETE FROM sbmFUNCTIONS WHERE doctype=%s"""
    run_sql(q, (doctype,))
    numrows_functionsdoctype = get_number_functions_doctype(doctype)
    if numrows_functionsdoctype == 0:
        ## Everything OK - no functions remain for this doctype
        return 0
    else:
        ## Everything NOT OK - still some functions remaining for doctype
        ## make a last attempt to delete them:
        run_sql(q, (doctype,))
        ## check once more to see if functions remain:
        if get_number_functions_doctype(doctype) == 0:
            ## Everything OK - all functions were deleted successfully this time
            return 0
        else:
            ## still unable to recover - could not delete all functions
            return 1

def clone_submissionfields_from_doctypesubmission_to_doctypesubmission(fromsub, tosub):
    """Clone all submission fields (interface elements) of the submission "fromsub" into the
       submission "tosub". Any existing fields of "tosub" are deleted first.
       @param fromsub: the unique name/ID of the submission whose fields are to be cloned
       @param tosub: the unique name/ID of the submission into which the fields are cloned
       @return: 0 if the clone was successful; 1 if the fields of "tosub" could not be deleted
        before cloning; 2 if not all fields could be cloned
    """
    error_code = delete_all_submissionfields_submission(tosub)
    if error_code == 0:
        ## there are no fields for the submission "tosub" - clone from "fromsub"
        q = """INSERT INTO sbmFIELD (subname, pagenb, fieldnb, fidesc, fitext, level, sdesc, checkn, cd, md, """ \
            """fiefi1, fiefi2) """\
            """(SELECT %s, pagenb, fieldnb, fidesc, fitext, level, sdesc, checkn, CURDATE(), CURDATE(), NULL, NULL """ \
            """FROM sbmFIELD WHERE subname=%s)"""
        ## get number of submission fields for submission fromsub:
        numfields_fromsub = get_number_submissionfields_submissionnames(submission_names=fromsub)
        run_sql(q, (tosub, fromsub))
        ## get number of submission fields for submission tosub (after cloning):
        numfields_tosub = get_number_submissionfields_submissionnames(submission_names=tosub)
        if numfields_fromsub == numfields_tosub:
            ## successful clone
            return 0
        else:
            ## didn't manage to clone all fields - return 2
            return 2
    else:
        ## cannot delete "tosub"s fields - cannot clone - return 1 to signal this
        return 1

def clone_categories_fromdoctype_todoctype(fromdoctype, todoctype):
    """Clone all categories of the document type "fromdoctype" into the document type
       "todoctype". Any existing categories of "todoctype" are deleted first.
       @param fromdoctype: the unique ID of the doctype whose categories are to be cloned
       @param todoctype: the unique ID of the doctype into which the categories are cloned
       @return: 0 if the clone was successful; 1 if the categories of "todoctype" could not be
        deleted before cloning; 2 if not all categories could be cloned
    """
    ## first, if categories exist for "todoctype", delete them
    error_code = delete_all_categories_doctype(todoctype)
    if error_code == 0:
        ## all categories were deleted - now clone those of "fromdoctype"
        ## first, count "fromdoctype"s categories:
        numcategs_fromdoctype = get_number_categories_doctype(fromdoctype)
        ## now perform the cloning:
        q = """INSERT INTO sbmCATEGORIES (doctype, sname, lname, score) (SELECT %s, sname, lname, score """\
            """FROM sbmCATEGORIES WHERE doctype=%s)"""
        run_sql(q, (todoctype, fromdoctype))
        ## get the number of categories for "todoctype" (should be the same as "fromdoctype" if the cloning was successful):
        numcategs_todoctype = get_number_categories_doctype(todoctype)
        if numcategs_fromdoctype == numcategs_todoctype:
            ## successful clone
            return 0
        else:
            ## did not manage to clone all categories - return 2 to indicate this
            return 2
    else:
        ## cannot delete "todoctype"s categories - return error code of 1 to signal this
        return 1

def insert_function_into_submission_at_step_and_score_then_regulate_scores_of_functions_in_step(doctype, action, function, step, score):
    """Insert a function into a submission at a particular score within a particular step, then
       regulate the scores of all functions within that step to spaces of 10.
       @param doctype: (string)
       @param action: (string)
       @param function: (string)
       @param step: (integer)
       @param score: (integer)
       @return: None
    """
    ## check whether function exists in WebSubmit DB:
    numrows_function = get_number_of_functions_with_funcname(funcname=function)
    if numrows_function < 1:
        msg = """Failed to insert the function [%s] into submission [%s] at step [%s] and score [%s] - """\
              """Could not find function [%s] in WebSubmit DB""" % (function, "%s%s" % (action, doctype), step, score, function)
        raise InvenioWebSubmitAdminWarningReferentialIntegrityViolation(msg)
    ## add 10 to the score of all functions at or below the position of this new function and within the same step
    ## (this ensures there is a vacant slot where the function is to be added)
    add_10_to_score_of_all_functions_in_step_of_submission_and_with_score_equalto_or_above_val(doctype=doctype,
                                                                                              action=action,
                                                                                              step=step,
                                                                                              fromscore=score)
    ## now insert the new function into its position:
    try:
        insert_function_into_submission_at_step_and_score(doctype=doctype, action=action,
                                                          function=function, step=step, score=score)
    except InvenioWebSubmitAdminWarningReferentialIntegrityViolation, e:
        ## The function doesn't exist in WebSubmit and therefore cannot be used in the submission
        ## regulate the scores of all functions within the step, to correct the "hole" that was made
        try:
            regulate_score_of_all_functions_in_step_to_ascending_multiples_of_10_for_submission(doctype=doctype,
                                                                                                action=action, step=step)
        except InvenioWebSubmitAdminWarningDeleteFailed, f:
            ## can't regulate the functions' scores - couldn't delete some or all of them before re-inserting
            ## them in the correct position. Cannot fix this - report that some functions may have been lost.
            msg = """It wasn't possible to add the function [%s] to submission [%s] at step [%s], score [%s]."""\
                  """ Firstly, the function doesn't exist in WebSubmit. Secondly, when trying to correct the """\
                  """score of the functions within step [%s], it was not possible to delete some or all of them."""\
                  """ Some functions may have been lost - please check."""\
                  % (function, "%s%s" % (action, doctype), step, score, step)
            raise InvenioWebSubmitAdminWarningInsertFailed(msg)
-        raise e
+        raise
    ## try to regulate the scores of the functions in the step that the new function was just inserted into:
    try:
        regulate_score_of_all_functions_in_step_to_ascending_multiples_of_10_for_submission(doctype=doctype,
                                                                                            action=action, step=step)
    except InvenioWebSubmitAdminWarningDeleteFailed, e:
        ## could not correctly regulate the functions - could not delete all functions in the step
        msg = """Could not regulate the scores of all functions within step [%s] of submission [%s]."""\
              """ It was not possible to delete some or all of them. Some functions may have been lost -"""\
              """ please check.""" % (step, "%s%s" % (action, doctype))
        raise InvenioWebSubmitAdminWarningDeleteFailed(msg)
    ## success
    return
Some functions may have been lost -"""\ """ please chack.""" % (step, "%s%s" % (action, doctype)) raise InvenioWebSubmitAdminWarningDeleteFailed(msg) ## success return def insert_function_into_submission_at_step_and_score(doctype, action, function, step, score): """Insert a function into a submission, at the position dictated by step/score. @param doctype: (string) the unique ID of a document type @param action: (string) the unique ID of an action @param function: (string) the unique name of a function @param step: (integer) the step into which the function should be inserted @param score: (integer) the score at which the function should be inserted @return: """ ## check that the function exists in WebSubmit: numrows_function = get_number_of_functions_with_funcname(function) if numrows_function > 0: ## perform the insert q = """INSERT INTO sbmFUNCTIONS (doctype, action, function, step, score) VALUES(%s, %s, %s, %s, %s)""" run_sql(q, (doctype, action, function, step, score)) return else: ## function doesnt exist - cannot insert a row for it in a submission! msg = """Failed to insert the function [%s] into submission [%s] at step [%s] and score [%s] - """\ """Could not find function [%s] in WebSubmit DB""" % (function, "%s%s" % (action, doctype), step, score, function) raise InvenioWebSubmitAdminWarningReferentialIntegrityViolation(msg) def clone_functions_foraction_fromdoctype_todoctype(fromdoctype, todoctype, action): ## delete all functions that error_code = delete_all_functions_foraction_doctype(doctype=todoctype, action=action) if error_code == 0: ## all functions for todoctype/action deleted - no clone those of "fromdoctype" ## count fromdoctype's functions for the given action numrows_functions_action_fromdoctype = get_number_functions_action_doctype(doctype=fromdoctype, action=action) ## perform the cloning: q = """INSERT INTO sbmFUNCTIONS (doctype, action, function, score, step) (SELECT %s, action, function, """ \ """score, step FROM sbmFUNCTIONS WHERE doctype=%s AND action=%s)""" run_sql(q, (todoctype, fromdoctype, action)) ## get number of functions for todoctype/action (these have just been cloned these from fromdoctype/action, so ## the counts should be the same) numrows_functions_action_todoctype = get_number_functions_action_doctype(doctype=todoctype, action=action) if numrows_functions_action_fromdoctype == numrows_functions_action_todoctype: ## successful clone: return 0 else: ## could not clone all functions from fromdoctype/action for todoctype/action return 2 else: ## unable to delete "todoctype"'s functions for action return 1 def get_number_functionparameters_for_action_doctype(action, doctype): """Get the number of parameters associated with a given action of a given document type. @param action: the action of the doctype, with which the parameters are associated @param doctype: the doctype with which the parameters are associated. 
def get_number_functionparameters_for_action_doctype(action, doctype):
    """Get the number of parameters associated with a given action of a given document type.
       @param action: the action of the doctype, with which the parameters are associated
       @param doctype: the doctype with which the parameters are associated.
       @return: an integer count of the number of parameters associated with the given action
        of the given document type
    """
    q = """SELECT COUNT(DISTINCT(par.name)) FROM sbmFUNDESC AS fundesc """ \
        """LEFT JOIN sbmPARAMETERS AS par ON fundesc.param = par.name """ \
        """LEFT JOIN sbmFUNCTIONS AS func ON par.doctype = func.doctype AND fundesc.function = func.function """ \
        """WHERE par.doctype=%s AND func.action=%s"""
    return int(run_sql(q, (doctype, action))[0][0])

def delete_functionparameters_doctype_submission(doctype, action):
    def _get_list_params_to_delete(potential_delete_params, keep_params):
        del_params = []
        for param in potential_delete_params:
            if param[0] not in keep_params and param[0] != "":
                ## this parameter is not used by the other actions - it can be deleted
                del_params.append(param[0])
        return del_params
    ## get the parameters belonging to the given submission of the doctype:
    params_doctype_action = get_functionparameternames_doctype_action(doctype=doctype, action=action)
    ## get all parameters for the given doctype that belong to submissions OTHER than the submission for which we must
    ## delete parameters:
    params_doctype_other_actions = get_functionparameternames_doctype_not_action(doctype=doctype, action=action)
    ## "params_doctype_other_actions" is a tuple of tuples, where each tuple contains only the parameter name: ((param,),(param,))
    ## make a tuple of strings, instead of this tuple of tuples:
    params_to_keep = map(lambda x: (type(x[0]) in (str, unicode) and x[0]) or (""), params_doctype_other_actions)
    delete_params = _get_list_params_to_delete(potential_delete_params=params_doctype_action, keep_params=params_to_keep)
    ## now, if there are parameters to delete, do it:
    if len(delete_params) > 0:
        q = """DELETE FROM sbmPARAMETERS WHERE doctype=%s AND (name=%s"""
        if len(delete_params) > 1:
            for i in range(1, len(delete_params)):
                q += """ OR name=%s"""
        q += """)"""
        run_sql(q, [doctype,] + delete_params)
        params_remaining_doctype_action = get_functionparameternames_doctype_action(doctype=doctype, action=action)
        if len(_get_list_params_to_delete(potential_delete_params=params_remaining_doctype_action,
                                          keep_params=params_to_keep)) == 0:
            ## Everything OK - all parameters deleted
            return 0
        else:
            ## Everything NOT OK - some parameters remain: try one final time to delete them
            run_sql(q, [doctype,] + delete_params)
            params_remaining_doctype_action = get_functionparameternames_doctype_action(doctype=doctype, action=action)
            if len(_get_list_params_to_delete(potential_delete_params=params_remaining_doctype_action,
                                              keep_params=params_to_keep)) == 0:
                ## Everything OK - deleted successfully this time
                return 0
            else:
                ## Still unable to delete - give up
                return 1
    ## no parameters to delete
    return 0

def update_value_of_function_parameter_for_doctype(doctype, paramname, paramval):
    """Update the value of a parameter as used by a document type.
       @param doctype: (string) the unique ID of a document type
       @param paramname: (string) the name of the parameter whose value is to be updated
       @param paramval: (string) the new value for the parameter
       @Exceptions raised: InvenioWebSubmitAdminWarningTooManyRows - when multiple rows found
        for parameter; InvenioWebSubmitAdminWarningNoRowsFound - when no rows found for parameter
    """
    q = """UPDATE sbmPARAMETERS SET value=%s WHERE doctype=%s AND name=%s"""
    ## get number of rows found for the parameter:
    numrows_param = get_numberparams_doctype_paramname(doctype=doctype, paramname=paramname)
    if numrows_param == 1:
        run_sql(q, (paramval, doctype, paramname))
        return
    elif numrows_param > 1:
        ## multiple rows found for the parameter - not safe to edit
        msg = """When trying to update the [%s] parameter for the [%s] document type, [%s] rows were found for the parameter """\
              """- not safe to update""" % (paramname, doctype, numrows_param)
        raise InvenioWebSubmitAdminWarningTooManyRows(msg)
    else:
        ## no row for parameter found - insert it as a new parameter:
        insert_parameter_doctype(doctype=doctype, paramname=paramname, paramval=paramval)
        numrows_param = get_numberparams_doctype_paramname(doctype=doctype, paramname=paramname)
        if numrows_param != 1:
            msg = """When trying to update the [%s] parameter for the [%s] document type, could not insert a new value"""\
                  % (paramname, doctype)
            raise InvenioWebSubmitAdminWarningNoRowsFound(msg)
        return

def get_parameters_name_and_value_for_function_of_doctype(doctype, function):
    """Get the names and values of all parameters of a given function, as they have been set
       for a particular document type.
       @param doctype: (string) the unique ID of a document type
       @param function: the name of the function from which the parameters names/values are to be retrieved
       @return: a tuple of 2-celled tuples, each tuple containing 2 strings:
        (parameter-name, parameter-value)
    """
    q = """SELECT param.name, param.value FROM sbmPARAMETERS AS param """\
        """LEFT JOIN sbmFUNDESC AS func ON func.param=param.name """\
        """WHERE func.function=%s AND param.doctype=%s """\
        """ORDER BY param.name ASC"""
    return run_sql(q, (function, doctype))

def get_value_of_parameter_for_doctype(doctype, parameter):
    """Return the value of a given parameter as set for a given document type, or None if the
       parameter is not set for that doctype.
       @param doctype: the unique ID of the document type
       @param parameter: the name of the parameter
       @return: the parameter's value (string), or None
    """
    q = """SELECT value FROM sbmPARAMETERS WHERE doctype=%s AND name=%s"""
    res = run_sql(q, (doctype, parameter))
    if len(res) > 0:
        return res[0][0]
    else:
        return None

def get_functionparameternames_doctype_action(doctype, action):
    """Get the unique NAMES of the function parameters for a given action of a given doctype.
       @param doctype: the document type with which the parameters are associated
       @param action: the action (of "doctype") with which the parameters are associated
       @return: a tuple of tuples, where each tuple contains only a parameter name:
        ((parameter name,), [...])
    """
    q = """SELECT DISTINCT(par.name) FROM sbmFUNDESC AS fundesc """ \
        """LEFT JOIN sbmPARAMETERS AS par ON fundesc.param = par.name """ \
        """LEFT JOIN sbmFUNCTIONS AS func ON par.doctype = func.doctype AND fundesc.function = func.function """ \
        """WHERE par.doctype=%s AND func.action=%s """\
        """GROUP BY par.name """ \
        """ORDER BY fundesc.function ASC, par.name ASC"""
    return run_sql(q, (doctype, action))
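## Illustrative sketch (not part of the original module): updating a function
## parameter for a doctype, trapping the warnings that
## update_value_of_function_parameter_for_doctype() can raise. The doctype
## ("DEMO") and the parameter name/value used below are made up.

def _example_set_function_parameter():
    """Hypothetical usage sketch: set (or insert) a parameter value and report
       any problem as a string."""
    try:
        update_value_of_function_parameter_for_doctype("DEMO", "createTemplate", "demo.tpl")
        return ""
    except (InvenioWebSubmitAdminWarningTooManyRows,
            InvenioWebSubmitAdminWarningNoRowsFound), warn:
        return str(warn)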
def get_functionparameternames_doctype_not_action(doctype, action):
    """Get the unique NAMES of the function parameters that a given doctype uses in all actions
       OTHER THAN the given action.
       @param doctype: the document type with which the parameters are associated
       @param action: the action (of "doctype") whose parameters are to be excluded
       @return: a tuple of tuples, where each tuple contains only a parameter name:
        ((parameter name,), [...])
    """
    q = """SELECT DISTINCT(par.name) FROM sbmFUNDESC AS fundesc """ \
        """LEFT JOIN sbmPARAMETERS AS par ON fundesc.param = par.name """ \
        """LEFT JOIN sbmFUNCTIONS AS func ON par.doctype = func.doctype AND fundesc.function = func.function """ \
        """WHERE par.doctype=%s AND func.action <> %s """\
        """GROUP BY par.name """ \
        """ORDER BY fundesc.function ASC, par.name ASC"""
    return run_sql(q, (doctype, action))

def get_functionparameters_for_action_doctype(action, doctype):
    """Get the details of all function parameter values for a given action of a given doctype.
       @param doctype: the document type with which the parameter values are associated
       @param action: the action (of "doctype") with which the parameter values are associated
       @return: a tuple of tuples, where each tuple represents a parameter/value:
        (parameter name, parameter value, doctype)
    """
    q = """SELECT DISTINCT(par.name), par.value, par.doctype FROM sbmFUNDESC AS fundesc """ \
        """LEFT JOIN sbmPARAMETERS AS par ON fundesc.param = par.name """ \
        """LEFT JOIN sbmFUNCTIONS AS func ON par.doctype = func.doctype AND fundesc.function = func.function """ \
        """WHERE par.doctype=%s AND func.action=%s """\
        """GROUP BY par.name """ \
        """ORDER BY fundesc.function ASC, par.name ASC"""
    return run_sql(q, (doctype, action))

def get_numberparams_doctype_paramname(doctype, paramname):
    """Return a count of the number of rows found for a given parameter of a given doctype.
       @param doctype: the doctype with which the parameter is associated
       @param paramname: the parameter to be counted
       @return: an integer count of the number of times this parameter is found for the document
        type "doctype"
    """
    q = """SELECT COUNT(name) FROM sbmPARAMETERS WHERE doctype=%s AND name=%s"""
    return int(run_sql(q, (doctype, paramname))[0][0])

def get_doctype_docname_descr_cd_md_fordoctype(doctype):
    """Return the ID, name, description, creation date and modification date of a given doctype.
       @param doctype: the unique ID of the document type
       @return: a tuple of tuples: (sdocname, ldocname, description, cd, md)
    """
    q = """SELECT sdocname, ldocname, description, cd, md FROM sbmDOCTYPE WHERE sdocname=%s"""
    return run_sql(q, (doctype,))

def get_actions_sname_lname_not_linked_to_doctype(doctype):
    """Return the IDs and display names of all actions that are not yet implemented as
       submissions of a given doctype.
       @param doctype: the unique ID of the document type
       @return: a tuple of tuples: (sactname, "[sactname] lactname")
    """
    q = """SELECT actn.sactname, CONCAT("[", actn.sactname, "] ", actn.lactname) FROM sbmACTION AS actn """ \
        """LEFT JOIN sbmIMPLEMENT AS subm ON subm.docname=%s AND actn.sactname=subm.actname """ \
        """WHERE subm.actname IS NULL"""
    return run_sql(q, (doctype,))

def insert_parameter_doctype(doctype, paramname, paramval):
    """Insert a new parameter and its value into the parameters table (sbmPARAMETERS) for a
       given document type.
       @param doctype: the document type for which the parameter is to be inserted
       @param paramname: the name of the parameter to be inserted
       @param paramval: the value of the parameter to be inserted
       @return: 0 (ZERO) if the insert was performed; 1 (ONE) if the parameter already existed
        for the doctype, so no insert was performed
    """
    q = """INSERT INTO sbmPARAMETERS (doctype, name, value) VALUES (%s, %s, %s)"""
    numrows_paramdoctype = get_numberparams_doctype_paramname(doctype=doctype, paramname=paramname)
    if numrows_paramdoctype == 0:
        ## go ahead and insert
        run_sql(q, (doctype, paramname, paramval))
        return 0  ## Everything is OK
    else:
        return 1  ## Everything NOT OK - this param already exists, so not inserted

def clone_functionparameters_foraction_fromdoctype_todoctype(fromdoctype, todoctype, action):
    """Clone the values of the function parameters of a given action from one doctype to
       another. Parameters for which "todoctype" already has a value are left untouched.
       @param fromdoctype: the doctype from which the parameter values are to be cloned
       @param todoctype: the doctype into which the parameter values are to be cloned
       @param action: the action whose function parameters are to be cloned
       @return: 0 if the clone was 100% successful; 2 if not all parameters could be cloned
    """
    ## get a list of all function-parameters/values for fromdoctype/action
    functionparams_action_fromdoctype = get_functionparameters_for_action_doctype(action=action, doctype=fromdoctype)
    numrows_functionparams_action_fromdoctype = len(functionparams_action_fromdoctype)
    ## for each param, test whether "todoctype" already has a value for it, and if not, clone it:
    for docparam in functionparams_action_fromdoctype:
        docparam_name = docparam[0]
        docparam_val = docparam[1]
        insert_parameter_doctype(doctype=todoctype, paramname=docparam_name, paramval=docparam_val)
    numrows_functionparams_action_todoctype = get_number_functionparameters_for_action_doctype(action=action, doctype=todoctype)
    if numrows_functionparams_action_fromdoctype == numrows_functionparams_action_todoctype:
        ## All is OK - the action on both document types has the same number of parameters
        return 0
    else:
        ## everything NOT OK - the action on both document types has a different number of parameters
        ## probably some could not be cloned. return 2 to signal that cloning not 100% successful
        return 2

def update_category_description_doctype_categ(doctype, categ, categdescr):
    """Update the description of the category "categ", belonging to the document type "doctype".
       Set the description of this category equal to "categdescr".
       @param doctype: the document type for which the given category description is to be updated
       @param categ: the name/ID of the category whose description is to be updated
       @param categdescr: the new description for the category
       @return: integer error code (0 is OK, 1 is BAD update)
    """
    numrows_category_doctype = get_number_categories_doctype_category(doctype=doctype, categ=categ)
    if numrows_category_doctype == 1:
        ## perform update of description
        q = """UPDATE sbmCATEGORIES SET lname=%s WHERE doctype=%s AND sname=%s"""
        run_sql(q, (categdescr, doctype, categ))
        return 0  ## Everything OK
    else:
        return 1  ## Everything not OK: either no rows, or more than 1 row for category
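## Illustrative sketch (not part of the original module): adding a category to a
## doctype with insert_category_into_doctype() (defined just below), then
## renaming it with update_category_description_doctype_categ(). The IDs used
## ("DEMO", "PRE", "Preprint") are made up.

def _example_add_and_rename_category():
    """Hypothetical usage sketch: insert a category at the end of the list and
       then change its long-name/description."""
    if insert_category_into_doctype("DEMO", "PRE", "Preprint") == 0:
        update_category_description_doctype_categ("DEMO", "PRE", "Preprint (demo)")

def insert_category_into_doctype(doctype, categ, categdescr):
    """Insert a category for a document type. It will be inserted into the last position.
       If the category already exists for that document type, the insert will fail.
       @param doctype: (string) - the document type ID.
       @param categ: (string) - the ID of the new category.
       @param categdescr: (string) - the new category's description.
       @return: (integer) An error code: 0 on successful insert; 1 on failure to insert.
    """
    qstr = """INSERT INTO sbmCATEGORIES (doctype, sname, lname, score) """\
           """(SELECT %s, %s, %s, COUNT(sname)+1 FROM sbmCATEGORIES WHERE doctype=%s)"""
    ## does this category already exist for this document type?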
numrows_categ = get_number_categories_doctype_category(doctype=doctype, categ=categ) if numrows_categ == 0: ## it doesn't exist for this doctype - go ahead and insert it: run_sql(qstr, (doctype, categ, categdescr, doctype)) return 0 else: ## the category already existed for this doctype - cannot insert return 1 def delete_category_doctype(doctype, categ): """Delete a given CATEGORY from a document type. @param doctype: the document type from which the category is to be deleted @param categ: the name/ID of the category to be deleted from doctype @return: 0 (ZERO) if the category was successfully deleted from this doctype; 1 (ONE) not; """ q = """DELETE FROM sbmCATEGORIES WHERE doctype=%s and sname=%s""" run_sql(q, (doctype, categ)) ## check to see whether this category still exists for the doctype: numrows_categorydoctype = get_number_categories_doctype_category(doctype=doctype, categ=categ) if numrows_categorydoctype == 0: ## Everything OK - category deleted ## now re-order all category scores correctly: normalize_doctype_category_scores(doctype) return 0 else: ## Everything NOT OK - category still present ## make a last attempt to delete it: run_sql(q, (doctype, categ)) ## check once more to see if category remains: if get_number_categories_doctype_category(doctype=doctype, categ=categ) == 0: ## Everything OK - category was deleted successfully this time ## now re-order all category scores correctly: normalize_doctype_category_scores(doctype) return 0 else: ## still unable to recover - could not delete category return 1 def delete_all_categories_doctype(doctype): """Delete all CATEGORIES for a given document type. @param doctype: the document type for which all submission-categories are to be deleted @return: 0 (ZERO) if all categories for this doctype are deleted successfully; 1 (ONE) if categories remain after the delete has been performed (i.e. all categories could not be deleted for some reason) """ q = """DELETE FROM sbmCATEGORIES WHERE doctype=%s""" run_sql(q, (doctype,)) numrows_categoriesdoctype = get_number_categories_doctype(doctype) if numrows_categoriesdoctype == 0: ## Everything OK - no submission categories remain for this doctype return 0 else: ## Everything NOT OK - still some submission categories remaining for doctype ## make a last attempt to delete them: run_sql(q, (doctype,)) ## check once more to see if categories remain: if get_number_categories_doctype(doctype) == 0: ## Everything OK - all categories were deleted successfully this time return 0 else: ## still unable to recover - could not delete all categories return 1 def delete_all_submissionfields_submission(subname): """Delete all FIELDS (i.e. field elements used on a document type's submission pages - these are the instances of WebSubmit elements throughout the system) for a given submission. This means delete all fields used by a given action of a given doctype. @param subname: the unique name/ID of the submission from which all field elements are to be deleted. @return: 0 (ZERO) if all submission fields could be deleted for the given submission; 1 (ONE) if some fields remain after the deletion was performed (i.e. for some reason it was not possible to delete all fields for the submission). """ q = """DELETE FROM sbmFIELD WHERE subname=%s""" run_sql(q, (subname,)) numrows_submissionfields_subname = get_number_submissionfields_submissionnames(subname) if numrows_submissionfields_subname == 0: ## all submission fields have been deleted for this submission return 0 else: ## all fields not deleted. 
        ## try once more:
        run_sql(q, (subname,))
        numrows_submissionfields_subname = get_number_submissionfields_submissionnames(subname)
        if numrows_submissionfields_subname == 0:
            ## OK this time - all deleted
            return 0
        else:
            ## still unable to delete all submission fields for this submission - give up
            return 1

def delete_all_submissionfields_doctype(doctype):
    """Delete all FIELDS (i.e. field elements used on a document type's submission pages -
       these are the instances of "WebSubmit Elements" throughout the system).
       @param doctype: the document type for which all submission fields are to be deleted
       @return: 0 (ZERO) if all submission fields for this doctype are deleted successfully;
        1 (ONE) if submission-fields remain after the delete has been performed (i.e. all fields
        could not be deleted for some reason)
    """
    all_submissions_doctype = get_all_submissionnames_doctype(doctype=doctype)
    number_submissions_doctype = len(all_submissions_doctype)
    if number_submissions_doctype > 0:
        ## for each of the submissions, delete the submission fields
        q = """DELETE FROM sbmFIELD WHERE subname=%s"""
        if number_submissions_doctype > 1:
            for i in range(1, number_submissions_doctype):
                ## Ensure that we delete all elements used by all submissions for the doctype in question:
                q += """ OR subname=%s"""
        run_sql(q, map(lambda x: str(x[0]), all_submissions_doctype))
        ## get a count of the number of fields remaining for these submissions after deletion.
        numrows_submissions = get_number_submissionfields_submissionnames(submission_names=map(lambda x: str(x[0]),
                                                                                               all_submissions_doctype))
        if numrows_submissions == 0:
            ## Everything is OK - no submission fields left for this doctype
            return 0
        else:
            ## Everything is NOT OK - some submission fields remain for this doctype - try one more time to delete them:
            run_sql(q, map(lambda x: str(x[0]), all_submissions_doctype))
            numrows_submissions = get_number_submissionfields_submissionnames(submission_names=map(lambda x: str(x[0]),
                                                                                                   all_submissions_doctype))
            if numrows_submissions == 0:
                ## everything OK this time
                return 0
            else:
                ## still could not delete all fields
                return 1
    else:
        ## there were no submissions to delete - therefore there should be no submission fields
        ## cannot check, so just return OK
        return 0

def delete_submissiondetails_doctype(doctype, action):
    """Delete a SUBMISSION (action) for a given document type
       @param doctype: the document type from which the submission is to be deleted
       @param action: the action name for the submission that is to be deleted
       @return: 0 (ZERO) if the submission is deleted successfully; 1 (ONE) if the submission
        remains after the delete has been performed (i.e. it could not be deleted for some
        reason)
    """
    q = """DELETE FROM sbmIMPLEMENT WHERE docname=%s AND actname=%s"""
    run_sql(q, (doctype, action))
    numrows_submissiondoctype = get_number_submissions_doctype_action(doctype, action)
    if numrows_submissiondoctype == 0:
        ## everything OK - the submission has been deleted
        return 0
    else:
        ## everything NOT OK - could not delete submission. retry.
        run_sql(q, (doctype, action))
        if get_number_submissions_doctype_action(doctype, action) == 0:
            return 0  ## success this time
        else:
            return 1  ## still unable to delete the submission
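## Illustrative sketch (not part of the original module): registering a new
## document type with insert_doctype_details() (defined just below) and then
## giving it a submission with insert_submission_details(). All of the IDs and
## values used here ("DEMO", "SBI", etc.) are made up.

def _example_create_doctype_with_submission():
    """Hypothetical usage sketch: create a "DEMO" doctype and attach a
       displayed, single-page "SBI" submission to it."""
    if insert_doctype_details("DEMO", "Demo Document Type", "A demonstration doctype.") != 0:
        return 1  ## doctype already existed
    return insert_submission_details("DEMO", "SBI", displayed="Y", nbpg=1,
                                     buttonorder=1, statustext="", level="",
                                     score=0, stpage=0, endtext="")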

def insert_doctype_details(doctype, doctypename, doctypedescr):
    """Insert the details of a new document type into WebSubmit.
       @param doctype: the ID code of the new document type
       @param doctypename: the name of the new document type
       @param doctypedescr: the description of the new document type
       @return: integer (0/1). 0 when insert performed; 1 when doctype already
        existed, so no insert performed.
    """
    numrows_doctype = get_number_doctypes_docid(doctype)
    if numrows_doctype == 0:
        ## insert new document type:
        q = """INSERT INTO sbmDOCTYPE (ldocname, sdocname, cd, md, description) VALUES (%s, %s, CURDATE(), CURDATE(), %s)"""
        run_sql(q, (doctypename, doctype, (doctypedescr != "" and doctypedescr) or (None)))
        return 0  ## Everything is OK
    else:
        return 1  ## Everything not OK: rows may already exist for document type doctype

def insert_submission_details_clonefrom_submission(addtodoctype, action, clonefromdoctype):
    """Add a submission (action) to one document type by cloning the details of the
       same submission (action) as it exists for another document type.
       @param addtodoctype: the document type to which the submission is to be added
       @param action: the action name/ID of the submission to be cloned
       @param clonefromdoctype: the document type from which the submission details are cloned
       @return: integer (0/1). 0 when the clone-insert was performed; 1 when the
        submission already existed for "addtodoctype", so no insert performed.
    """
    numrows_submission_addtodoctype = get_number_submissions_doctype_action(addtodoctype, action)
    if numrows_submission_addtodoctype == 0:
        ## submission does not exist for "addtodoctype" - insert it
        q = """INSERT INTO sbmIMPLEMENT (docname, actname, displayed, subname, nbpg, cd, md, buttonorder, statustext, level, """ \
            """score, stpage, endtxt) (SELECT %s, %s, displayed, %s, nbpg, CURDATE(), CURDATE(), IFNULL(buttonorder, 100), statustext, level, """ \
            """score, stpage, endtxt FROM sbmIMPLEMENT WHERE docname=%s AND actname=%s LIMIT 1)"""
        run_sql(q, (addtodoctype, action, "%s%s" % (action, addtodoctype), clonefromdoctype, action))
        return 0  ## cloning executed - everything OK
    else:
        ## submission already exists for "addtodoctype" - cannot insert it again!
        return 1

def insert_submission_details(doctype, action, displayed, nbpg, buttonorder, statustext, level, score, stpage, endtext):
    """Insert the details of a new submission of a given document type into WebSubmit.
       @param doctype: the doctype ID (string)
       @param action: the action ID (string)
       @param displayed: the value of displayed (char)
       @param nbpg: the value of nbpg (integer)
       @param buttonorder: the value of buttonorder (integer)
       @param statustext: the value of statustext (string)
       @param level: the value of level (char)
       @param score: the value of score (integer)
       @param stpage: the value of stpage (integer)
       @param endtext: the value of endtext (string)
       @return: integer (0/1). 0 when insert performed; 1 when submission already
        existed for doctype, so no insert performed.
    """
    numrows_submission = get_number_submissions_doctype_action(doctype, action)
    if numrows_submission == 0:
        ## this submission does not exist for doctype - insert it
        q = """INSERT INTO sbmIMPLEMENT (docname, actname, displayed, subname, nbpg, cd, md, buttonorder, statustext, level, """ \
            """score, stpage, endtxt) VALUES(%s, %s, %s, %s, %s, CURDATE(), CURDATE(), %s, %s, %s, %s, %s, %s)"""
        run_sql(q, (doctype,
                    action,
                    displayed,
                    "%s%s" % (action, doctype),
                    ((str(nbpg).isdigit() and int(nbpg) >= 0) and nbpg) or ("0"),
                    ((str(buttonorder).isdigit() and int(buttonorder) >= 0) and buttonorder) or (None),
                    statustext,
                    level,
                    ((str(score).isdigit() and int(score) >= 0) and score) or (""),
                    ((str(stpage).isdigit() and int(stpage) >= 0) and stpage) or (""),
                    endtext))
        return 0  ## insert performed
    else:
        ## this submission already exists for the doctype - do not insert it
        return 1
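
## Illustrative usage sketch (a hypothetical helper, not part of the original
## module): creating a new document type and registering a first submission
## for it. The doctype "DEMO", action "SBI" and all literal field values
## below are invented for the example.
def _example_create_doctype_with_submission():
    if insert_doctype_details("DEMO", "Demo documents", "An example doctype") != 0:
        return 1  ## the doctype already existed - nothing was inserted
    ## displayed="Y", 1 page, buttonorder=100, level="1", score=1, stpage=0:
    return insert_submission_details("DEMO", "SBI", "Y", 1, 100,
                                     "", "1", 1, 0, "")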

def get_cd_md_numbersubmissionpages_doctype_action(doctype, action):
    """Return the creation date (cd), the modification date (md), and the number of
       submission pages for a given submission (action) of a given document type (doctype).
       @param doctype: the document type for which the number of pages of a given
        submission is to be determined.
       @param action: the submission (action) for which the number of pages is to be
        determined.
       @return: a tuple containing at most one tuple (the query is limited to one row)
        of the creation date, the modification date, and the number of pages for the
        given submission: ((cd, md, nbpg),)
    """
    q = """SELECT cd, md, nbpg FROM sbmIMPLEMENT WHERE docname=%s AND actname=%s LIMIT 1"""
    return run_sql(q, (doctype, action))

def get_numbersubmissionpages_doctype_action(doctype, action):
    """Return the number of submission pages belonging to a given submission (action)
       of a document type (doctype) as an integer. In the case that the submission
       does not exist, 0 (ZERO) will be returned. In the case that an error occurs,
       -1 will be returned.
       @param doctype: (string) the unique ID of a document type.
       @param action: (string) the unique name/ID of an action.
       @return: an integer - the number of pages found for the submission
    """
    q = """SELECT nbpg FROM sbmIMPLEMENT WHERE docname=%s AND actname=%s LIMIT 1"""
    res = run_sql(q, (doctype, action))
    if len(res) > 0:
        try:
            return int(res[0][0])
        except (IndexError, ValueError):
            ## unexpected result
            return -1
    else:
        return 0

def get_numberfields_submissionpage_doctype_action(doctype, action, pagenum):
    """Return the number of fields on a given page of a given submission.
       @param doctype: (string) the unique ID of the document type to which the
        submission belongs
       @param action: (string) the unique name/ID of the action
       @param pagenum: (integer) the number of the page on which fields are to be counted
       @return: (integer) the number of fields found on the page
    """
    q = """SELECT COUNT(subname) FROM sbmFIELD WHERE pagenb=%s AND subname=%s"""
    return int(run_sql(q, (pagenum, """%s%s""" % (action, doctype)))[0][0])
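
## Illustrative usage sketch (a hypothetical helper, not part of the original
## module): combining the two lookups above to count the fields on every page
## of a submission.
def _example_count_fields_per_page(doctype, action):
    num_pages = get_numbersubmissionpages_doctype_action(doctype, action)
    if num_pages < 0:
        return None  ## -1 signals a lookup error
    ## one count per page, pages being numbered from 1:
    return [get_numberfields_submissionpage_doctype_action(doctype, action, pg)
            for pg in range(1, num_pages + 1)]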

def get_number_of_fields_on_submissionpage_at_positionx(doctype, action, pagenum, positionx):
    """Return the number of fields at positionx on a given page of a given submission.
       @param doctype: (string) the unique ID of the document type to which the
        submission belongs
       @param action: (string) the unique name/ID of the action
       @param pagenum: (integer) the number of the page on which fields are to be counted
       @param positionx: (integer) the position (field number) at which fields are to
        be counted
       @return: (integer) the number of fields found at that position on the page
    """
    q = """SELECT COUNT(subname) FROM sbmFIELD WHERE pagenb=%s AND subname=%s AND fieldnb=%s"""
    return int(run_sql(q, (pagenum, """%s%s""" % (action, doctype), positionx))[0][0])

def swap_elements_adjacent_pages_doctype_action(doctype, action, page1, page2):
    ## get number of pages belonging to submission:
    num_pages = get_numbersubmissionpages_doctype_action(doctype=doctype, action=action)
    tmp_page = num_pages + randint(3, 10)
    if page1 - page2 not in (1, -1):
        ## pages are not adjacent - cannot swap
        return 1
    if page1 > num_pages or page2 > num_pages or page1 < 1 or page2 < 1:
        ## at least one page is out of range of legal pages:
        return 2
    q = """UPDATE sbmFIELD SET pagenb=%s WHERE subname=%s AND pagenb=%s"""
    ## move fields from p1 to tmp
    run_sql(q, (tmp_page, "%s%s" % (action, doctype), page1))
    num_fields_p1 = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=page1)
    if num_fields_p1 != 0:
        ## problem moving some fields from page 1 - move them back from tmp
        run_sql(q, (page1, "%s%s" % (action, doctype), tmp_page))
        return 3
    ## move fields from p2 to p1
    run_sql(q, (page1, "%s%s" % (action, doctype), page2))
    num_fields_p2 = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=page2)
    if num_fields_p2 != 0:
        ## problem moving some fields from page 2 to page 1 - try to move everything back
        run_sql(q, (page2, "%s%s" % (action, doctype), page1))
        run_sql(q, (page1, "%s%s" % (action, doctype), tmp_page))
        return 4
    ## move fields from tmp_page to page2:
    run_sql(q, (page2, "%s%s" % (action, doctype), tmp_page))
    num_fields_tmp_page = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=tmp_page)
    if num_fields_tmp_page != 0:
        ## problem moving some fields from tmp_page to page 2
        ## stop - this problem should be examined by admin
        return 5
    ## success - update modification date for all fields on the swapped pages
    update_modificationdate_fields_submissionpage(doctype=doctype, action=action, subpage=page1)
    update_modificationdate_fields_submissionpage(doctype=doctype, action=action, subpage=page2)
    return 0

def update_modificationdate_fields_submissionpage(doctype, action, subpage):
    q = """UPDATE sbmFIELD SET md=CURDATE() WHERE subname=%s AND pagenb=%s"""
    run_sql(q, ("%s%s" % (action, doctype), subpage))
    return 0

def update_modificationdate_of_field_on_submissionpage(doctype, action, subpage, fieldnb):
    q = """UPDATE sbmFIELD SET md=CURDATE() WHERE subname=%s AND pagenb=%s AND fieldnb=%s"""
    run_sql(q, ("%s%s" % (action, doctype), subpage, fieldnb))
    return 0

def decrement_by_one_pagenumber_submissionelements_abovepage(doctype, action, frompage):
    q = """UPDATE sbmFIELD SET pagenb=pagenb-1, md=CURDATE() WHERE subname=%s AND pagenb > %s"""
    run_sql(q, ("%s%s" % (action, doctype), frompage))
    return 0
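
## Illustrative usage sketch (a hypothetical helper, not part of the original
## module): swap_elements_adjacent_pages_doctype_action reports its outcome
## as a small integer code; a caller might translate the codes like this.
def _example_swap_pages(doctype, action, page1, page2):
    messages = {0: "pages swapped",
                1: "pages not adjacent",
                2: "page number out of range",
                3: "could not vacate page 1",
                4: "could not move page 2 into page 1",
                5: "fields stranded on temporary page - admin attention needed"}
    code = swap_elements_adjacent_pages_doctype_action(doctype, action, page1, page2)
    return messages.get(code, "unknown code")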

def get_details_and_description_of_all_fields_on_submissionpage(doctype, action, pagenum):
    """Get the details and descriptions of all fields on a given submission page,
       ordered by ascending field number.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique ID of an action
       @param pagenum: (integer) the number of the page on which the fields to be
        displayed are found
       @return: a tuple of tuples. Each tuple represents one field on the page:
        (fieldname, field-label, check-name, field-type, size, rows, cols,
         field-description, field-default-value)
    """
    q = """SELECT field.fidesc, field.fitext, field.checkn, el.type, el.size, el.rows, el.cols, el.fidesc, IFNULL(el.val,"") """\
        """FROM sbmFIELD AS field """\
        """LEFT JOIN sbmFIELDDESC AS el ON el.name=field.fidesc """\
        """WHERE field.subname=%s AND field.pagenb=%s """\
        """ORDER BY field.fieldnb ASC"""
    res = run_sql(q, ("%s%s" % (action, doctype), pagenum))
    return res

def insert_field_onto_submissionpage(doctype, action, pagenum, fieldname, fieldtext, fieldlevel, fieldshortdesc, fieldcheck):
    """Insert a field onto a given submission page, in the last position.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique ID of an action
       @param pagenum: (integer) the number of the page onto which the field is to be added
       @param fieldname: (string) the "element name" of the field to be added to the page
       @param fieldtext: (string) the label to be displayed for the field on a submission page
       @param fieldlevel: (char) the level of a field ('M' or 'O') - Mandatory or Optional
       @param fieldshortdesc: (string) the short description for a field
       @param fieldcheck: (string) the name of a check to be associated with a field
       @return: None
       @Exceptions raised: InvenioWebSubmitAdminWarningInsertFailed - raised if it was
        not possible to insert the row for the field
    """
    ## get the number of fields on the page onto which the new field is to be inserted:
    numfields_preinsert = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=pagenum)
    q = """INSERT INTO sbmFIELD (subname, pagenb, fieldnb, fidesc, fitext, level, sdesc, checkn, cd, md, """ \
        """fiefi1, fiefi2) """\
        """(SELECT %s, %s, COUNT(subname)+1, %s, %s, %s, %s, %s, CURDATE(), CURDATE(), NULL, NULL FROM sbmFIELD """ \
        """WHERE subname=%s AND pagenb=%s)"""
    run_sql(q, ("%s%s" % (action, doctype), pagenum, fieldname, fieldtext, fieldlevel,
                fieldshortdesc, fieldcheck, "%s%s" % (action, doctype), pagenum))
    numfields_postinsert = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=pagenum)
    if not (numfields_postinsert > numfields_preinsert):
        ## seems as though the new field was not inserted:
        msg = """Failed when trying to add a new field to page %s of submission %s""" % (pagenum, "%s%s" % (action, doctype))
        raise InvenioWebSubmitAdminWarningInsertFailed(msg)
    return
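
## Illustrative usage sketch (a hypothetical helper, not part of the original
## module): unlike the 0/1-returning helpers, insert_field_onto_submissionpage
## signals failure by raising InvenioWebSubmitAdminWarningInsertFailed, so a
## caller guards it with try/except. The element name "DEMO_TITLE" is invented.
def _example_append_field(doctype, action, pagenum):
    try:
        insert_field_onto_submissionpage(doctype, action, pagenum,
                                         "DEMO_TITLE",  ## hypothetical element name
                                         "Title:", "M", "title", "")
    except InvenioWebSubmitAdminWarningInsertFailed:
        return 1  ## the row for the new field could not be inserted
    return 0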

def delete_a_field_from_submissionpage(doctype, action, pagenum, fieldposn):
    """Delete the field found at position FIELDPOSN on page PAGENUM of the given
       submission. Returns 0 on success, or a warning code string on failure."""
    q = """DELETE FROM sbmFIELD WHERE subname=%s AND pagenb=%s AND fieldnb=%s"""
    run_sql(q, ("""%s%s""" % (action, doctype), pagenum, fieldposn))
    ## check number of fields at deleted field's position. If 0, promote all fields below it by 1 posn;
    ## If field(s) still exists at deleted field's posn, report error.
    numfields_deletedfieldposn = \
         get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=fieldposn)
    if numfields_deletedfieldposn == 0:
        ## everything OK - field was successfully deleted
        return 0
    else:
        ## everything NOT OK - couldn't delete field - retry
        run_sql(q, ("""%s%s""" % (action, doctype), pagenum, fieldposn))
        numfields_deletedfieldposn = \
             get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=fieldposn)
        if numfields_deletedfieldposn == 0:
            ## success this time
            return 0
        else:
            ## still unable to delete all fields - return fail code
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_DELETE_FIELD_FROM_SUBMISSION_PAGE'

def update_details_of_a_field_on_a_submissionpage(doctype, action, pagenum, fieldposn, fieldtext, fieldlevel, fieldshortdesc, fieldcheck):
    """Update the details of one field, as found at a given location on a given
       submission page.
       @param doctype: (string) unique ID for a document type
       @param action: (string) unique ID for an action
       @param pagenum: (integer) number of page on which field is found
       @param fieldposn: (integer) number of field on page
       @param fieldtext: (string) text label for field on page
       @param fieldlevel: (char) level of field (should be 'M' or 'O' - mandatory or optional)
       @param fieldshortdesc: (string) short description of field
       @param fieldcheck: (string) name of JavaScript Check to be applied to field
       @return: None
       @Exceptions raised:
        InvenioWebSubmitAdminWarningTooManyRows - when multiple rows found for field
        InvenioWebSubmitAdminWarningNoRowsFound - when no rows found for field
    """
    q = """UPDATE sbmFIELD SET fitext=%s, level=%s, sdesc=%s, checkn=%s, md=CURDATE() WHERE subname=%s AND pagenb=%s AND fieldnb=%s"""
    queryargs = (fieldtext, fieldlevel, fieldshortdesc, fieldcheck, "%s%s" % (action, doctype), pagenum, fieldposn)
    ## get number of rows found for field:
    numrows_field = get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=fieldposn)
    if numrows_field == 1:
        run_sql(q, queryargs)
        return
    elif numrows_field > 1:
        ## multiple rows found for the field at this position - not safe to edit
        msg = """When trying to update the field in position %s on page %s of the submission %s, %s rows were found for the field""" \
              % (fieldposn, pagenum, "%s%s" % (action, doctype), numrows_field)
        raise InvenioWebSubmitAdminWarningTooManyRows(msg)
    else:
        ## no row for field found
        msg = """When trying to update the field in position %s on page %s of the submission %s, no rows were found for the field""" \
              % (fieldposn, pagenum, "%s%s" % (action, doctype))
        raise InvenioWebSubmitAdminWarningNoRowsFound(msg)

def delete_a_field_from_submissionpage_then_reorder_fields_below_to_fill_vacant_position(doctype, action, pagenum, fieldposn):
    """Delete a submission field from a given page of a given document-type submission.
       E.g. delete the field in position 3, from page 2 of the "SBI" submission of the
       "TEST" document-type.
       @param doctype: (string) the unique ID of the document type
       @param action: (string) the unique name/ID of the submission/action
       @param pagenum: (integer) the number of the page from which the field is to be deleted
       @param fieldposn: (integer) the number of the field to be deleted (e.g. field at
        position number 1, or number 2, etc.)
       @return: 0 (ZERO) if the field was deleted and the fields below it reordered
        successfully; -OR- an error string in the event that something goes wrong.
    """
    delete_res = delete_a_field_from_submissionpage(doctype=doctype, action=action, pagenum=pagenum, fieldposn=fieldposn)
    if delete_res == 0:
        ## deletion was successful - demote fields below deleted field into gap:
        update_res = decrement_position_of_all_fields_atposition_greaterthan_positionx_on_submissionpage(doctype=doctype,
                                                                                                         action=action,
                                                                                                         pagenum=pagenum,
                                                                                                         positionx=fieldposn,
                                                                                                         decrement=1)
        ## update the modification date of the page:
        update_modification_date_for_submission(doctype=doctype, action=action)
        return 0
    else:
        ## could not delete field! return an appropriate error message
        return delete_res

def update_modification_date_for_submission(doctype, action):
    """Update the "last-modification" date for a submission to the current date (today).
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique ID of an action
       @return: None
    """
    q = """UPDATE sbmIMPLEMENT SET md=CURDATE() WHERE docname=%s AND actname=%s"""
    run_sql(q, (doctype, action))
    return

def move_field_on_submissionpage_from_positionx_to_positiony(doctype, action, pagenum, movefieldfrom, movefieldto):
    ## get number of fields on submission page:
    try:
        movefieldfrom = int(movefieldfrom)
        movefieldto = int(movefieldto)
    except ValueError:
        return 'WRN_WEBSUBMITADMIN_INVALID_FIELD_NUMBERS_SUPPLIED_WHEN_TRYING_TO_MOVE_FIELD_ON_SUBMISSION_PAGE'
    numfields_page = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=pagenum)
    if movefieldfrom > numfields_page or movefieldto > numfields_page or movefieldfrom < 1 or \
       movefieldto < 1 or movefieldfrom == movefieldto:
        ## invalid move-field coordinates:
        return 'WRN_WEBSUBMITADMIN_INVALID_FIELD_NUMBERS_SUPPLIED_WHEN_TRYING_TO_MOVE_FIELD_ON_SUBMISSION_PAGE'
    q = """UPDATE sbmFIELD SET fieldnb=%s WHERE subname=%s AND pagenb=%s AND fieldnb=%s"""
    ## process movement:
    if movefieldfrom - movefieldto in (1, -1):
        ## fields are adjacent - swap them around:
        tmp_fieldnb = numfields_page + randint(3, 10)
        ## move field from position 'movefieldfrom' to temporary position 'tmp_fieldnb':
        run_sql(q, (tmp_fieldnb, "%s%s" % (action, doctype), pagenum, movefieldfrom))
        num_fields_posn_movefieldfrom = \
           get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=movefieldfrom)
        if num_fields_posn_movefieldfrom != 0:
            ## problem moving the field from its position to the temporary position
            ## return with an error
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_SWAP_TWO_FIELDS_ON_SUBMISSION_PAGE_COULDNT_MOVE_FIELD1_TO_TEMP_POSITION'
        ## move field from position 'movefieldto' to position 'movefieldfrom':
        run_sql(q, (movefieldfrom, "%s%s" % (action, doctype), pagenum, movefieldto))
        num_fields_posn_movefieldto = \
           get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=movefieldto)
        if num_fields_posn_movefieldto != 0:
            ## problem moving the field at 'movefieldto' into the position 'movefieldfrom'
            ## try to reverse the changes made so far, then return with an error:
            ## move field at temporary posn back to 'movefieldfrom' position:
            run_sql(q, (movefieldfrom, "%s%s" % (action, doctype), pagenum, tmp_fieldnb))
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_SWAP_TWO_FIELDS_ON_SUBMISSION_PAGE_COULDNT_MOVE_FIELD2_TO_FIELD1_POSITION'
        ## move field from temporary position 'tmp_fieldnb' to position 'movefieldto':
        run_sql(q, (movefieldto, "%s%s" % (action, doctype), pagenum, tmp_fieldnb))
        num_fields_posn_tmp_fieldnb = \
           get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=tmp_fieldnb)
        if num_fields_posn_tmp_fieldnb != 0:
            ## problem moving the field from the temporary position to position 'movefieldto'
            ## stop - admin should examine and fix this problem
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_SWAP_TWO_FIELDS_ON_SUBMISSION_PAGE_COULDNT_MOVE_FIELD1_TO_POSITION_FIELD2_FROM_TEMPORARY_POSITION'
        ## successfully swapped fields - update modification date of the swapped fields and of the submission
        update_modificationdate_of_field_on_submissionpage(doctype=doctype, action=action, subpage=pagenum, fieldnb=movefieldfrom)
        update_modificationdate_of_field_on_submissionpage(doctype=doctype, action=action, subpage=pagenum, fieldnb=movefieldto)
        update_modification_date_for_submission(doctype=doctype, action=action)
        return 0
    else:
        ## fields not adjacent - perform a move:
        tmp_fieldnb = 0 - randint(3, 10)
        ## move field from position 'movefieldfrom' to temporary position 'tmp_fieldnb':
        run_sql(q, (tmp_fieldnb, "%s%s" % (action, doctype), pagenum, movefieldfrom))
        num_fields_posn_movefieldfrom = \
           get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=movefieldfrom)
        if num_fields_posn_movefieldfrom != 0:
            ## problem moving the field from its position to the temporary position
            ## return with an error
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_SWAP_TWO_FIELDS_ON_SUBMISSION_PAGE_COULDNT_MOVE_FIELD1_TO_TEMP_POSITION'
        ## fill the gap created by the moved field by decrementing by one the position of all fields below it:
        qres = decrement_position_of_all_fields_atposition_greaterthan_positionx_on_submissionpage(doctype=doctype,
                                                                                                   action=action,
                                                                                                   pagenum=pagenum,
                                                                                                   positionx=movefieldfrom,
                                                                                                   decrement=1)
        if movefieldfrom < numfields_page:
            ## check that there is now a field in the position of "movefieldfrom":
            num_fields_posn_movefieldfrom = \
               get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=movefieldfrom)
            if num_fields_posn_movefieldfrom == 0:
                ## no field there - it was not possible to decrement the position of the fields below the field moved to 'tmp_fieldnb'
                ## try to move the field back from 'tmp_fieldnb'
                run_sql(q, (movefieldfrom, "%s%s" % (action, doctype), pagenum, tmp_fieldnb))
                ## return an ERROR message
                return 'WRN_WEBSUBMITADMIN_UNABLE_TO_MOVE_FIELD_TO_NEW_POSITION_ON_SUBMISSION_PAGE_COULDNT_DECREMENT_POSITION_OF_FIELDS_BELOW_FIELD1'
        ## now increment (by one) the position of the fields at and below the field at position 'movefieldto':
        qres = increment_position_of_all_fields_atposition_greaterthan_positionx_on_submissionpage(doctype=doctype,
                                                                                                   action=action,
                                                                                                   pagenum=pagenum,
                                                                                                   positionx=movefieldto-1,
                                                                                                   increment=1)
        ## there should now be an empty space at position 'movefieldto':
        num_fields_posn_movefieldto = \
           get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=movefieldto)
        if num_fields_posn_movefieldto != 0:
            ## there isn't - the increment of position has failed - return warning:
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_MOVE_FIELD_TO_NEW_POSITION_ON_SUBMISSION_PAGE_COULDNT_INCREMENT_POSITION_OF_FIELDS_AT_AND_BELOW_FIELD2'
        ## Move field from temporary position to position 'movefieldto':
        run_sql(q, (movefieldto, "%s%s" % (action, doctype), pagenum, tmp_fieldnb))
        num_fields_posn_movefieldto = \
           get_number_of_fields_on_submissionpage_at_positionx(doctype=doctype, action=action, pagenum=pagenum, positionx=movefieldto)
        if num_fields_posn_movefieldto == 0:
            ## failed to move field1 from temp posn to final posn
            return 'WRN_WEBSUBMITADMIN_UNABLE_TO_SWAP_TWO_FIELDS_ON_SUBMISSION_PAGE_COULDNT_MOVE_FIELD1_TO_POSITION_FIELD2_FROM_TEMPORARY_POSITION'
        ## successfully moved field - update modification date of the moved field and of the submission
        update_modificationdate_of_field_on_submissionpage(doctype=doctype, action=action, subpage=pagenum, fieldnb=movefieldfrom)
        update_modification_date_for_submission(doctype=doctype, action=action)
        return 0

def increment_position_of_all_fields_atposition_greaterthan_positionx_on_submissionpage(doctype, action, pagenum, positionx, increment=1):
    """Increment (by the number provided via the "increment" parameter) the position
       of all fields (on a given submission page) found at a position greater than
       that of positionx.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique name/ID of the action
       @param pagenum: (integer) the number of the submission page on which the fields
        are situated
       @param positionx: (integer) the position after which fields' positions are to
        be promoted
       @param increment: (integer) the number by which to increment the field positions
        (defaults to 1)
       @return: the number of rows updated (integer), or None in case of an unexpected
        result
    """
    if type(increment) is not int:
        increment = 1
    q = """UPDATE sbmFIELD SET fieldnb=fieldnb+%s WHERE subname=%s AND pagenb=%s AND fieldnb > %s"""
    res = run_sql(q, (increment, "%s%s" % (action, doctype), pagenum, positionx))
    try:
        return int(res)
    except ValueError:
        return None

def decrement_position_of_all_fields_atposition_greaterthan_positionx_on_submissionpage(doctype, action, pagenum, positionx, decrement=1):
    """Decrement (by the number provided via the "decrement" parameter) the position
       of all fields (on a given submission page) found at a position greater than
       that of positionx.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique name/ID of the action
       @param pagenum: (integer) the number of the submission page on which the fields
        are situated
       @param positionx: (integer) the position after which fields' positions are to
        be demoted
       @param decrement: (integer) the number by which to decrement the field positions
        (defaults to 1)
       @return: the number of rows updated (integer), or None in case of an unexpected
        result
    """
    if type(decrement) is not int:
        decrement = 1
    q = """UPDATE sbmFIELD SET fieldnb=fieldnb-%s WHERE subname=%s AND pagenb=%s AND fieldnb > %s"""
    res = run_sql(q, (decrement, "%s%s" % (action, doctype), pagenum, positionx))
    try:
        return int(res)
    except ValueError:
        return None
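
## Illustrative usage sketch (a hypothetical helper, not part of the original
## module): the increment/decrement pair above is used to open a gap at a
## target position (as move_field_on_submissionpage_from_positionx_to_positiony
## does internally with positionx=movefieldto-1) or to close the gap left by a
## removed field.
def _example_open_gap_at_position(doctype, action, pagenum, posn):
    ## shift every field at position >= posn down by one, leaving posn vacant;
    ## returns the number of rows updated, or None on an unexpected result:
    return increment_position_of_all_fields_atposition_greaterthan_positionx_on_submissionpage(
        doctype=doctype, action=action, pagenum=pagenum,
        positionx=posn - 1, increment=1)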
def delete_allfields_submissionpage_doctype_action(doctype, action, pagenum):
    q = """DELETE FROM sbmFIELD WHERE pagenb=%s AND subname=%s"""
    run_sql(q, (pagenum, """%s%s""" % (action, doctype)))
    numrows_fields = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=pagenum)
    if numrows_fields == 0:
        ## everything OK - all fields deleted
        return 0
    else:
        ## everything NOT OK - couldn't delete all fields for page
        ## retry
        run_sql(q, (pagenum, """%s%s""" % (action, doctype)))
        numrows_fields = get_numberfields_submissionpage_doctype_action(doctype=doctype, action=action, pagenum=pagenum)
        if numrows_fields == 0:
            ## success this time
            return 0
        else:
            ## still unable to delete all fields - return fail code
            return 1

def get_details_allsubmissionfields_on_submission_page(doctype, action, pagenum):
    """Get the details of all submission elements belonging to a particular page of
       the submission. Results are returned ordered by field number.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique name/ID of an action
       @param pagenum: (string/integer) the number of the page for which element
        details are to be retrieved
       @return: a tuple of tuples: (subname, fieldnb, fidesc, fitext, level, sdesc,
        checkn, cd, md). Each tuple contains the details of one element.
    """
    q = """SELECT subname, fieldnb, fidesc, fitext, level, sdesc, checkn, cd, md FROM sbmFIELD """\
        """WHERE subname=%s AND pagenb=%s ORDER BY fieldnb ASC"""
    return run_sql(q, ("%s%s" % (action, doctype), pagenum))

def get_details_of_field_at_positionx_on_submissionpage(doctype, action, pagenum, fieldposition):
    """Get the details of a particular field on a submission page.
       @param doctype: (string) the unique ID of a document type
       @param action: (string) the unique name/ID of an action
       @param pagenum: (integer) the number of the submission page on which the field
        is found
       @param fieldposition: (integer) the position on the submission page of the field
        for which details are to be retrieved.
       @return: a tuple of the field's details: (subname, fieldnb, fidesc, fitext,
        level, sdesc, checkn, cd, md); an empty list if the field was not found.
    """
    fielddets = []
    q = """SELECT subname, fieldnb, fidesc, fitext, level, sdesc, checkn, cd, md FROM sbmFIELD """\
        """WHERE subname=%s AND pagenb=%s AND fieldnb=%s LIMIT 1"""
    res = run_sql(q, ("%s%s" % (action, doctype), pagenum, fieldposition))
    if len(res) > 0:
        fielddets = res[0]
    return fielddets

def decrement_by_one_number_submissionpages_doctype_action(doctype, action):
    """Decrement (by one) the number of pages associated with a given submission.
       @param doctype: the unique ID of the document type that owns the submission.
       @param action: the action name/ID of the submission for which the number of
        pages is to be decremented.
       @return: 0 (ZERO) if the update was performed; 1 (ONE) if not (e.g. because
        multiple rows exist for the submission, or the submission doesn't exist).
    """
    numrows_submission = get_number_submissions_doctype_action(doctype, action)
    if numrows_submission == 1:
        ## there is only one row for this submission - can update
        q = """UPDATE sbmIMPLEMENT SET nbpg=IFNULL(nbpg, 1)-1, md=CURDATE() WHERE docname=%s AND actname=%s and IFNULL(nbpg, 1) > 0"""
        run_sql(q, (doctype, action))
        return 0  ## Everything OK
    else:
        ## Everything NOT OK - either multiple rows exist for submission, or submission doesn't exist
        return 1

def add_submission_page_doctype_action(doctype, action):
    """Increment the number of pages associated with a given submission by 1.
       @param doctype: the unique ID of the document type that owns the submission.
       @param action: the action name/ID of the given submission of the document type,
        for which the number of pages is to be incremented.
       @return: an integer error code. 0 (ZERO) means that the update was performed
        without error; 1 (ONE) means that there was a problem and the update could not
        be performed. Problems could be: multiple rows found for the submission; no
        rows found for the submission.
    """
    numrows_submission = get_number_submissions_doctype_action(doctype, action)
    if numrows_submission == 1:
        ## there is only one row for this submission - can update
        q = """UPDATE sbmIMPLEMENT SET nbpg=IFNULL(nbpg, 0)+1, md=CURDATE() WHERE docname=%s AND actname=%s"""
        run_sql(q, (doctype, action))
        return 0  ## Everything OK
    else:
        ## Everything NOT OK - either multiple rows exist for submission, or submission doesn't exist
        return 1
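
## Illustrative usage sketch (a hypothetical helper, not part of the original
## module): adding a new, empty page to a submission and reporting the new
## page count.
def _example_add_page(doctype, action):
    if add_submission_page_doctype_action(doctype, action) != 0:
        return -1  ## zero or several sbmIMPLEMENT rows matched - nothing done
    return get_numbersubmissionpages_doctype_action(doctype, action)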