diff --git a/INSTALL b/INSTALL index 9ab34edf6..53204197b 100644 --- a/INSTALL +++ b/INSTALL @@ -1,696 +1,694 @@ Invenio INSTALLATION ==================== About ===== This document specifies how to build, customize, and install Invenio v1.0.1 for the first time. See RELEASE-NOTES if you are upgrading from a previous Invenio release. Contents ======== 0. Prerequisites 1. Quick instructions for the impatient Invenio admin 2. Detailed instructions for the patient Invenio admin 0. Prerequisites ================ Here is the software you need to have around before you start installing Invenio: a) Unix-like operating system. The main development and production platforms for Invenio at CERN are GNU/Linux distributions Debian, Gentoo, Scientific Linux (aka RHEL), Ubuntu, but we also develop on Mac OS X. Basically any Unix system supporting the software listed below should do. If you are using Debian GNU/Linux ``Lenny'' or later, then you can install most of the below-mentioned prerequisites and recommendations by running: $ sudo aptitude install python-dev apache2-mpm-prefork \ mysql-server mysql-client python-mysqldb \ python-4suite-xml python-simplejson python-xml \ python-libxml2 python-libxslt1 gnuplot poppler-utils \ gs-common clisp gettext libapache2-mod-wsgi unzip \ python-dateutil python-rdflib \ python-gnuplot python-magic pdftk html2text giflib-tools \ pstotext netpbm You may also want to install some of the following packages, if you have them available on your concrete architecture: - $ sudo aptitude install python-psyco sbcl cmucl pylint \ - pychecker pyflakes python-profiler python-epydoc \ - libapache2-mod-xsendfile openoffice.org + $ sudo aptitude install sbcl cmucl pylint pychecker pyflakes \ + python-profiler python-epydoc libapache2-mod-xsendfile \ + openoffice.org Moreover, you should install some Message Transfer Agent (MTA) such as Postfix so that Invenio can email notification alerts or registration information to the end users, contact moderators and reviewers of submitted documents, inform administrators about various runtime system information, etc: $ sudo aptitude install postfix After running the above-quoted aptitude command(s), you can proceed to configuring your MySQL server instance (max_allowed_packet in my.cnf, see item 0b below) and then to installing the Invenio software package in the section 1 below. If you are using another operating system, then please continue reading the rest of this prerequisites section, and please consult our wiki pages for any concrete hints for your specific operating system. b) MySQL server (may be on a remote machine), and MySQL client (must be available locally too). MySQL versions 4.1 or 5.0 are supported. Please set the variable "max_allowed_packet" in your "my.cnf" init file to at least 4M. (For sites such as INSPIRE, having 1M records with 10M citer-citee pairs in its citation map, you may need to increase max_allowed_packet to 1G.) You may perhaps also want to run your MySQL server natively in UTF-8 mode by setting "default-character-set=utf8" in various parts of your "my.cnf" file, such as in the "[mysql]" part and elsewhere; but this is not really required. c) Apache 2 server, with support for loading DSO modules, and optionally with SSL support for HTTPS-secure user authentication, and mod_xsendfile for off-loading file downloads away from Invenio processes to Apache. 
d) Python v2.4 or above: as well as the following Python modules: - (mandatory) MySQLdb (version >= 1.2.1_p2; see below) - (recommended) python-dateutil, for complex date processing: - (recommended) PyXML, for XML processing: - (recommended) PyRXP, for very fast XML MARC processing: - (recommended) libxml2-python, for XML/XLST processing: - (recommended) simplejson, for AJAX apps: Note that if you are using Python-2.6, you don't need to install simplejson, because the module is already included in the main Python distribution. - (recommended) Gnuplot.Py, for producing graphs: - (recommended) Snowball Stemmer, for stemming: - (recommended) py-editdist, for record merging: - (recommended) numpy, for citerank methods: - (recommended) magic, for full-text file handling: - (optional) 4suite, slower alternative to PyRXP and libxml2-python: - (optional) feedparser, for web journal creation: - - (optional) Psyco, if you are running on a 32-bit OS: - - (optional) RDFLib, to use RDF ontologies and thesauri: - (optional) mechanize, to run regression web test suite: - (optional) hashlib, needed only for Python-2.4 and only if you would like to use AWS connectivity: Note: MySQLdb version 1.2.1_p2 or higher is recommended. If you are using an older version of MySQLdb, you may get into problems with character encoding. e) mod_wsgi Apache module. Versions 3.x and above are recommended. Note: if you are using Python 2.4 or earlier, then you should also install the wsgiref Python module, available from: (As of Python 2.5 this module is included in standard Python distribution.) f) If you want to be able to extract references from PDF fulltext files, then you need to install pdftotext version 3 at least. g) If you want to be able to search for words in the fulltext files (i.e. to have fulltext indexing) or to stamp submitted files, then you need as well to install some of the following tools: - for Microsoft Office/OpenOffice.org document conversion: OpenOffice.org - for PDF file stamping: pdftk, pdf2ps - for PDF files: pdftotext or pstotext - for PostScript files: pstotext or ps2ascii - for DjVu creation, elaboration: DjVuLibre - to perform OCR: OCRopus (tested only with release 0.3.1) - to perform different image elaborations: ImageMagick - to generate PDF after OCR: ReportLab - to analyze images to generate PDF after OCR: netpbm h) If you have chosen to install fast XML MARC Python processors in the step d) above, then you have to install the parsers themselves: - (optional) 4suite: i) (recommended) Gnuplot, the command-line driven interactive plotting program. It is used to display download and citation history graphs on the Detailed record pages on the web interface. Note that Gnuplot must be compiled with PNG output support, that is, with the GD library. Note also that Gnuplot is not required, only recommended. j) (recommended) A Common Lisp implementation, such as CLISP, SBCL or CMUCL. It is used for the web server log analysing tool and the metadata checking program. Note that any of the three implementations CLISP, SBCL, or CMUCL will do. CMUCL produces fastest machine code, but it does not support UTF-8 yet. Pick up CLISP if you don't know what to do. Note that a Common Lisp implementation is not required, only recommended. k) GNU gettext, a set of tools that makes it possible to translate the application in multiple languages. This is available by default on many systems. 
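   To illustrate item 0b above, here is a minimal my.cnf sketch with
   the recommended MySQL settings. (The values shown are illustrative
   assumptions only; tune them to the size of your site, and recall
   that very large citation maps may need a much bigger
   max_allowed_packet.)

      [mysqld]
      max_allowed_packet=4M
      default-character-set=utf8

      [mysql]
      default-character-set=utf8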
Note that the configure script checks whether you have all the prerequisite software installed and that it won't let you continue unless everything is in order. It also warns you if it cannot find some optional but recommended software. 1. Quick instructions for the impatient Invenio admin ========================================================= 1a. Installation ---------------- $ cd $HOME/src/ $ wget http://invenio-software.org/download/invenio-1.0.1.tar.gz $ wget http://invenio-software.org/download/invenio-1.0.1.tar.gz.md5 $ wget http://invenio-software.org/download/invenio-1.0.1.tar.gz.sig $ md5sum -c invenio-1.0.1.tar.gz.md5 $ gpg --verify invenio-1.0.1.tar.gz.sig invenio-1.0.1.tar.gz $ tar xvfz invenio-1.0.1.tar.gz $ cd invenio-1.0.1 $ ./configure $ make $ make install $ make install-mathjax-plugin ## optional $ make install-jquery-plugins ## optional $ make install-fckeditor-plugin ## optional $ make install-pdfa-helper-files ## optional 1b. Configuration ----------------- $ sudo chown -R www-data.www-data /opt/invenio $ sudo -u www-data emacs /opt/invenio/etc/invenio-local.conf $ sudo -u www-data /opt/invenio/bin/inveniocfg --update-all $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-tables $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-webstat-conf $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-apache-conf $ sudo /etc/init.d/apache2 restart $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-demo-site $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-demo-records $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-unit-tests $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-regression-tests $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-web-tests $ sudo -u www-data /opt/invenio/bin/inveniocfg --remove-demo-records $ sudo -u www-data /opt/invenio/bin/inveniocfg --drop-demo-site $ firefox http://your.site.com/help/admin/howto-run 2. Detailed instructions for the patient Invenio admin ========================================================== 2a. Installation ---------------- The Invenio uses standard GNU autoconf method to build and install its files. This means that you proceed as follows: $ cd $HOME/src/ Change to a directory where we will build the Invenio sources. (The built files will be installed into different "target" directories later.) $ wget http://invenio-software.org/download/invenio-1.0.1.tar.gz $ wget http://invenio-software.org/download/invenio-1.0.1.tar.gz.md5 $ wget http://invenio-software.org/download/invenio-1.0.1.tar.gz.sig Fetch Invenio source tarball from the distribution server, together with MD5 checksum and GnuPG cryptographic signature files useful for verifying the integrity of the tarball. $ md5sum -c invenio-1.0.1.tar.gz.md5 Verify MD5 checksum. $ gpg --verify invenio-1.0.1.tar.gz.sig invenio-1.0.1.tar.gz Verify GnuPG cryptographic signature. Note that you may first have to import my public key into your keyring, if you haven't done that already: $ gpg --keyserver wwwkeys.eu.pgp.net --recv-keys 0xBA5A2B67 The output of the gpg --verify command should then read: Good signature from "Tibor Simko " You can safely ignore any trusted signature certification warning that may follow after the signature has been successfully verified. $ tar xvfz invenio-1.0.1.tar.gz Untar the distribution tarball. $ cd invenio-1.0.1 Go to the source directory. $ ./configure Configure Invenio software for building on this specific platform. 
You can use the following optional parameters: --prefix=/opt/invenio Optionally, specify the Invenio general installation directory (default is /opt/invenio). It will contain command-line binaries and program libraries containing the core Invenio functionality, but also store web pages, runtime log and cache information, document data files, etc. Several subdirs like `bin', `etc', `lib', or `var' will be created inside the prefix directory to this effect. Note that the prefix directory should be chosen outside of the Apache htdocs tree, since only one its subdirectory (prefix/var/www) is to be accessible directly via the Web (see below). Note that Invenio won't install to any other directory but to the prefix mentioned in this configuration line. --with-python=/opt/python/bin/python2.4 Optionally, specify a path to some specific Python binary. This is useful if you have more than one Python installation on your system. If you don't set this option, then the first Python that will be found in your PATH will be chosen for running Invenio. --with-mysql=/opt/mysql/bin/mysql Optionally, specify a path to some specific MySQL client binary. This is useful if you have more than one MySQL installation on your system. If you don't set this option, then the first MySQL client executable that will be found in your PATH will be chosen for running Invenio. --with-clisp=/opt/clisp/bin/clisp Optionally, specify a path to CLISP executable. This is useful if you have more than one CLISP installation on your system. If you don't set this option, then the first executable that will be found in your PATH will be chosen for running Invenio. --with-cmucl=/opt/cmucl/bin/lisp Optionally, specify a path to CMUCL executable. This is useful if you have more than one CMUCL installation on your system. If you don't set this option, then the first executable that will be found in your PATH will be chosen for running Invenio. --with-sbcl=/opt/sbcl/bin/sbcl Optionally, specify a path to SBCL executable. This is useful if you have more than one SBCL installation on your system. If you don't set this option, then the first executable that will be found in your PATH will be chosen for running Invenio. --with-openoffice-python Optionally, specify the path to the Python interpreter embedded with OpenOffice.org. This is normally not contained in the normal path. If you don't specify this it won't be possible to use OpenOffice.org to convert from and to Microsoft Office and OpenOffice.org documents. This configuration step is mandatory. Usually, you do this step only once. (Note that if you are building Invenio not from a released tarball, but from the Git sources, then you have to generate the configure file via autotools: $ sudo aptitude install automake1.9 autoconf $ aclocal-1.9 $ automake-1.9 -a $ autoconf after which you proceed with the usual configure command.) $ make Launch the Invenio build. Since many messages are printed during the build process, you may want to run it in a fast-scrolling terminal such as rxvt or in a detached screen session. During this step all the pages and scripts will be pre-created and customized based on the config you have edited in the previous step. Note that on systems such as FreeBSD or Mac OS X you have to use GNU make ("gmake") instead of "make". $ make install Install the web pages, scripts, utilities and everything needed for Invenio runtime into respective installation directories, as specified earlier by the configure command. 
Note that if you are installing Invenio for the first time, you will be asked to create symbolic link(s) from Python's site-packages system-wide directory(ies) to the installation location. This is in order to instruct Python where to find Invenio's Python files. You will be hinted as to the exact command to use based on the parameters you have used in the configure command. $ make install-mathjax-plugin ## optional This will automatically download and install in the proper place MathJax, a JavaScript library to render LaTeX formulas in the client browser. Note that in order to enable the rendering you will have to set the variable CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS in invenio-local.conf to a suitable list of output format codes. For example: CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS = hd,hb $ make install-jquery-plugins ## optional This will automatically download and install in the proper place jQuery and related plugins. They are used for AJAX applications such as the record editor. Note that `unzip' is needed when installing jquery plugins. $ make install-fckeditor-plugin ## optional This will automatically download and install in the proper place FCKeditor, a WYSIWYG Javascript-based editor (e.g. for the WebComment module). Note that in order to enable the editor you have to set the CFG_WEBCOMMENT_USE_FCKEDITOR to True. $ make install-pdfa-helper-files ## optional This will automatically download and install in the proper place the helper files needed to create PDF/A files out of existing PDF files. 2b. Configuration ----------------- Once the basic software installation is done, we proceed to configuring your Invenio system. $ sudo chown -R www-data.www-data /opt/invenio For the sake of simplicity, let us assume that your Invenio installation will run under the `www-data' user process identity. The above command changes ownership of installed files to www-data, so that we shall run everything under this user identity from now on. For production purposes, you would typically enable Apache server to read all files from the installation place but to write only to the `var' subdirectory of your installation place. You could achieve this by configuring Unix directory group permissions, for example. $ sudo -u www-data emacs /opt/invenio/etc/invenio-local.conf Customize your Invenio installation. Please read the 'invenio.conf' file located in the same directory that contains the vanilla default configuration parameters of your Invenio installation. If you want to customize some of these parameters, you should create a file named 'invenio-local.conf' in the same directory where 'invenio.conf' lives and you should write there only the customizations that you want to be different from the vanilla defaults. 
   Here is a realistic, minimalist, yet production-ready example of
   what you would typically put there:

      $ cat /opt/invenio/etc/invenio-local.conf
      [Invenio]
      CFG_SITE_NAME = John Doe's Document Server
      CFG_SITE_NAME_INTL_fr = Serveur des Documents de John Doe
      CFG_SITE_URL = http://your.site.com
      CFG_SITE_SECURE_URL = https://your.site.com
      CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com
      CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com
      CFG_WEBALERT_ALERT_ENGINE_EMAIL = john.doe@your.site.com
      CFG_WEBCOMMENT_ALERT_ENGINE_EMAIL = john.doe@your.site.com
      CFG_WEBCOMMENT_DEFAULT_MODERATOR = john.doe@your.site.com
      CFG_DATABASE_HOST = localhost
      CFG_DATABASE_NAME = invenio
      CFG_DATABASE_USER = invenio
      CFG_DATABASE_PASS = my123p$ss

   You should override at least the parameters mentioned above in
   order to define some essential runtime parameters such as the name
   of your document server (CFG_SITE_NAME and CFG_SITE_NAME_INTL_*),
   the visible URL of your document server (CFG_SITE_URL and
   CFG_SITE_SECURE_URL), the email addresses of the local Invenio
   administrator, comment moderator, and alert engine
   (CFG_SITE_SUPPORT_EMAIL, CFG_SITE_ADMIN_EMAIL, etc), and last but
   not least your database credentials (CFG_DATABASE_*).

   The Invenio system will then read both the default invenio.conf
   file and your customized invenio-local.conf file, overriding any
   default options with the ones you have specified in your local
   file.  This cascading of configuration parameters will ease your
   future upgrades.

$ sudo -u www-data /opt/invenio/bin/inveniocfg --update-all

   Make the rest of the Invenio system aware of your
   invenio-local.conf changes.  This step is mandatory each time you
   edit your conf files.

$ sudo -u www-data /opt/invenio/bin/inveniocfg --create-tables

   If you are installing Invenio for the first time, you have to
   create database tables.

   Note that this step checks for potential problems such as database
   connection rights, and may ask you to perform some additional
   administrative steps if it detects a problem.  Notably, it may ask
   you to set up database access permissions based on your configure
   values.

   If you are installing Invenio for the first time, you have to
   create a dedicated database on your MySQL server that Invenio can
   use for its purposes.  Please contact your MySQL administrator and
   ask them to execute the commands this step proposes.

   At this point you should have successfully completed the "make
   install" process.  We continue by setting up the Apache web
   server.

$ sudo -u www-data /opt/invenio/bin/inveniocfg --load-webstat-conf

   Load the configuration file of the webstat module.  It will create
   the database tables for registering custom events, such as basket
   hits.

$ sudo -u www-data /opt/invenio/bin/inveniocfg --create-apache-conf

   Running this command will generate Apache virtual host
   configurations matching your installation.  You will be instructed
   to check the created files (usually located under
   /opt/invenio/etc/apache/) and to edit your httpd.conf to activate
   the Invenio virtual hosts.
If you are using Debian GNU/Linux ``Lenny'' or later, then you can do the following to create your SSL certificate and to activate your Invenio vhosts: ## make SSL certificate: $ sudo aptitude install ssl-cert $ sudo mkdir /etc/apache2/ssl $ sudo /usr/sbin/make-ssl-cert /usr/share/ssl-cert/ssleay.cnf \ /etc/apache2/ssl/apache.pem ## add Invenio web sites: $ sudo ln -s /opt/invenio/etc/apache/invenio-apache-vhost.conf \ /etc/apache2/sites-available/invenio $ sudo ln -s /opt/invenio/etc/apache/invenio-apache-vhost-ssl.conf \ /etc/apache2/sites-available/invenio-ssl ## disable Debian's default web site: $ sudo /usr/sbin/a2dissite default ## enable Invenio web sites: $ sudo /usr/sbin/a2ensite invenio $ sudo /usr/sbin/a2ensite invenio-ssl ## enable SSL module: $ sudo /usr/sbin/a2enmod ssl ## if you are using xsendfile module, enable it too: $ sudo /usr/sbin/a2enmod xsendfile If you are using another operating system, you should do the equivalent, for example edit your system-wide httpd.conf and put the following include statements: Include /opt/invenio/etc/apache/invenio-apache-vhost.conf Include /opt/invenio/etc/apache/invenio-apache-vhost-ssl.conf Note that you may need to adapt generated vhost file snippets to match your concrete operating system specifics. For example, the generated configuration snippet will preload Invenio WSGI daemon application upon Apache start up for faster site response. The generated configuration assumes that you are using mod_wsgi version 3 or later. If you are using the old legacy mod_wsgi version 2, then you would need to comment out the WSGIImportScript directive from the generated snippet, or else move the WSGI daemon setup to the top level, outside of the VirtualHost section. Note also that you may want to tweak the generated Apache vhost snippet for performance reasons, especially with respect to WSGIDaemonProcess parameters. For example, you can increase the number of processes from the default value `processes=5' if you have lots of RAM and if many concurrent users may access your site in parallel. However, note that you must use `threads=1' there, because Invenio WSGI daemon processes are not fully thread safe yet. This may change in the future. $ sudo /etc/init.d/apache2 restart Please ask your webserver administrator to restart the Apache server after the above "httpd.conf" changes. $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-demo-site This step is recommended to test your local Invenio installation. It should give you our "Atlantis Institute of Science" demo installation, exactly as you see it at . $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-demo-records Optionally, load some demo records to be able to test indexing and searching of your local Invenio demo installation. $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-unit-tests Optionally, you can run the unit test suite to verify the unit behaviour of your local Invenio installation. Note that this command should be run only after you have installed the whole system via `make install'. $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-regression-tests Optionally, you can run the full regression test suite to verify the functional behaviour of your local Invenio installation. Note that this command requires to have created the demo site and loaded the demo records. Note also that running the regression test suite may alter the database content with junk data, so that rebuilding the demo site is strongly recommended afterwards. 
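   Returning to the WSGIDaemonProcess tuning mentioned above: as an
   illustrative sketch only (the daemon group name and the process
   count below are assumptions; keep whatever values your generated
   vhost snippet actually contains, apart from the one you mean to
   change), raising the number of processes on a well-provisioned
   machine could look like this:

      WSGIDaemonProcess invenio processes=10 threads=1

   Remember that threads=1 must be kept as explained above, because
   the Invenio WSGI daemon processes are not fully thread safe yet.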
$ sudo -u www-data /opt/invenio/bin/inveniocfg --run-web-tests Optionally, you can run additional automated web tests running in a real browser. This requires to have Firefox with the Selenium IDE extension installed. $ sudo -u www-data /opt/invenio/bin/inveniocfg --remove-demo-records Optionally, remove the demo records loaded in the previous step, but keeping otherwise the demo collection, submission, format, and other configurations that you may reuse and modify for your own production purposes. $ sudo -u www-data /opt/invenio/bin/inveniocfg --drop-demo-site Optionally, drop also all the demo configuration so that you'll end up with a completely blank Invenio system. However, you may want to find it more practical not to drop the demo site configuration but to start customizing from there. $ firefox http://your.site.com/help/admin/howto-run In order to start using your Invenio installation, you can start indexing, formatting and other daemons as indicated in the "HOWTO Run" guide on the above URL. You can also use the Admin Area web interfaces to perform further runtime configurations such as the definition of data collections, document types, document formats, word indexes, etc. $ sudo ln -s /opt/invenio/etc/bash_completion.d/inveniocfg \ /etc/bash_completion.d/inveniocfg Optionally, if you are using Bash shell completion, then you may want to create the above symlink in order to configure completion for the inveniocfg command. Good luck, and thanks for choosing Invenio. - Invenio Development Team diff --git a/configure-tests.py b/configure-tests.py index fb8c813b6..87419e3aa 100644 --- a/configure-tests.py +++ b/configure-tests.py @@ -1,368 +1,343 @@ ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Test the suitability of Python core and the availability of various Python modules for running Invenio. Warn the user if there are eventual troubles. Exit status: 0 if okay, 1 if not okay. Useful for running from configure.ac. """ ## minimally recommended/required versions: cfg_min_python_version = "2.4" cfg_max_python_version = "2.9.9999" cfg_min_mysqldb_version = "1.2.1_p2" ## 0) import modules needed for this testing: import string import sys import getpass def wait_for_user(msg): """Print MSG and prompt user for confirmation.""" try: raw_input(msg) except KeyboardInterrupt: print "\n\nInstallation aborted." sys.exit(1) except EOFError: print " (continuing in batch mode)" return ## 1) check Python version: if sys.version < cfg_min_python_version: print """ ******************************************************* ** ERROR: TOO OLD PYTHON DETECTED: %s ******************************************************* ** You seem to be using a too old version of Python. ** ** You must use at least Python %s. 
** ** ** ** Note that if you have more than one Python ** ** installed on your system, you can specify the ** ** --with-python configuration option to choose ** ** a specific (e.g. non system wide) Python binary. ** ** ** ** Please upgrade your Python before continuing. ** ******************************************************* """ % (string.replace(sys.version, "\n", ""), cfg_min_python_version) sys.exit(1) if sys.version > cfg_max_python_version: print """ ******************************************************* ** ERROR: TOO NEW PYTHON DETECTED: %s ******************************************************* ** You seem to be using a too new version of Python. ** ** You must use at most Python %s. ** ** ** ** Perhaps you have downloaded and are installing an ** ** old Invenio version? Please look for more recent ** ** Invenio version or please contact the development ** ** team at about this ** ** problem. ** ** ** ** Installation aborted. ** ******************************************************* """ % (string.replace(sys.version, "\n", ""), cfg_max_python_version) sys.exit(1) ## 2) check for required modules: try: import MySQLdb import base64 import cPickle import cStringIO import cgi import copy import fileinput import getopt import sys if sys.hexversion < 0x2060000: import md5 else: import hashlib import marshal import os import signal import tempfile import time import traceback import unicodedata import urllib import zlib import wsgiref except ImportError, msg: print """ ************************************************* ** IMPORT ERROR %s ************************************************* ** Perhaps you forgot to install some of the ** ** prerequisite Python modules? Please look ** ** at our INSTALL file for more details and ** ** fix the problem before continuing! ** ************************************************* """ % msg sys.exit(1) ## 3) check for recommended modules: -try: - if (2**31 - 1) == sys.maxint: - # check for Psyco since we seem to run in 32-bit environment - import psyco - else: - # no need to advise on Psyco on 64-bit systems - pass -except ImportError, msg: - print """ - ***************************************************** - ** IMPORT WARNING %s - ***************************************************** - ** Note that Psyco is not really required but we ** - ** recommend it for faster Invenio operation ** - ** if you are running in 32-bit operating system. ** - ** ** - ** You can safely continue installing Invenio ** - ** now, and add this module anytime later. (I.e. ** - ** even after your Invenio installation is put ** - ** into production.) ** - ***************************************************** - """ % msg - - wait_for_user("Press ENTER to continue the installation...") - try: import rdflib except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that rdflib is needed only if you plan ** ** to work with the automatic classification of ** ** documents based on RDF-based taxonomies. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) 
** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import pyRXP except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that PyRXP is not really required but ** ** we recommend it for fast XML MARC parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import dateutil except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that dateutil is not really required but ** ** we recommend it for user-friendly date ** ** parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import libxml2 except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxml2 is not really required but ** ** we recommend it for XML metadata conversions ** ** and for fast XML parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import libxslt except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxslt is not really required but ** ** we recommend it for XML metadata conversions. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import Gnuplot except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that Gnuplot.py is not really required but ** ** we recommend it in order to have nice download ** ** and citation history graphs on Detailed record ** ** pages. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) 
** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") try: import magic if not hasattr(magic, "open"): raise StandardError except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that magic module is not really required ** ** but we recommend it in order to have detailed ** ** content information about fulltext files. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg except StandardError: print """ ***************************************************** ** IMPORT WARNING python-magic ***************************************************** ** The python-magic package you installed is not ** ** the one supported by Invenio. Please refer to ** ** the INSTALL file for more details. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ try: import reportlab except ImportError, msg: print """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that reportlab module is not really ** ** required, but we recommend it you want to ** ** enrich PDF with OCR information. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg wait_for_user("Press ENTER to continue the installation...") ## 4) check for versions of some important modules: if MySQLdb.__version__ < cfg_min_mysqldb_version: print """ ***************************************************** ** ERROR: PYTHON MODULE MYSQLDB %s DETECTED ***************************************************** ** You have to upgrade your MySQLdb to at least ** ** version %s. You must fix this problem ** ** before continuing. Please see the INSTALL file ** ** for more details. ** ***************************************************** """ % (MySQLdb.__version__, cfg_min_mysqldb_version) sys.exit(1) try: import Stemmer try: from Stemmer import algorithms except ImportError, msg: print """ ***************************************************** ** ERROR: STEMMER MODULE PROBLEM %s ***************************************************** ** Perhaps you are using an old Stemmer version? ** ** You must either remove your old Stemmer or else ** ** upgrade to Snowball Stemmer ** ** before continuing. Please see the INSTALL file ** ** for more details. 
** ***************************************************** """ % (msg) sys.exit(1) except ImportError: pass # no prob, Stemmer is optional ## 5) check for Python.h (needed for intbitset): try: from distutils.sysconfig import get_python_inc path_to_python_h = get_python_inc() + os.sep + 'Python.h' if not os.path.exists(path_to_python_h): raise StandardError, "Cannot find %s" % path_to_python_h except StandardError, msg: print """ ***************************************************** ** ERROR: PYTHON HEADER FILE ERROR %s ***************************************************** ** You do not seem to have Python developer files ** ** installed (such as Python.h). Some operating ** ** systems provide these in a separate Python ** ** package called python-dev or python-devel. ** ** You must install such a package before ** ** continuing the installation process. ** ***************************************************** """ % (msg) sys.exit(1) diff --git a/modules/bibedit/lib/bibrecord.py b/modules/bibedit/lib/bibrecord.py index 20dc3463f..f3080d958 100644 --- a/modules/bibedit/lib/bibrecord.py +++ b/modules/bibedit/lib/bibrecord.py @@ -1,1540 +1,1523 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibRecord - XML MARC processing library for Invenio. For API, see create_record(), record_get_field_instances() and friends in the source code of this file in the section entitled INTERFACE. Note: Does not access the database, the input is MARCXML only.""" ### IMPORT INTERESTING MODULES AND XML PARSERS import re import sys -try: - import psyco - PSYCO_AVAILABLE = True -except ImportError: - PSYCO_AVAILABLE = False if sys.hexversion < 0x2040000: # pylint: disable=W0622 from sets import Set as set # pylint: enable=W0622 from invenio.bibrecord_config import CFG_MARC21_DTD, \ CFG_BIBRECORD_WARNING_MSGS, CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, \ CFG_BIBRECORD_DEFAULT_CORRECT, CFG_BIBRECORD_PARSERS_AVAILABLE, \ InvenioBibRecordParserError, InvenioBibRecordFieldError from invenio.config import CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG from invenio.textutils import encode_for_xml # Some values used for the RXP parsing. TAG, ATTRS, CHILDREN = 0, 1, 2 # Find out about the best usable parser: AVAILABLE_PARSERS = [] # Do we remove singletons (empty tags)? # NOTE: this is currently set to True as there are some external workflow # exploiting singletons, e.g. bibupload -c used to delete fields, and # bibdocfile --fix-marc called on a record where the latest document # has been deleted. 
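# For example, with singletons kept, an empty datafield such as
# <datafield tag="980" ind1=" " ind2=" "></datafield> coming from one of
# those workflows is kept in the parsed record instead of being silently
# dropped.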
CFG_BIBRECORD_KEEP_SINGLETONS = True try: import pyRXP if 'pyrxp' in CFG_BIBRECORD_PARSERS_AVAILABLE: AVAILABLE_PARSERS.append('pyrxp') except ImportError: pass try: import Ft.Xml.Domlette if '4suite' in CFG_BIBRECORD_PARSERS_AVAILABLE: AVAILABLE_PARSERS.append('4suite') except ImportError: pass try: import xml.dom.minidom import xml.parsers.expat if 'minidom' in CFG_BIBRECORD_PARSERS_AVAILABLE: AVAILABLE_PARSERS.append('minidom') except ImportError: pass ### INTERFACE / VISIBLE FUNCTIONS def create_field(subfields=None, ind1=' ', ind2=' ', controlfield_value='', global_position=-1): """ Returns a field created with the provided elements. Global position is set arbitrary to -1.""" if subfields is None: subfields = [] ind1, ind2 = _wash_indicators(ind1, ind2) field = (subfields, ind1, ind2, controlfield_value, global_position) _check_field_validity(field) return field def create_records(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, correct=CFG_BIBRECORD_DEFAULT_CORRECT, parser='', keep_singletons=CFG_BIBRECORD_KEEP_SINGLETONS): """Creates a list of records from the marcxml description. Returns a list of objects initiated by the function create_record(). Please see that function's docstring.""" # Use the DOTALL flag to include newlines. regex = re.compile('.*?', re.DOTALL) record_xmls = regex.findall(marcxml) return [create_record(record_xml, verbose=verbose, correct=correct, parser=parser, keep_singletons=keep_singletons) for record_xml in record_xmls] def create_record(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, correct=CFG_BIBRECORD_DEFAULT_CORRECT, parser='', sort_fields_by_indicators=False, keep_singletons=CFG_BIBRECORD_KEEP_SINGLETONS): """Creates a record object from the marcxml description. Uses the best parser available in CFG_BIBRECORD_PARSERS_AVAILABLE or the parser specified. The returned object is a tuple (record, status_code, list_of_errors), where status_code is 0 when there are errors, 1 when no errors. The return record structure is as follows: Record := {tag : [Field]} Field := (Subfields, ind1, ind2, value) Subfields := [(code, value)] For example: ______ |record| ------ __________________________|_______________________________________ |record['001'] |record['909'] |record['520'] | | | | | [list of fields] [list of fields] [list of fields] ... | ______|______________ | |[0] |[0] |[1] | |[0] ___|_____ _____|___ ___|_____ ... ____|____ |Field 001| |Field 909| |Field 909| |Field 520| --------- --------- --------- --------- | _______________|_________________ | | ... |[0] |[1] |[2] | ... ... | | | | [list of subfields] 'C' '4' ___|__________________________________________ | | | ('a', 'value') ('b', 'value for subfield b') ('a', 'value for another a') @param marcxml: an XML string representation of the record to create @param verbose: the level of verbosity: 0 (silent), 1-2 (warnings), 3(strict:stop when errors) @param correct: 1 to enable correction of marcxml syntax. Else 0. @return: a tuple (record, status_code, list_of_errors), where status code is 0 where there are errors, 1 when no errors""" # Select the appropriate parser. 
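    # (When no parser is requested explicitly, _select_parser() is expected
    # to fall back to the best available parser among those registered in
    # AVAILABLE_PARSERS above: pyRXP, 4suite, minidom.)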
parser = _select_parser(parser) try: if parser == 'pyrxp': rec = _create_record_rxp(marcxml, verbose, correct, keep_singletons=keep_singletons) elif parser == '4suite': rec = _create_record_4suite(marcxml, keep_singletons=keep_singletons) elif parser == 'minidom': rec = _create_record_minidom(marcxml, keep_singletons=keep_singletons) except InvenioBibRecordParserError, ex1: return (None, 0, str(ex1)) # _create_record = { # 'pyrxp': _create_record_rxp, # '4suite': _create_record_4suite, # 'minidom': _create_record_minidom, # } # try: # rec = _create_record[parser](marcxml, verbose) # except InvenioBibRecordParserError, ex1: # return (None, 0, str(ex1)) if sort_fields_by_indicators: _record_sort_by_indicators(rec) errs = [] if correct: # Correct the structure of the record. errs = _correct_record(rec) return (rec, int(not errs), errs) def record_get_field_instances(rec, tag="", ind1=" ", ind2=" "): """Returns the list of field instances for the specified tag and indicators of the record (rec). Returns empty list if not found. If tag is empty string, returns all fields Parameters (tag, ind1, ind2) can contain wildcard %. @param rec: a record structure as returned by create_record() @param tag: a 3 characters long string @param ind1: a 1 character long string @param ind2: a 1 character long string @param code: a 1 character long string @return: a list of field tuples (Subfields, ind1, ind2, value, field_position_global) where subfields is list of (code, value)""" if not rec: return [] if not tag: return rec.items() else: out = [] ind1, ind2 = _wash_indicators(ind1, ind2) if '%' in tag: # Wildcard in tag. Check all possible for field_tag in rec: if _tag_matches_pattern(field_tag, tag): for possible_field_instance in rec[field_tag]: if (ind1 in ('%', possible_field_instance[1]) and ind2 in ('%', possible_field_instance[2])): out.append(possible_field_instance) else: # Completely defined tag. Use dict for possible_field_instance in rec.get(tag, []): if (ind1 in ('%', possible_field_instance[1]) and ind2 in ('%', possible_field_instance[2])): out.append(possible_field_instance) return out def record_add_field(rec, tag, ind1=' ', ind2=' ', controlfield_value='', subfields=None, field_position_global=None, field_position_local=None): """ Adds a new field into the record. If field_position_global or field_position_local is specified then this method will insert the new field at the desired position. Otherwise a global field position will be computed in order to insert the field at the best position (first we try to keep the order of the tags and then we insert the field at the end of the fields with the same tag). If both field_position_global and field_position_local are present, then field_position_local takes precedence. @param rec: the record data structure @param tag: the tag of the field to be added @param ind1: the first indicator @param ind2: the second indicator @param controlfield_value: the value of the controlfield @param subfields: the subfields (a list of tuples (code, value)) @param field_position_global: the global field position (record wise) @param field_position_local: the local field position (tag wise) @return: the global field position of the newly inserted field or -1 if the operation failed """ error = validate_record_field_positions_global(rec) if error: # FIXME one should write a message here pass # Clean the parameters. 
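    # Reminder: throughout this module a field is the tuple
    # (subfields, ind1, ind2, controlfield_value, global_position),
    # where subfields is a list of (code, value) pairs -- see create_field().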
if subfields is None: subfields = [] ind1, ind2 = _wash_indicators(ind1, ind2) if controlfield_value and (ind1 != ' ' or ind2 != ' ' or subfields): return -1 # Detect field number to be used for insertion: # Dictionaries for uniqueness. tag_field_positions_global = {}.fromkeys([field[4] for field in rec.get(tag, [])]) all_field_positions_global = {}.fromkeys([field[4] for fields in rec.values() for field in fields]) if field_position_global is None and field_position_local is None: # Let's determine the global field position of the new field. if tag in rec: try: field_position_global = max([field[4] for field in rec[tag]]) \ + 1 except IndexError: if tag_field_positions_global: field_position_global = max(tag_field_positions_global) + 1 elif all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: field_position_global = 1 else: if tag in ('FMT', 'FFT'): # Add the new tag to the end of the record. if tag_field_positions_global: field_position_global = max(tag_field_positions_global) + 1 elif all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: field_position_global = 1 else: # Insert the tag in an ordered way by selecting the # right global field position. immediate_lower_tag = '000' for rec_tag in rec: if (tag not in ('FMT', 'FFT') and immediate_lower_tag < rec_tag < tag): immediate_lower_tag = rec_tag if immediate_lower_tag == '000': field_position_global = 1 else: field_position_global = rec[immediate_lower_tag][-1][4] + 1 field_position_local = len(rec.get(tag, [])) _shift_field_positions_global(rec, field_position_global, 1) elif field_position_local is not None: if tag in rec: if field_position_local >= len(rec[tag]): field_position_global = rec[tag][-1][4] + 1 else: field_position_global = rec[tag][field_position_local][4] _shift_field_positions_global(rec, field_position_global, 1) else: if all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: # Empty record. field_position_global = 1 elif field_position_global is not None: # If the user chose an existing global field position, shift all the # global field positions greater than the input global field position. if tag not in rec: if all_field_positions_global: field_position_global = max(all_field_positions_global) + 1 else: field_position_global = 1 field_position_local = 0 elif field_position_global < min(tag_field_positions_global): field_position_global = min(tag_field_positions_global) _shift_field_positions_global(rec, min(tag_field_positions_global), 1) field_position_local = 0 elif field_position_global > max(tag_field_positions_global): field_position_global = max(tag_field_positions_global) + 1 _shift_field_positions_global(rec, max(tag_field_positions_global) + 1, 1) field_position_local = len(rec.get(tag, [])) else: if field_position_global in tag_field_positions_global: _shift_field_positions_global(rec, field_position_global, 1) field_position_local = 0 for position, field in enumerate(rec[tag]): if field[4] == field_position_global + 1: field_position_local = position # Create the new field. newfield = (subfields, ind1, ind2, str(controlfield_value), field_position_global) rec.setdefault(tag, []).insert(field_position_local, newfield) # Return new field number: return field_position_global def record_has_field(rec, tag): """ Checks if the tag exists in the record. 
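    For example, record_has_field(rec, '001') is True as soon as the
    record contains at least one '001' controlfield.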
    @param rec: the record data structure
    @param tag: the field tag
    @return: a boolean
    """
    return tag in rec

def record_delete_field(rec, tag, ind1=' ', ind2=' ',
                        field_position_global=None,
                        field_position_local=None):
    """
    If global field position is specified, deletes the field with the
    corresponding global field position.  If field_position_local is
    specified, deletes the field with the corresponding local field
    position and tag.  Else deletes all the fields matching tag and
    optionally ind1 and ind2.

    If both field_position_global and field_position_local are
    present, then field_position_local takes precedence.

    @param rec: the record data structure
    @param tag: the tag of the field to be deleted
    @param ind1: the first indicator of the field to be deleted
    @param ind2: the second indicator of the field to be deleted
    @param field_position_global: the global field position (record wise)
    @param field_position_local: the local field position (tag wise)
    @return: the list of deleted fields
    """
    error = validate_record_field_positions_global(rec)
    if error:
        # FIXME one should write a message here.
        pass

    if tag not in rec:
        return []

    ind1, ind2 = _wash_indicators(ind1, ind2)

    deleted = []
    newfields = []

    if field_position_global is None and field_position_local is None:
        # Remove all fields with tag 'tag'.
        for field in rec[tag]:
            if field[1] != ind1 or field[2] != ind2:
                newfields.append(field)
            else:
                deleted.append(field)
        rec[tag] = newfields
    elif field_position_global is not None:
        # Remove the field with 'field_position_global'.
        for field in rec[tag]:
            if (field[1] != ind1 and field[2] != ind2
                or field[4] != field_position_global):
                newfields.append(field)
            else:
                deleted.append(field)
        rec[tag] = newfields
    elif field_position_local is not None:
        # Remove the field with 'field_position_local'.
        try:
            del rec[tag][field_position_local]
        except IndexError:
            return []

    if not rec[tag]:
        # Tag is now empty, remove it.
        del rec[tag]

    return deleted

def record_delete_fields(rec, tag, field_positions_local=None):
    """
    Delete all/some fields defined with MARC tag 'tag' from record 'rec'.

    @param rec: a record structure.
    @type rec: dict
    @param tag: three letter field.
    @type tag: string
    @param field_positions_local: if set, it is the list of local
        positions within all the fields with the specified tag that
        should be deleted.  If not set, all the fields with the
        specified tag will be deleted.
    @type field_positions_local: sequence
    @return: the list of deleted fields.
    @rtype: list
    @note: the record is modified in place.
    """
    if tag not in rec:
        return []

    new_fields, deleted_fields = [], []

    for position, field in enumerate(rec.get(tag, [])):
        if field_positions_local is None or position in field_positions_local:
            deleted_fields.append(field)
        else:
            new_fields.append(field)

    if new_fields:
        rec[tag] = new_fields
    else:
        del rec[tag]

    return deleted_fields

def record_add_fields(rec, tag, fields, field_position_local=None,
                      field_position_global=None):
    """
    Adds the fields into the record at the required position.  The
    position is specified by the tag and the field_position_local in
    the list of fields.

    @param rec: a record structure
    @param tag: the tag of the fields to be added
    @param field_position_local: the field_position_local at which the
        field will be inserted.  If not specified, appends the fields
        to the tag.
@param a: list of fields to be added @return: -1 if the operation failed, or the field_position_local if it was successful """ if field_position_local is None and field_position_global is None: for field in fields: record_add_field(rec, tag, ind1=field[1], ind2=field[2], subfields=field[0], controlfield_value=field[3]) else: fields.reverse() for field in fields: record_add_field(rec, tag, ind1=field[1], ind2=field[2], subfields=field[0], controlfield_value=field[3], field_position_local=field_position_local, field_position_global=field_position_global) return field_position_local def record_move_fields(rec, tag, field_positions_local, field_position_local=None): """ Moves some fields to the position specified by 'field_position_local'. @param rec: a record structure as returned by create_record() @param tag: the tag of the fields to be moved @param field_positions_local: the positions of the fields to move @param field_position_local: insert the field before that field_position_local. If unspecified, appends the fields @return: the field_position_local is the operation was successful """ fields = record_delete_fields(rec, tag, field_positions_local=field_positions_local) return record_add_fields(rec, tag, fields, field_position_local=field_position_local) def record_delete_subfield(rec, tag, subfield_code, ind1=' ', ind2=' '): """Deletes all subfields with subfield_code in the record.""" ind1, ind2 = _wash_indicators(ind1, ind2) for field in rec.get(tag, []): if field[1] == ind1 and field[2] == ind2: field[0][:] = [subfield for subfield in field[0] if subfield_code != subfield[0]] def record_get_field(rec, tag, field_position_global=None, field_position_local=None): """ Returns the the matching field. One has to enter either a global field position or a local field position. @return: a list of subfield tuples (subfield code, value). @rtype: list """ if field_position_global is None and field_position_local is None: raise InvenioBibRecordFieldError("A field position is required to " "complete this operation.") elif field_position_global is not None and field_position_local is not None: raise InvenioBibRecordFieldError("Only one field position is required " "to complete this operation.") elif field_position_global: if not tag in rec: raise InvenioBibRecordFieldError("No tag '%s' in record." % tag) for field in rec[tag]: if field[4] == field_position_global: return field raise InvenioBibRecordFieldError("No field has the tag '%s' and the " "global field position '%d'." % (tag, field_position_global)) else: try: return rec[tag][field_position_local] except KeyError: raise InvenioBibRecordFieldError("No tag '%s' in record." % tag) except IndexError: raise InvenioBibRecordFieldError("No field has the tag '%s' and " "the local field position '%d'." % (tag, field_position_local)) def record_replace_field(rec, tag, new_field, field_position_global=None, field_position_local=None): """Replaces a field with a new field.""" if field_position_global is None and field_position_local is None: raise InvenioBibRecordFieldError("A field position is required to " "complete this operation.") elif field_position_global is not None and field_position_local is not None: raise InvenioBibRecordFieldError("Only one field position is required " "to complete this operation.") elif field_position_global: if not tag in rec: raise InvenioBibRecordFieldError("No tag '%s' in record." 
% tag) replaced = False for position, field in enumerate(rec[tag]): if field[4] == field_position_global: rec[tag][position] = new_field replaced = True if not replaced: raise InvenioBibRecordFieldError("No field has the tag '%s' and " "the global field position '%d'." % (tag, field_position_global)) else: try: rec[tag][field_position_local] = new_field except KeyError: raise InvenioBibRecordFieldError("No tag '%s' in record." % tag) except IndexError: raise InvenioBibRecordFieldError("No field has the tag '%s' and " "the local field position '%d'." % (tag, field_position_local)) def record_get_subfields(rec, tag, field_position_global=None, field_position_local=None): """ Returns the subfield of the matching field. One has to enter either a global field position or a local field position. @return: a list of subfield tuples (subfield code, value). @rtype: list """ field = record_get_field(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) return field[0] def record_delete_subfield_from(rec, tag, subfield_position, field_position_global=None, field_position_local=None): """Delete subfield from position specified by tag, field number and subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) try: del subfields[subfield_position] except IndexError: from invenio.xmlmarc2textmarc import create_marc_record recordMarc = create_marc_record(rec, 0, {"text-marc": 1, "aleph-marc": 0}) raise InvenioBibRecordFieldError("The record : %(recordCode)s does not contain the subfield " "'%(subfieldIndex)s' inside the field (local: '%(fieldIndexLocal)s, global: '%(fieldIndexGlobal)s' ) of tag '%(tag)s'." % \ {"subfieldIndex" : subfield_position, \ "fieldIndexLocal" : str(field_position_local), \ "fieldIndexGlobal" : str(field_position_global), \ "tag" : tag, \ "recordCode" : recordMarc}) if not subfields: if field_position_global is not None: for position, field in enumerate(rec[tag]): if field[4] == field_position_global: del rec[tag][position] else: del rec[tag][field_position_local] if not rec[tag]: del rec[tag] def record_add_subfield_into(rec, tag, subfield_code, value, subfield_position=None, field_position_global=None, field_position_local=None): """Add subfield into position specified by tag, field number and optionally by subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) if subfield_position is None: subfields.append((subfield_code, value)) else: subfields.insert(subfield_position, (subfield_code, value)) def record_modify_controlfield(rec, tag, controlfield_value, field_position_global=None, field_position_local=None): """Modify controlfield at position specified by tag and field number.""" field = record_get_field(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) new_field = (field[0], field[1], field[2], controlfield_value, field[4]) record_replace_field(rec, tag, new_field, field_position_global=field_position_global, field_position_local=field_position_local) def record_modify_subfield(rec, tag, subfield_code, value, subfield_position, field_position_global=None, field_position_local=None): """Modify subfield at position specified by tag, field number and subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) 
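    # Subfield positions are 0-based; the (code, value) pair is replaced
    # in place within the field's subfield list.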
try: subfields[subfield_position] = (subfield_code, value) except IndexError: raise InvenioBibRecordFieldError("There is no subfield with position " "'%d'." % subfield_position) def record_move_subfield(rec, tag, subfield_position, new_subfield_position, field_position_global=None, field_position_local=None): """Move subfield at position specified by tag, field number and subfield position to new subfield position.""" subfields = record_get_subfields(rec, tag, field_position_global=field_position_global, field_position_local=field_position_local) try: subfield = subfields.pop(subfield_position) subfields.insert(new_subfield_position, subfield) except IndexError: raise InvenioBibRecordFieldError("There is no subfield with position " "'%d'." % subfield_position) def record_get_field_value(rec, tag, ind1=" ", ind2=" ", code=""): """Returns first (string) value that matches specified field (tag, ind1, ind2, code) of the record (rec). Returns empty string if not found. Parameters (tag, ind1, ind2, code) can contain wildcard %. Difference between wildcard % and empty '': - Empty char specifies that we are not interested in a field which has one of the indicator(s)/subfield specified. - Wildcard specifies that we are interested in getting the value of the field whatever the indicator(s)/subfield is. For e.g. consider the following record in MARC: 100C5 $$a val1 555AB $$a val2 555AB val3 555 $$a val4 555A val5 >> record_get_field_value(record, '555', 'A', '', '') >> "val5" >> record_get_field_value(record, '555', 'A', '%', '') >> "val3" >> record_get_field_value(record, '555', 'A', '%', '%') >> "val2" >> record_get_field_value(record, '555', 'A', 'B', '') >> "val3" >> record_get_field_value(record, '555', '', 'B', 'a') >> "" >> record_get_field_value(record, '555', '', '', 'a') >> "val4" >> record_get_field_value(record, '555', '', '', '') >> "" >> record_get_field_value(record, '%%%', '%', '%', '%') >> "val1" @param rec: a record structure as returned by create_record() @param tag: a 3 characters long string @param ind1: a 1 character long string @param ind2: a 1 character long string @param code: a 1 character long string @return: string value (empty if nothing found)""" # Note: the code is quite redundant for speed reasons (avoid calling # functions or doing tests inside loops) ind1, ind2 = _wash_indicators(ind1, ind2) if '%' in tag: # Wild card in tag. Must find all corresponding fields if code == '': # Code not specified. for field_tag, fields in rec.items(): if _tag_matches_pattern(field_tag, tag): for field in fields: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): # Return matching field value if not empty if field[3]: return field[3] elif code == '%': # Code is wildcard. Take first subfield of first matching field for field_tag, fields in rec.items(): if _tag_matches_pattern(field_tag, tag): for field in fields: if (ind1 in ('%', field[1]) and ind2 in ('%', field[2]) and field[0]): return field[0][0][1] else: # Code is specified. Take corresponding one for field_tag, fields in rec.items(): if _tag_matches_pattern(field_tag, tag): for field in fields: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: if subfield[0] == code: return subfield[1] else: # Tag is completely specified. Use tag as dict key if tag in rec: if code == '': # Code not specified. for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): # Return matching field value if not empty # or return "" empty if not exist. 
if field[3]: return field[3] elif code == '%': # Code is wildcard. Take first subfield of first matching field for field in rec[tag]: if (ind1 in ('%', field[1]) and ind2 in ('%', field[2]) and field[0]): return field[0][0][1] else: # Code is specified. Take corresponding one for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: if subfield[0] == code: return subfield[1] # Nothing was found return "" def record_get_field_values(rec, tag, ind1=" ", ind2=" ", code=""): """Returns the list of (string) values for the specified field (tag, ind1, ind2, code) of the record (rec). Returns empty list if not found. Parameters (tag, ind1, ind2, code) can contain wildcard %. @param rec: a record structure as returned by create_record() @param tag: a 3 characters long string @param ind1: a 1 character long string @param ind2: a 1 character long string @param code: a 1 character long string @return: a list of strings""" tmp = [] ind1, ind2 = _wash_indicators(ind1, ind2) if '%' in tag: # Wild card in tag. Must find all corresponding tags and fields tags = [k for k in rec if _tag_matches_pattern(k, tag)] if code == '': # Code not specified. Consider field value (without subfields) for tag in tags: for field in rec[tag]: if (ind1 in ('%', field[1]) and ind2 in ('%', field[2]) and field[3]): tmp.append(field[3]) elif code == '%': # Code is wildcard. Consider all subfields for tag in tags: for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: tmp.append(subfield[1]) else: # Code is specified. Consider all corresponding subfields for tag in tags: for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: if subfield[0] == code: tmp.append(subfield[1]) else: # Tag is completely specified. Use tag as dict key if rec and tag in rec: if code == '': # Code not specified. Consider field value (without subfields) for field in rec[tag]: if (ind1 in ('%', field[1]) and ind2 in ('%', field[2]) and field[3]): tmp.append(field[3]) elif code == '%': # Code is wildcard. Consider all subfields for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: tmp.append(subfield[1]) else: # Code is specified. Take corresponding one for field in rec[tag]: if ind1 in ('%', field[1]) and ind2 in ('%', field[2]): for subfield in field[0]: if subfield[0] == code: tmp.append(subfield[1]) # If tmp was not set, nothing was found return tmp def record_xml_output(rec, tags=None): """Generates the XML for record 'rec' and returns it as a string @rec: record @tags: list of tags to be printed""" if tags is None: tags = [] if isinstance(tags, str): tags = [tags] if tags and '001' not in tags: # Add the missing controlfield. 
        tags.append('001')

    marcxml = ['<record>']
    # Add the tag 'tag' to each field in rec[tag]
    fields = []
    for tag in rec:
        if not tags or tag in tags:
            for field in rec[tag]:
                fields.append((tag, field))
    record_order_fields(fields)
    for field in fields:
        marcxml.append(field_xml_output(field[1], field[0]))
    marcxml.append('</record>')
    return '\n'.join(marcxml)

def field_get_subfield_instances(field):
    """Returns the list of subfields associated with field 'field'."""
    return field[0]

def field_get_subfield_values(field_instance, code):
    """Returns subfield CODE values of the field instance FIELD_INSTANCE."""
    return [subfield_value
            for subfield_code, subfield_value in field_instance[0]
            if subfield_code == code]

def field_add_subfield(field, code, value):
    """Adds a subfield to field 'field'."""
    field[0].append((code, value))

def record_order_fields(rec, fun="_order_by_ord"):
    """Orders the (tag, field) list 'rec' using the comparison function
    named by 'fun'."""
    rec.sort(eval(fun))

def field_xml_output(field, tag):
    """Generates the XML for field 'field' and returns it as a string."""
    marcxml = []
    if field[3]:
        marcxml.append('  <controlfield tag="%s">%s</controlfield>' %
            (tag, encode_for_xml(field[3])))
    else:
        marcxml.append('  <datafield tag="%s" ind1="%s" ind2="%s">' %
            (tag, field[1], field[2]))
        marcxml += [_subfield_xml_output(subfield) for subfield in field[0]]
        marcxml.append('  </datafield>')
    return '\n'.join(marcxml)

def record_extract_oai_id(record):
    """Returns the OAI ID of the record."""
    tag = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3]
    ind1 = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3]
    ind2 = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4]
    subfield = CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5]
    values = record_get_field_values(record, tag, ind1, ind2, subfield)
    oai_id_regex = re.compile("oai[a-zA-Z0-9/.:]+")
    for value in [value.strip() for value in values]:
        if oai_id_regex.match(value):
            return value
    return ""

def print_rec(rec, format=1, tags=None):
    """Prints a record.

    format = 1 -- XML
    format = 2 -- HTML (not implemented)

    @tags: list of tags to be printed
    """
    if tags is None:
        tags = []
    if format == 1:
        text = record_xml_output(rec, tags)
    else:
        return ''
    return text

def print_recs(listofrec, format=1, tags=None):
    """Prints a list of records.

    format = 1 -- XML
    format = 2 -- HTML (not implemented)

    @tags: list of tags to be printed

    If 'listofrec' is not a list, an empty string is returned.
    """
    if tags is None:
        tags = []
    text = ""
    if type(listofrec).__name__ != 'list':
        return ""
    else:
        for rec in listofrec:
            text = "%s\n%s" % (text, print_rec(rec, format, tags))
    return text

def concat(alist):
    """Concatenates a list of lists."""
    newl = []
    for l in alist:
        newl.extend(l)
    return newl

def print_errors(alist):
    """Creates a unique string from the strings in 'alist', using '\n' as a
    separator."""
    text = ""
    for l in alist:
        text = '%s\n%s' % (text, l)
    return text

def record_find_field(rec, tag, field, strict=False):
    """
    Returns the global and local positions of the first occurrence of the
    field in a record.

    @param rec: A record dictionary structure
    @type rec: dictionary
    @param tag: The tag of the field to search for
    @type tag: string
    @param field: A field tuple as returned by create_field()
    @type field: tuple
    @param strict: A boolean describing the search method.  If strict is
        False, then the order of the subfields doesn't matter.  The default
        search method is strict.
    @type strict: boolean
    @return: A tuple (global_position, local_position), or (None, None) if
        the field is not present.
    @rtype: tuple
    @raise InvenioBibRecordFieldError: If the provided field is invalid.
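
    Illustrative usage (a sketch; it assumes a record 'rec' whose '100'
    field has global position 1 and whose first subfield is
    ('a', 'Ellis, J.')):

    >>> field = ([('a', 'Ellis, J.')], ' ', ' ', '', 1)
    >>> record_find_field(rec, '100', field, strict=False)
    (1, 0)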
""" try: _check_field_validity(field) except InvenioBibRecordFieldError: raise for local_position, field1 in enumerate(rec.get(tag, [])): if _compare_fields(field, field1, strict): return (field1[4], local_position) return (None, None) def record_strip_empty_volatile_subfields(rec): """ Removes unchanged volatile subfields from the record """ for tag in rec.keys(): for field in rec[tag]: field[0][:] = [subfield for subfield in field[0] if subfield[1][:9] != "VOLATILE:"] def record_strip_empty_fields(rec, tag=None): """ Removes empty subfields and fields from the record. If 'tag' is not None, only a specific tag of the record will be stripped, otherwise the whole record. @param rec: A record dictionary structure @type rec: dictionary @param tag: The tag of the field to strip empty fields from @type tag: string """ # Check whole record if tag is None: tags = rec.keys() for tag in tags: record_strip_empty_fields(rec, tag) # Check specific tag of the record elif tag in rec: # in case of a controlfield if tag[:2] == '00': if len(rec[tag]) == 0 or not rec[tag][0][3]: del rec[tag] #in case of a normal field else: fields = [] for field in rec[tag]: subfields = [] for subfield in field[0]: # check if the subfield has been given a value if subfield[1]: subfields.append(subfield) if len(subfields) > 0: new_field = create_field(subfields, field[1], field[2], field[3]) fields.append(new_field) if len(fields) > 0: rec[tag] = fields else: del rec[tag] ### IMPLEMENTATION / INVISIBLE FUNCTIONS def _compare_fields(field1, field2, strict=True): """ Compares 2 fields. If strict is True, then the order of the subfield will be taken care of, if not then the order of the subfields doesn't matter. @return: True if the field are equivalent, False otherwise. """ if strict: # Return a simple equal test on the field minus the position. return field1[:4] == field2[:4] else: if field1[1:4] != field2[1:4]: # Different indicators or controlfield value. return False else: # Compare subfields in a loose way. return set(field1[0]) == set(field2[0]) def _check_field_validity(field): """ Checks if a field is well-formed. @param field: A field tuple as returned by create_field() @type field: tuple @raise InvenioBibRecordFieldError: If the field is invalid. """ if type(field) not in (list, tuple): raise InvenioBibRecordFieldError("Field of type '%s' should be either " "a list or a tuple." % type(field)) if len(field) != 5: raise InvenioBibRecordFieldError("Field of length '%d' should have 5 " "elements." % len(field)) if type(field[0]) not in (list, tuple): raise InvenioBibRecordFieldError("Subfields of type '%s' should be " "either a list or a tuple." % type(field[0])) if type(field[1]) is not str: raise InvenioBibRecordFieldError("Indicator 1 of type '%s' should be " "a string." % type(field[1])) if type(field[2]) is not str: raise InvenioBibRecordFieldError("Indicator 2 of type '%s' should be " "a string." % type(field[2])) if type(field[3]) is not str: raise InvenioBibRecordFieldError("Controlfield value of type '%s' " "should be a string." % type(field[3])) if type(field[4]) is not int: raise InvenioBibRecordFieldError("Global position of type '%s' should " "be an int." % type(field[4])) for subfield in field[0]: if (type(subfield) not in (list, tuple) or len(subfield) != 2 or type(subfield[0]) is not str or type(subfield[1]) is not str): raise InvenioBibRecordFieldError("Subfields are malformed. 
" "Should a list of tuples of 2 strings.") def _shift_field_positions_global(record, start, delta=1): """Shifts all global field positions with global field positions higher or equal to 'start' from the value 'delta'.""" if not delta: return for tag, fields in record.items(): newfields = [] for field in fields: if field[4] < start: newfields.append(field) else: # Increment the global field position by delta. newfields.append(tuple(list(field[:4]) + [field[4] + delta])) record[tag] = newfields def _tag_matches_pattern(tag, pattern): """Returns true if MARC 'tag' matches a 'pattern'. 'pattern' is plain text, with % as wildcard Both parameters must be 3 characters long strings. For e.g. >> _tag_matches_pattern("909", "909") -> True >> _tag_matches_pattern("909", "9%9") -> True >> _tag_matches_pattern("909", "9%8") -> False @param tag: a 3 characters long string @param pattern: a 3 characters long string @return: False or True""" for char1, char2 in zip(tag, pattern): if char2 not in ('%', char1): return False return True def validate_record_field_positions_global(record): """ Checks if the global field positions in the record are valid ie no duplicate global field positions and local field positions in the list of fields are ascending. @param record: the record data structure @return: the first error found as a string or None if no error was found """ all_fields = [] for tag, fields in record.items(): previous_field_position_global = -1 for field in fields: if field[4] < previous_field_position_global: return "Non ascending global field positions in tag '%s'." % tag previous_field_position_global = field[4] if field[4] in all_fields: return ("Duplicate global field position '%d' in tag '%s'" % (field[4], tag)) def _record_sort_by_indicators(record): """Sorts the fields inside the record by indicators.""" for tag, fields in record.items(): record[tag] = _fields_sort_by_indicators(fields) def _fields_sort_by_indicators(fields): """Sorts a set of fields by their indicators. Returns a sorted list with correct global field positions.""" field_dict = {} field_positions_global = [] for field in fields: field_dict.setdefault(field[1:3], []).append(field) field_positions_global.append(field[4]) indicators = field_dict.keys() indicators.sort() field_list = [] for indicator in indicators: for field in field_dict[indicator]: field_list.append(field[:4] + (field_positions_global.pop(0),)) return field_list def _select_parser(parser=None): """Selects the more relevant parser based on the parsers available and on the parser desired by the user.""" if not AVAILABLE_PARSERS: # No parser is available. This is bad. return None if parser is None or parser not in AVAILABLE_PARSERS: # Return the best available parser. return AVAILABLE_PARSERS[0] else: return parser def _create_record_rxp(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL, correct=CFG_BIBRECORD_DEFAULT_CORRECT, keep_singletons=CFG_BIBRECORD_KEEP_SINGLETONS): """Creates a record object using the RXP parser. If verbose>3 then the parser will be strict and will stop in case of well-formedness errors or DTD errors. If verbose=0, the parser will not give warnings. If 0 < verbose <= 3, the parser will not give errors, but will warn the user about possible mistakes correct != 0 -> We will try to correct errors such as missing attributes correct = 0 -> there will not be any attempt to correct errors""" if correct: # Note that with pyRXP < 1.13 a memory leak has been found # involving DTD parsing. 
So enable correction only if you have # pyRXP 1.13 or greater. marcxml = ('\n' '\n' '\n%s\n' % (CFG_MARC21_DTD, marcxml)) # Create the pyRXP parser. pyrxp_parser = pyRXP.Parser(ErrorOnValidityErrors=0, ProcessDTD=1, ErrorOnUnquotedAttributeValues=0, srcName='string input') if verbose > 3: pyrxp_parser.ErrorOnValidityErrors = 1 pyrxp_parser.ErrorOnUnquotedAttributeValues = 1 try: root = pyrxp_parser.parse(marcxml) except pyRXP.error, ex1: raise InvenioBibRecordParserError(str(ex1)) # If record is enclosed in a collection tag, extract it. if root[TAG] == 'collection': children = _get_children_by_tag_name_rxp(root, 'record') if not children: return {} root = children[0] record = {} # This is needed because of the record_xml_output function, where we # need to know the order of the fields. field_position_global = 1 # Consider the control fields. for controlfield in _get_children_by_tag_name_rxp(root, 'controlfield'): if controlfield[CHILDREN]: value = ''.join([n for n in controlfield[CHILDREN]]) # Construct the field tuple. field = ([], ' ', ' ', value, field_position_global) record.setdefault(controlfield[ATTRS]['tag'], []).append(field) field_position_global += 1 elif keep_singletons: field = ([], ' ', ' ', '', field_position_global) record.setdefault(controlfield[ATTRS]['tag'], []).append(field) field_position_global += 1 # Consider the data fields. for datafield in _get_children_by_tag_name_rxp(root, 'datafield'): subfields = [] for subfield in _get_children_by_tag_name_rxp(datafield, 'subfield'): if subfield[CHILDREN]: value = ''.join([n for n in subfield[CHILDREN]]) subfields.append((subfield[ATTRS].get('code', '!'), value)) elif keep_singletons: subfields.append((subfield[ATTRS].get('code', '!'), '')) if subfields or keep_singletons: # Create the field. tag = datafield[ATTRS].get('tag', '!') ind1 = datafield[ATTRS].get('ind1', '!') ind2 = datafield[ATTRS].get('ind2', '!') ind1, ind2 = _wash_indicators(ind1, ind2) # Construct the field tuple. field = (subfields, ind1, ind2, '', field_position_global) record.setdefault(tag, []).append(field) field_position_global += 1 return record def _create_record_from_document(document, keep_singletons=CFG_BIBRECORD_KEEP_SINGLETONS): """Creates a record from the document (of type xml.dom.minidom.Document or Ft.Xml.Domlette.Document).""" root = None for node in document.childNodes: if node.nodeType == node.ELEMENT_NODE: root = node break if root is None: return {} if root.tagName == 'collection': children = _get_children_by_tag_name(root, 'record') if not children: return {} root = children[0] field_position_global = 1 record = {} for controlfield in _get_children_by_tag_name(root, "controlfield"): tag = controlfield.getAttributeNS(None, "tag").encode('utf-8') text_nodes = controlfield.childNodes value = ''.join([n.data for n in text_nodes]).encode("utf-8") if value or keep_singletons: field = ([], " ", " ", value, field_position_global) record.setdefault(tag, []).append(field) field_position_global += 1 for datafield in _get_children_by_tag_name(root, "datafield"): subfields = [] for subfield in _get_children_by_tag_name(datafield, "subfield"): text_nodes = subfield.childNodes value = ''.join([n.data for n in text_nodes]).encode("utf-8") if value or keep_singletons: code = subfield.getAttributeNS(None, 'code').encode("utf-8") subfields.append((code or '!', value)) if subfields or keep_singletons: tag = datafield.getAttributeNS(None, "tag").encode("utf-8") or '!' 
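            # ('!' stands in for a missing or empty attribute; such
            # placeholder tags, codes and indicators are reported and
            # replaced later by _correct_record().)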
            ind1 = datafield.getAttributeNS(None, "ind1").encode("utf-8")
            ind2 = datafield.getAttributeNS(None, "ind2").encode("utf-8")
            ind1, ind2 = _wash_indicators(ind1, ind2)
            field = (subfields, ind1, ind2, "", field_position_global)
            record.setdefault(tag, []).append(field)
            field_position_global += 1
    return record

def _create_record_minidom(marcxml,
        keep_singletons=CFG_BIBRECORD_KEEP_SINGLETONS):
    """Creates a record using minidom."""
    try:
        dom = xml.dom.minidom.parseString(marcxml)
    except xml.parsers.expat.ExpatError, ex1:
        raise InvenioBibRecordParserError(str(ex1))
    return _create_record_from_document(dom, keep_singletons=keep_singletons)

def _create_record_4suite(marcxml,
        keep_singletons=CFG_BIBRECORD_KEEP_SINGLETONS):
    """Creates a record using the 4suite parser."""
    try:
        dom = Ft.Xml.Domlette.NonvalidatingReader.parseString(marcxml,
            "urn:dummy")
    except Ft.Xml.ReaderException, ex1:
        raise InvenioBibRecordParserError(ex1.message)
    return _create_record_from_document(dom, keep_singletons=keep_singletons)

def _concat(alist):
    """Concatenates a list of lists."""
    return [element for single_list in alist for element in single_list]

def _subfield_xml_output(subfield):
    """Generates the XML for a subfield object and returns it as a string."""
    return '    <subfield code="%s">%s</subfield>' % \
        (subfield[0], encode_for_xml(subfield[1]))

def _order_by_ord(field1, field2):
    """Function used to order the fields according to their global position
    (ord) value."""
    return cmp(field1[1][4], field2[1][4])

def _get_children_by_tag_name(node, name):
    """Retrieves all children from node 'node' with name 'name' and returns
    them as a list."""
    try:
        return [child for child in node.childNodes if child.nodeName == name]
    except TypeError:
        return []

def _get_children_by_tag_name_rxp(node, name):
    """Retrieves all children of the RXP node 'node' whose tag name equals
    'name' and returns them as a list.  'node' is a list as returned by the
    RXP parser."""
    try:
        return [child for child in node[CHILDREN] if child[TAG] == name]
    except TypeError:
        return []

def _wash_indicators(*indicators):
    """
    Washes the values of the indicators.  An empty string or an underscore
    is replaced by a blank space.

    @param indicators: a series of indicators to be washed
    @return: a list of washed indicators
    """
    return [indicator in ('', '_') and ' ' or indicator
            for indicator in indicators]

def _correct_record(record):
    """
    Checks and corrects the structure of the record.

    @param record: the record data structure
    @return: a list of errors found
    """
    errors = []
    for tag in record.keys():
        upper_bound = '999'
        n = len(tag)
        if n > 3:
            i = n - 3
            while i > 0:
                upper_bound = '%s%s' % ('0', upper_bound)
                i -= 1
        # Missing tag.  Replace it with dummy tag '000'.
        if tag == '!':
            errors.append((1, '(field number(s): ' +
                str([f[4] for f in record[tag]]) + ')'))
            record['000'] = record.pop(tag)
            tag = '000'
        elif not ('001' <= tag <= upper_bound or tag in ('FMT', 'FFT')):
            errors.append(2)
            record['000'] = record.pop(tag)
            tag = '000'
        fields = []
        for field in record[tag]:
            # Datafield without any subfield.
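            # (The numeric codes collected in 'errors' map to human-readable
            # messages via CFG_BIBRECORD_WARNING_MSGS in _warning() below.)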
if field[0] == [] and field[3] == '': errors.append((8, '(field number: ' + str(field[4]) + ')')) subfields = [] for subfield in field[0]: if subfield[0] == '!': errors.append((3, '(field number: ' + str(field[4]) + ')')) newsub = ('', subfield[1]) else: newsub = subfield subfields.append(newsub) if field[1] == '!': errors.append((4, '(field number: ' + str(field[4]) + ')')) ind1 = " " else: ind1 = field[1] if field[2] == '!': errors.append((5, '(field number: ' + str(field[4]) + ')')) ind2 = " " else: ind2 = field[2] fields.append((subfields, ind1, ind2, field[3], field[4])) record[tag] = fields return errors def _warning(code): """It returns a warning message of code 'code'. If code = (cd, str) it returns the warning message of code 'cd' and appends str at the end""" if isinstance(code, str): return code message = '' if isinstance(code, tuple): if isinstance(code[0], str): message = code[1] code = code[0] return CFG_BIBRECORD_WARNING_MSGS.get(code, '') + message def _warnings(alist): """Applies the function _warning() to every element in l.""" return [_warning(element) for element in alist] def _compare_lists(list1, list2, custom_cmp): """Compares twolists using given comparing function @param list1: first list to compare @param list2: second list to compare @param custom_cmp: a function taking two arguments (element of list 1, element of list 2) and @return: True or False depending if the values are the same""" if len(list1) != len(list2): return False for element1, element2 in zip(list1, list2): if not custom_cmp(element1, element2): return False return True - -if PSYCO_AVAILABLE: - psyco.bind(_correct_record) - psyco.bind(_create_record_4suite) - psyco.bind(_create_record_rxp) - psyco.bind(_create_record_minidom) - psyco.bind(field_get_subfield_values) - psyco.bind(create_records) - psyco.bind(create_record) - psyco.bind(record_get_field_instances) - psyco.bind(record_get_field_value) - psyco.bind(record_get_field_values) diff --git a/modules/bibindex/lib/bibindex_engine.py b/modules/bibindex/lib/bibindex_engine.py index 26941b81e..45975f736 100644 --- a/modules/bibindex/lib/bibindex_engine.py +++ b/modules/bibindex/lib/bibindex_engine.py @@ -1,1732 +1,1723 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex indexing engine implementation. See bibindex executable for entry point. 
""" __revision__ = "$Id$" import os import re import sys import time from invenio.config import \ CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS, \ CFG_BIBINDEX_CHARS_PUNCTUATION, \ CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY, \ CFG_BIBINDEX_MIN_WORD_LENGTH, \ CFG_BIBINDEX_REMOVE_HTML_MARKUP, \ CFG_BIBINDEX_REMOVE_LATEX_MARKUP, \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES, \ CFG_CERN_SITE, CFG_INSPIRE_SITE, \ CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES, \ CFG_BIBINDEX_SPLASH_PAGES from invenio.websubmit_config import CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM from invenio.bibindex_engine_config import CFG_MAX_MYSQL_THREADS, \ CFG_MYSQL_THREAD_TIMEOUT, \ CFG_CHECK_MYSQL_THREADS from invenio.bibindex_engine_tokenizer import BibIndexFuzzyNameTokenizer, \ BibIndexExactNameTokenizer from invenio.bibdocfile import bibdocfile_url_p, \ bibdocfile_url_to_bibdoc, normalize_format, \ download_url, guess_format_from_url, BibRecDocs from invenio.websubmit_file_converter import convert_file from invenio.search_engine import perform_request_search, strip_accents, \ wash_index_term, lower_index_term, get_index_stemming_language from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, \ deserialize_via_marshal from invenio.bibindex_engine_stopwords import is_stopword from invenio.bibindex_engine_stemmer import stem from invenio.bibtask import task_init, write_message, get_datetime, \ task_set_option, task_get_option, task_get_task_param, task_update_status, \ task_update_progress, task_sleep_now_if_required from invenio.intbitset import intbitset from invenio.errorlib import register_exception from invenio.htmlutils import remove_html_markup from invenio.textutils import wash_for_utf8 if sys.hexversion < 0x2040000: # pylint: disable=W0622 from sets import Set as set # pylint: enable=W0622 # FIXME: journal tag and journal pubinfo standard format are defined here: if CFG_CERN_SITE: CFG_JOURNAL_TAG = '773__%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "773__p 773__v (773__y) 773__c" elif CFG_INSPIRE_SITE: CFG_JOURNAL_TAG = '773__%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "773__p,773__v,773__c" else: CFG_JOURNAL_TAG = '909C4%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "909C4p 909C4v (909C4y) 909C4c" ## precompile some often-used regexp for speed reasons: re_subfields = re.compile('\$\$\w') re_block_punctuation_begin = re.compile(r"^"+CFG_BIBINDEX_CHARS_PUNCTUATION+"+") re_block_punctuation_end = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION+"+$") re_punctuation = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION) re_separators = re.compile(CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS) re_datetime_shift = re.compile("([-\+]{0,1})([\d]+)([dhms])") re_arxiv = re.compile(r'^arxiv:\d\d\d\d\.\d\d\d\d') nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by base_process_size = 4500 # process base size _last_word_table = None def list_union(list1, list2): "Returns union of the two lists." union_dict = {} for e in list1: union_dict[e] = 1 for e in list2: union_dict[e] = 1 return union_dict.keys() ## safety function for killing slow DB threads: def kill_sleepy_mysql_threads(max_threads=CFG_MAX_MYSQL_THREADS, thread_timeout=CFG_MYSQL_THREAD_TIMEOUT): """Check the number of DB threads and if there are more than MAX_THREADS of them, lill all threads that are in a sleeping state for more than THREAD_TIMEOUT seconds. (This is useful for working around the the max_connection problem that appears during indexation in some not-yet-understood cases.) 
If some threads are to be killed, write info into the log file. """ res = run_sql("SHOW FULL PROCESSLIST") if len(res) > max_threads: for row in res: r_id, dummy, dummy, dummy, r_command, r_time, dummy, dummy = row if r_command == "Sleep" and int(r_time) > thread_timeout: run_sql("KILL %s", (r_id,)) write_message("WARNING: too many DB threads, killing thread %s" % r_id, verbose=1) return ## MARC-21 tag/field access functions def get_fieldvalues(recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = "SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s" \ % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag)) return [row[0] for row in res] def get_associated_subfield_value(recID, tag, value, associated_subfield_code): """Return list of ASSOCIATED_SUBFIELD_CODE, if exists, for record RECID and TAG of value VALUE. Used by fulltext indexer only. Note: TAG must be 6 characters long (tag+ind1+ind2+sfcode), otherwise en empty string is returned. FIXME: what if many tag values have the same value but different associated_subfield_code? Better use bibrecord library for this. """ out = "" if len(tag) != 6: return out bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.field_number, b.tag, b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s%%""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag[:-1])) field_number = -1 for row in res: if row[1] == tag and row[2] == value: field_number = row[0] if field_number > 0: for row in res: if row[0] == field_number and row[1] == tag[:-1] + associated_subfield_code: out = row[2] break return out def get_field_tags(field): """Returns a list of MARC tags for the field code 'field'. Returns empty list in case of error. Example: field='author', output=['100__%','700__%'].""" out = [] query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag ORDER BY ft.score DESC""" res = run_sql(query, (field, )) return [row[0] for row in res] ## Fulltext word extraction functions def get_fulltext_urls_from_html_page(htmlpagebody): """Parses htmlpagebody data (the splash page content) looking for url_directs referring to probable fulltexts. Returns an array of (ext,url_direct) to fulltexts. Note: it looks for file format extensions as defined by global 'CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM' structure, minus the HTML ones, because we don't want to index HTML pages that the splash page might point to. """ out = [] for ext in CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM: expr = re.compile( r"\"(http://[\w]+\.+[\w]+[^\"'><]*\." + \ ext + r")\"") match = expr.search(htmlpagebody) if match and ext not in ['htm', 'html']: out.append([ext, match.group(1)]) #else: # FIXME: workaround for getfile, should use bibdoc tables #expr_getfile = re.compile(r"\"(http://.*getfile\.py\?.*format=" + ext + "&version=.*)\"") #match = expr_getfile.search(htmlpagebody) #if match and ext not in ['htm', 'html']: #out.append([ext, match.group(1)]) return out def get_words_from_journal_tag(recID, tag): """ Special procedure to extract words from journal tags. Joins title/volume/year/page into a standard form that is also used for citations. 
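
    Illustrative example (assumed data, using the default non-CERN/INSPIRE
    standard form "909C4p 909C4v (909C4y) 909C4c"): subfields
    p='Phys. Lett. B', v='405', y='1997', c='123-130' are indexed
    individually and additionally yield the combined term
    'Phys. Lett. B 405 (1997) 123' (the page end is stripped from the 'c'
    value below).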
""" # get all journal tags/subfields: bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.field_number,b.tag,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag)) # construct journal pubinfo: dpubinfos = {} for row in res: nb_instance, subfield, value = row if subfield.endswith("c"): # delete pageend if value is pagestart-pageend # FIXME: pages may not be in 'c' subfield value = value.split('-', 1)[0] if dpubinfos.has_key(nb_instance): dpubinfos[nb_instance][subfield] = value else: dpubinfos[nb_instance] = {subfield: value} # construct standard format: lwords = [] for dpubinfo in dpubinfos.values(): # index all journal subfields separately for tag,val in dpubinfo.items(): lwords.append(val) # index journal standard format: pubinfo = CFG_JOURNAL_PUBINFO_STANDARD_FORM for tag,val in dpubinfo.items(): pubinfo = pubinfo.replace(tag,val) if CFG_JOURNAL_TAG[:-1] in pubinfo: # some subfield was missing, do nothing pass else: lwords.append(pubinfo) # return list of words and pubinfos: return lwords def get_words_from_date_tag(datestring, stemming_language=None): """ Special procedure to index words from tags storing date-like information in format YYYY or YYYY-MM or YYYY-MM-DD. Namely, we are indexing word-terms YYYY, YYYY-MM, YYYY-MM-DD, but never standalone MM or DD. """ out = [] for dateword in datestring.split(): # maybe there are whitespaces, so break these too out.append(dateword) parts = dateword.split('-') for nb in range(1,len(parts)): out.append("-".join(parts[:nb])) return out def get_words_from_fulltext(url_direct_or_indirect, stemming_language=None): """Returns all the words contained in the document specified by URL_DIRECT_OR_INDIRECT with the words being split by various SRE_SEPARATORS regexp set earlier. If FORCE_FILE_EXTENSION is set (e.g. to "pdf", then treat URL_DIRECT_OR_INDIRECT as a PDF file. (This is interesting to index Indico for example.) Note also that URL_DIRECT_OR_INDIRECT may be either a direct URL to the fulltext file or an URL to a setlink-like page body that presents the links to be indexed. In the latter case the URL_DIRECT_OR_INDIRECT is parsed to extract actual direct URLs to fulltext documents, for all knows file extensions as specified by global CONV_PROGRAMS config variable. """ re_perform_ocr = re.compile(CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES) write_message("... reading fulltext files from %s started" % url_direct_or_indirect, verbose=2) try: if bibdocfile_url_p(url_direct_or_indirect): write_message("... %s is an internal document" % url_direct_or_indirect, verbose=2) bibdoc = bibdocfile_url_to_bibdoc(url_direct_or_indirect) perform_ocr = bool(re_perform_ocr.match(bibdoc.get_docname())) write_message("... will extract words from %s (docid: %s) %s" % (bibdoc.get_docname(), bibdoc.get_id(), perform_ocr and 'with OCR' or ''), verbose=2) if not bibdoc.has_text(require_up_to_date=True): bibdoc.extract_text(perform_ocr=perform_ocr) return get_words_from_phrase(bibdoc.get_text(), stemming_language) else: if CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY: write_message("... %s is external URL but indexing only local files" % url_direct_or_indirect, verbose=2) return [] write_message("... 
%s is an external URL" % url_direct_or_indirect, verbose=2)
            best_formats = [normalize_format(format) for format in
                CFG_WEBSUBMIT_BEST_FORMATS_TO_EXTRACT_TEXT_FROM]
            format = guess_format_from_url(url_direct_or_indirect)
            if re.match(CFG_BIBINDEX_SPLASH_PAGES, url_direct_or_indirect):
                urls = get_fulltext_urls_from_html_page(url_direct_or_indirect)
            else:
                urls = [url_direct_or_indirect]
            write_message("... will extract words from %s" % ', '.join(urls),
                verbose=2)
            words = {}
            for url in urls:
                format = guess_format_from_url(url)
                tmpdoc = download_url(url, format)
                tmptext = convert_file(tmpdoc, output_format='.txt')
                os.remove(tmpdoc)
                text = open(tmptext).read()
                os.remove(tmptext)
                tmpwords = get_words_from_phrase(text, stemming_language)
                words.update(dict(map(lambda x: (x, 1), tmpwords)))
            return words.keys()
    except Exception, e:
        register_exception(prefix='ERROR: it\'s impossible to correctly '
            'extract words from %s' % url_direct_or_indirect,
            alert_admin=True)
        write_message("ERROR: %s" % e, stream=sys.stderr)
        return []

latex_markup_re = re.compile(r"\\begin(\[.+?\])?\{.+?\}|\\end\{.+?}|\\\w+(\[.+?\])?\{(?P<inside1>.*?)\}|\{\\\w+ (?P<inside2>.*?)\}")

def remove_latex_markup(phrase):
    ret_phrase = ''
    index = 0
    for match in latex_markup_re.finditer(phrase):
        ret_phrase += phrase[index:match.start()]
        ret_phrase += match.group('inside1') or match.group('inside2') or ''
        index = match.end()
    ret_phrase += phrase[index:]
    return ret_phrase

def get_nothing_from_phrase(phrase, stemming_language=None):
    """A dummy implementation of get_words_from_phrase to be used when a
    tag should not be indexed (such as when trying to extract phrases from
    8564_u)."""
    return []

def swap_temporary_reindex_tables(index_id, reindex_prefix="tmp_"):
    """Atomically swap the reindexed temporary table with the original one.
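    For index id 1 and the default 'tmp_' prefix this amounts to a single
    statement of the form (illustrative sketch):
        RENAME TABLE idxWORD01R TO old_idxWORD01R,
                     tmp_idxWORD01R TO idxWORD01R, ...;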
Delete the now-old one.""" write_message("Putting new tmp index tables for id %s into production" % index_id) run_sql( "RENAME TABLE " + "idxWORD%02dR TO old_idxWORD%02dR," % (index_id, index_id) + "%sidxWORD%02dR TO idxWORD%02dR," % (reindex_prefix, index_id, index_id) + "idxWORD%02dF TO old_idxWORD%02dF," % (index_id, index_id) + "%sidxWORD%02dF TO idxWORD%02dF," % (reindex_prefix, index_id, index_id) + "idxPAIR%02dR TO old_idxPAIR%02dR," % (index_id, index_id) + "%sidxPAIR%02dR TO idxPAIR%02dR," % (reindex_prefix, index_id, index_id) + "idxPAIR%02dF TO old_idxPAIR%02dF," % (index_id, index_id) + "%sidxPAIR%02dF TO idxPAIR%02dF," % (reindex_prefix, index_id, index_id) + "idxPHRASE%02dR TO old_idxPHRASE%02dR," % (index_id, index_id) + "%sidxPHRASE%02dR TO idxPHRASE%02dR," % (reindex_prefix, index_id, index_id) + "idxPHRASE%02dF TO old_idxPHRASE%02dF," % (index_id, index_id) + "%sidxPHRASE%02dF TO idxPHRASE%02dF;" % (reindex_prefix, index_id, index_id) ) write_message("Dropping old index tables for id %s" % index_id) run_sql("DROP TABLE old_idxWORD%02dR, old_idxWORD%02dF, old_idxPAIR%02dR, old_idxPAIR%02dF, old_idxPHRASE%02dR, old_idxPHRASE%02dF" % (index_id, index_id, index_id, index_id, index_id, index_id) ) def init_temporary_reindex_tables(index_id, reindex_prefix="tmp_"): """Create reindexing temporary tables.""" write_message("Creating new tmp index tables for id %s" % index_id) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxWORD%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxWORD%02dR ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPAIR%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPAIR%02dR ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPHRASE%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) res = run_sql("""CREATE TABLE IF NOT EXISTS %sidxPHRASE%02dR ( id_bibrec mediumint(9) unsigned NOT NULL default '0', termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,)) latex_formula_re = re.compile(r'\$.*?\$|\\\[.*?\\\]') def get_words_from_phrase(phrase, stemming_language=None): """Return list of words found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file. 
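
    For instance (illustrative only; the exact output depends on the
    configured punctuation/separator classes, stemming and stopword
    settings), a phrase like 'High-energy physics' would yield word-terms
    such as 'high-energy', 'high', 'energy' and 'physics'.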
""" words = {} formulas = [] if CFG_BIBINDEX_REMOVE_HTML_MARKUP and phrase.find(" -1: phrase = remove_html_markup(phrase) if CFG_BIBINDEX_REMOVE_LATEX_MARKUP: formulas = latex_formula_re.findall(phrase) phrase = remove_latex_markup(phrase) phrase = latex_formula_re.sub(' ', phrase) phrase = wash_for_utf8(phrase) phrase = lower_index_term(phrase) # 1st split phrase into blocks according to whitespace for block in strip_accents(phrase).split(): # 2nd remove leading/trailing punctuation and add block: block = re_block_punctuation_begin.sub("", block) block = re_block_punctuation_end.sub("", block) if block: stemmed_block = apply_stemming_and_stopwords_and_length_check(block, stemming_language) if stemmed_block: words[stemmed_block] = 1 if re_arxiv.match(block): # special case for blocks like `arXiv:1007.5048' where # we would like to index the part after the colon # regardless of dot or other punctuation characters: words[block.split(':', 1)[1]] = 1 # 3rd break each block into subblocks according to punctuation and add subblocks: for subblock in re_punctuation.split(block): stemmed_subblock = apply_stemming_and_stopwords_and_length_check(subblock, stemming_language) if stemmed_subblock: words[stemmed_subblock] = 1 # 4th break each subblock into alphanumeric groups and add groups: for alphanumeric_group in re_separators.split(subblock): stemmed_alphanumeric_group = apply_stemming_and_stopwords_and_length_check(alphanumeric_group, stemming_language) if stemmed_alphanumeric_group: words[stemmed_alphanumeric_group] = 1 for block in formulas: words[block] = 1 return words.keys() def get_pairs_from_phrase(phrase, stemming_language=None): """Return list of words found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file. """ words = {} if CFG_BIBINDEX_REMOVE_HTML_MARKUP and phrase.find(" -1: phrase = remove_html_markup(phrase) if CFG_BIBINDEX_REMOVE_LATEX_MARKUP: phrase = remove_latex_markup(phrase) phrase = latex_formula_re.sub(' ', phrase) phrase = wash_for_utf8(phrase) phrase = lower_index_term(phrase) # 1st split phrase into blocks according to whitespace last_word = '' for block in strip_accents(phrase).split(): # 2nd remove leading/trailing punctuation and add block: block = re_block_punctuation_begin.sub("", block) block = re_block_punctuation_end.sub("", block) if block: if stemming_language: block = apply_stemming_and_stopwords_and_length_check(block, stemming_language) # 3rd break each block into subblocks according to punctuation and add subblocks: for subblock in re_punctuation.split(block): if stemming_language: subblock = apply_stemming_and_stopwords_and_length_check(subblock, stemming_language) if subblock: # 4th break each subblock into alphanumeric groups and add groups: for alphanumeric_group in re_separators.split(subblock): if stemming_language: alphanumeric_group = apply_stemming_and_stopwords_and_length_check(alphanumeric_group, stemming_language) if alphanumeric_group: if last_word: words['%s %s' % (last_word, alphanumeric_group)] = 1 last_word = alphanumeric_group return words.keys() phrase_delimiter_re = re.compile(r'[\.:;\?\!]') space_cleaner_re = re.compile(r'\s+') def get_phrases_from_phrase(phrase, stemming_language=None): """Return list of phrases found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file. 
""" phrase = wash_for_utf8(phrase) return [phrase] ## Note that we don't break phrases, they are used for exact style ## of searching. words = {} phrase = strip_accents(phrase) # 1st split phrase into blocks according to whitespace for block1 in phrase_delimiter_re.split(strip_accents(phrase)): block1 = block1.strip() if block1 and stemming_language: new_words = [] for block2 in re_punctuation.split(block1): block2 = block2.strip() if block2: for block3 in block2.split(): block3 = block3.strip() if block3: # Note that we don't stem phrases, they # are used for exact style of searching. new_words.append(block3) block1 = ' '.join(new_words) if block1: words[block1] = 1 return words.keys() def get_fuzzy_authors_from_phrase(phrase, stemming_language=None): """ Return list of fuzzy phrase-tokens suitable for storing into author phrase index. """ author_tokenizer = BibIndexFuzzyNameTokenizer() return author_tokenizer.tokenize(phrase) def get_exact_authors_from_phrase(phrase, stemming_language=None): """ Return list of exact phrase-tokens suitable for storing into exact author phrase index. """ author_tokenizer = BibIndexExactNameTokenizer() return author_tokenizer.tokenize(phrase) def get_author_family_name_words_from_phrase(phrase, stemming_language=None): """ Return list of words from author family names, not his/her first names. The phrase is assumed to be the full author name. This is useful for CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES. """ d_family_names = {} # first, treat everything before first comma as surname: if ',' in phrase: d_family_names[phrase.split(',', 1)[0]] = 1 # second, try fuzzy author tokenizer to find surname variants: for name in get_fuzzy_authors_from_phrase(phrase, stemming_language): if ',' in name: d_family_names[name.split(',', 1)[0]] = 1 # now extract words from these surnames: d_family_names_words = {} for family_name in d_family_names.keys(): for word in get_words_from_phrase(family_name, stemming_language): d_family_names_words[word] = 1 return d_family_names_words.keys() def apply_stemming_and_stopwords_and_length_check(word, stemming_language): """Return WORD after applying stemming and stopword and length checks. See the config file in order to influence these. """ # now check against stopwords: if is_stopword(word): return "" # finally check the word length: if len(word) < CFG_BIBINDEX_MIN_WORD_LENGTH: return "" # stem word, when configured so: if stemming_language: word = stem(word, stemming_language) return word def remove_subfields(s): "Removes subfields from string, e.g. 'foo $$c bar' becomes 'foo bar'." return re_subfields.sub(' ', s) def get_index_id_from_index_name(index_name): """Returns the words/phrase index id for INDEXNAME. Returns empty string in case there is no words table for this index. Example: field='author', output=4.""" out = 0 query = """SELECT w.id FROM idxINDEX AS w WHERE w.name=%s LIMIT 1""" res = run_sql(query, (index_name, ), 1) if res: out = res[0][0] return out def get_index_name_from_index_id(index_id): """Returns the words/phrase index name for INDEXID. Returns '' in case there is no words table for this indexid. Example: field=9, output='fulltext'.""" res = run_sql("SELECT name FROM idxINDEX WHERE id=%s", (index_id, )) if res: return res[0][0] return '' def get_index_tags(indexname): """Returns the list of tags that are indexed inside INDEXNAME. Returns empty list in case there are no tags indexed in this index. Note: uses get_field_tags() defined before. 
Example: field='author', output=['100__%', '700__%'].""" out = [] query = """SELECT f.code FROM idxINDEX AS w, idxINDEX_field AS wf, field AS f WHERE w.name=%s AND w.id=wf.id_idxINDEX AND f.id=wf.id_field""" res = run_sql(query, (indexname, )) for row in res: out.extend(get_field_tags(row[0])) return out def get_all_indexes(): """Returns the list of the names of all defined words indexes. Returns empty list in case there are no tags indexed in this index. Example: output=['global', 'author'].""" out = [] query = """SELECT name FROM idxINDEX""" res = run_sql(query) for row in res: out.append(row[0]) return out def split_ranges(parse_string): """Parse a string a return the list or ranges.""" recIDs = [] ranges = parse_string.split(",") for arange in ranges: tmp_recIDs = arange.split("-") if len(tmp_recIDs)==1: recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])]) else: if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check tmp = tmp_recIDs[0] tmp_recIDs[0] = tmp_recIDs[1] tmp_recIDs[1] = tmp recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])]) return recIDs def get_word_tables(tables): """ Given a list of table names it return a list of tuples (index_id, index_name, index_tags). If tables is empty it returns the whole list.""" wordTables = [] if tables: indexes = tables.split(",") for index in indexes: index_id = get_index_id_from_index_name(index) if index_id: wordTables.append((index_id, index, get_index_tags(index))) else: write_message("Error: There is no %s words table." % index, sys.stderr) else: for index in get_all_indexes(): index_id = get_index_id_from_index_name(index) wordTables.append((index_id, index, get_index_tags(index))) return wordTables def get_date_range(var): "Returns the two dates contained as a low,high tuple" limits = var.split(",") if len(limits)==1: low = get_datetime(limits[0]) return low, None if len(limits)==2: low = get_datetime(limits[0]) high = get_datetime(limits[1]) return low, high return None, None def create_range_list(res): """Creates a range list from a recID select query result contained in res. The result is expected to have ascending numerical order.""" if not res: return [] row = res[0] if not row: return [] else: range_list = [[row, row]] for row in res[1:]: row_id = row if row_id == range_list[-1][1] + 1: range_list[-1][1] = row_id else: range_list.append([row_id, row_id]) return range_list def beautify_range_list(range_list): """Returns a non overlapping, maximal range list""" ret_list = [] for new in range_list: found = 0 for old in ret_list: if new[0] <= old[0] <= new[1] + 1 or new[0] - 1 <= old[1] <= new[1]: old[0] = min(old[0], new[0]) old[1] = max(old[1], new[1]) found = 1 break if not found: ret_list.append(new) return ret_list def truncate_index_table(index_name): """Properly truncate the given index.""" index_id = get_index_id_from_index_name(index_name) if index_id: write_message('Truncating %s index table in order to reindex.' % index_name, verbose=2) run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,)) run_sql("TRUNCATE idxWORD%02dF" % index_id) run_sql("TRUNCATE idxWORD%02dR" % index_id) run_sql("TRUNCATE idxPHRASE%02dF" % index_id) run_sql("TRUNCATE idxPHRASE%02dR" % index_id) def update_index_last_updated(index_id, starting_time=None): """Update last_updated column of the index table in the database. 
Puts starting time there so that if the task was interrupted for record download, the records will be reindexed next time.""" if starting_time is None: return None write_message("updating last_updated to %s..." % starting_time, verbose=9) return run_sql("UPDATE idxINDEX SET last_updated=%s WHERE id=%s", (starting_time, index_id,)) #def update_text_extraction_date(first_recid, last_recid): #"""for all the bibdoc connected to the specified recid, set #the text_extraction_date to the task_starting_time.""" #run_sql("UPDATE bibdoc JOIN bibrec_bibdoc ON id=id_bibdoc SET text_extraction_date=%s WHERE id_bibrec BETWEEN %s AND %s", (task_get_task_param('task_starting_time'), first_recid, last_recid)) class WordTable: "A class to hold the words table." def __init__(self, index_id, fields_to_index, table_name_pattern, default_get_words_fnc, tag_to_words_fnc_map, wash_index_terms=50, is_fulltext_index=False): """Creates words table instance. @param index_id: the index integer identificator @param fields_to_index: a list of fields to index @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF @parm default_get_words_fnc: the default function called to extract words from a metadata @param tag_to_words_fnc_map: a mapping to specify particular function to extract words from particular metdata (such as 8564_u) @param wash_index_terms: do we wash index terms, and if yes (when >0), how many characters do we keep in the index terms; see max_char_length parameter of wash_index_term() """ self.index_id = index_id self.tablename = table_name_pattern % index_id self.recIDs_in_mem = [] self.fields_to_index = fields_to_index self.value = {} self.stemming_language = get_index_stemming_language(index_id) self.is_fulltext_index = is_fulltext_index self.wash_index_terms = wash_index_terms # tagToFunctions mapping. It offers an indirection level necessary for # indexing fulltext. The default is get_words_from_phrase self.tag_to_words_fnc_map = tag_to_words_fnc_map self.default_get_words_fnc = default_get_words_fnc if self.stemming_language and self.tablename.startswith('idxWORD'): write_message('%s has stemming enabled, language %s' % (self.tablename, self.stemming_language)) def get_field(self, recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" out = [] bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag)) for row in res: out.append(row[0]) return out def clean(self): "Cleans the words table." self.value = {} def put_into_db(self, mode="normal"): """Updates the current words table in the corresponding DB idxFOO table. Mode 'normal' means normal execution, mode 'emergency' means words index reverting to old state. 
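
        Illustrative state flow of the reverse table rows (a sketch of the
        queries below), in 'normal' mode:

            CURRENT   -> TEMPORARY   (rows about to be replaced)
            FUTURE    -> CURRENT     (freshly flushed rows)
            TEMPORARY rows deleted

        In 'emergency' mode, TEMPORARY rows are restored to CURRENT and
        FUTURE rows are deleted instead.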
""" write_message("%s %s wordtable flush started" % (self.tablename, mode)) write_message('...updating %d words into %s started' % \ (len(self.value), self.tablename)) task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value))) self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec BETWEEN %%s AND %%s AND type='CURRENT'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) nb_words_total = len(self.value) nb_words_report = int(nb_words_total/10.0) nb_words_done = 0 for word in self.value.keys(): self.put_word_into_db(word) nb_words_done += 1 if nb_words_report != 0 and ((nb_words_done % nb_words_report) == 0): write_message('......processed %d/%d words' % (nb_words_done, nb_words_total)) task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total)) write_message('...updating %d words into %s ended' % \ (nb_words_total, self.tablename)) write_message('...updating reverse table %sR started' % self.tablename[:-1]) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN %%s AND %%s AND type='FUTURE'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s AND type='TEMPORARY'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) #if self.is_fulltext_index: #update_text_extraction_date(group[0], group[1]) write_message('End of updating wordTable into %s' % self.tablename, verbose=9) elif mode == "emergency": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN %%s AND %%s AND type='TEMPORARY'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s AND type='FUTURE'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9) write_message('...updating reverse table %sR ended' % self.tablename[:-1]) self.clean() self.recIDs_in_mem = [] write_message("%s %s wordtable flush ended" % (self.tablename, mode)) task_update_progress("%s flush ended" % (self.tablename)) def load_old_recIDs(self, word): """Load existing hitlist for the word from the database index files.""" query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename res = run_sql(query, (word,)) if res: return intbitset(res[0][0]) else: return None def merge_with_old_recIDs(self, word, set): """Merge the system numbers stored in memory (hash of recIDs with value +1 or -1 according to whether to add/delete them) with those stored in the database index and received in set universe of recIDs for the given word. Return False in case no change was done to SET, return True in case SET was changed. 
""" oldset = intbitset(set) set.update_with_signs(self.value[word]) return set != oldset def put_word_into_db(self, word): """Flush a single word to the database and delete it from memory""" set = self.load_old_recIDs(word) if set is not None: # merge the word recIDs found in memory: if not self.merge_with_old_recIDs(word,set): # nothing to update: write_message("......... unchanged hitlist for ``%s''" % word, verbose=9) pass else: # yes there were some new words: write_message("......... updating hitlist for ``%s''" % word, verbose=9) run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename, (set.fastdump(), word)) else: # the word is new, will create new set: write_message("......... inserting hitlist for ``%s''" % word, verbose=9) set = intbitset(self.value[word].keys()) try: run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename, (word, set.fastdump())) except Exception, e: ## We send this exception to the admin only when is not ## already reparing the problem. register_exception(prefix="Error when putting the term '%s' into db (hitlist=%s): %s\n" % (repr(word), set, e), alert_admin=(task_get_option('cmd') != 'repair')) if not set: # never store empty words run_sql("DELETE from %s WHERE term=%%s" % self.tablename, (word,)) del self.value[word] def display(self): "Displays the word table." keys = self.value.keys() keys.sort() for k in keys: write_message("%s: %s" % (k, self.value[k])) def count(self): "Returns the number of words in the table." return len(self.value) def info(self): "Prints some information on the words table." write_message("The words table contains %d words." % self.count()) def lookup_words(self, word=""): "Lookup word from the words table." if not word: done = 0 while not done: try: word = raw_input("Enter word: ") done = 1 except (EOFError, KeyboardInterrupt): return if self.value.has_key(word): write_message("The word '%s' is found %d times." \ % (word, len(self.value[word]))) else: write_message("The word '%s' does not exist in the word file."\ % word) def add_recIDs(self, recIDs, opt_flush): """Fetches records which id in the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. 
""" global chunksize, _last_word_table flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+opt_flush-flush_count-1,arange[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.chk_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception(alert_admin=True) task_update_status("ERROR") self.put_into_db() sys.exit(1) write_message("%s adding records #%d-#%d started" % \ (self.tablename, i_low, i_high)) if CFG_CHECK_MYSQL_THREADS: kill_sleepy_mysql_threads() task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high)) self.del_recID_range(i_low, i_high) just_processed = self.add_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + just_processed write_message("%s adding records #%d-#%d ended " % \ (self.tablename, i_low, i_high)) if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= opt_flush: self.put_into_db() self.clean() write_message("%s backing up" % (self.tablename)) flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db() self.log_progress(time_started,records_done,records_to_go) def add_recIDs_by_date(self, dates, opt_flush): """Add records that were modified between DATES[0] and DATES[1]. If DATES is not set, then add records that were modified since the last update of the index. """ if not dates: table_id = self.tablename[-3:-1] query = """SELECT last_updated FROM idxINDEX WHERE id=%s""" res = run_sql(query, (table_id, )) if not res: return if not res[0][0]: dates = ("0000-00-00", None) else: dates = (res[0][0], None) if dates[1] is None: res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s""", (dates[0],))) if self.is_fulltext_index: res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date >= %s AND status<>'DELETED'""", (dates[0], ))) elif dates[0] is None: res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date <= %s""", (dates[1],))) if self.is_fulltext_index: res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date <= %s AND status<>'DELETED'""", (dates[1], ))) else: res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s AND b.modification_date <= %s""", (dates[0], dates[1]))) if self.is_fulltext_index: res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date >= %s AND modification_date <= %s AND status<>'DELETED'""", (dates[0], dates[1], ))) alist = create_range_list(list(res)) if not alist: write_message( "No new records added. 
%s is up to date" % self.tablename) else: self.add_recIDs(alist, opt_flush) def add_recID_range(self, recID1, recID2): """Add records from RECID1 to RECID2.""" wlist = {} self.recIDs_in_mem.append([recID1,recID2]) # secondly fetch all needed tags: if self.fields_to_index == [CFG_JOURNAL_TAG]: # FIXME: quick hack for the journal index; a special # treatment where we need to associate more than one # subfield into indexed term for recID in range(recID1, recID2 + 1): new_words = get_words_from_journal_tag(recID, self.fields_to_index[0]) if not wlist.has_key(recID): wlist[recID] = [] wlist[recID] = list_union(new_words, wlist[recID]) else: # usual tag-by-tag indexing: for tag in self.fields_to_index: get_words_function = self.tag_to_words_fnc_map.get(tag, self.default_get_words_fnc) bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec BETWEEN %%s AND %%s AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID1, recID2, tag)) if tag == '8564_u': ## FIXME: Quick hack to be sure that hidden files are ## actually indexed. res = set(res) for recid in xrange(int(recID1), int(recID2) + 1): for bibdocfile in BibRecDocs(recid).list_latest_files(): res.add((recid, bibdocfile.get_url())) for row in res: recID,phrase = row if not wlist.has_key(recID): wlist[recID] = [] new_words = get_words_function(phrase, stemming_language=self.stemming_language) # ,self.separators wlist[recID] = list_union(new_words, wlist[recID]) # were there some words for these recIDs found? if len(wlist) == 0: return 0 recIDs = wlist.keys() for recID in recIDs: # was this record marked as deleted? if "DELETED" in self.get_field(recID, "980__c"): wlist[recID] = [] write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9) write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9) # put words into reverse index table with FUTURE status: for recID in recIDs: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % self.tablename[:-1], (recID, serialize_via_marshal(wlist[recID]))) # ... and, for new records, enter the CURRENT status as empty: try: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % self.tablename[:-1], (recID, serialize_via_marshal([]))) except DatabaseError: # okay, it's an already existing record, no problem pass # put words into memory word list: put = self.put for recID in recIDs: for w in wlist[recID]: put(recID, w, 1) return len(recIDs) def log_progress(self, start, done, todo): """Calculate progress and store it. start: start time, done: records processed, todo: total number of records""" time_elapsed = time.time() - start # consistency check if time_elapsed == 0 or done > todo: return time_recs_per_min = done/(time_elapsed/60.0) write_message("%d records took %.1f seconds to complete.(%1.f recs/min)"\ % (done, time_elapsed, time_recs_per_min)) if time_recs_per_min: write_message("Estimated runtime: %.1f minutes" % \ ((todo-done)/time_recs_per_min)) def put(self, recID, word, sign): """Adds/deletes a word to the word list.""" try: if self.wash_index_terms: word = wash_index_term(word, self.wash_index_terms) if self.value.has_key(word): # the word 'word' exist already: update sign self.value[word][recID] = sign else: self.value[word] = {recID: sign} except: write_message("Error: Cannot put word %s with sign %d for recID %s." 
% (word, sign, recID)) def del_recIDs(self, recIDs): """Fetches records which id in the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ count = 0 for arange in recIDs: self.del_recID_range(arange[0],arange[1]) count = count + arange[1] - arange[0] self.put_into_db() def del_recID_range(self, low, high): """Deletes records with 'recID' system number between low and high from memory words index table.""" write_message("%s fetching existing words for records #%d-#%d started" % \ (self.tablename, low, high), verbose=3) self.recIDs_in_mem.append([low,high]) query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec BETWEEN %%s AND %%s""" % (self.tablename[:-1]) recID_rows = run_sql(query, (low, high)) for recID_row in recID_rows: recID = recID_row[0] wlist = deserialize_via_marshal(recID_row[1]) for word in wlist: self.put(recID, word, -1) write_message("%s fetching existing words for records #%d-#%d ended" % \ (self.tablename, low, high), verbose=3) def report_on_table_consistency(self): """Check reverse words index tables (e.g. idxWORD01R) for interesting states such as 'TEMPORARY' state. Prints small report (no of words, no of bad words). """ # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d index records" % \ (self.tablename, nb_bad_records, nb_records)) else: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self, opt_flush): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 if nb_bad_records == 0: return query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT'""" \ % (self.tablename[:-1]) res = intbitset(run_sql(query)) recIDs = create_range_list(list(res)) flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+opt_flush-flush_count-1,arange[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.fix_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception(alert_admin=True) task_update_status("ERROR") self.put_into_db() sys.exit(1) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if 
necessary: if flush_count >= opt_flush: self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started,records_done,records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' AND id_bibrec BETWEEN %%s AND %%s""" % self.tablename[:-1] res = run_sql(query, (low, high), 1) if res[0][0]==0: write_message("%s for %d-%d is in consistent state" % (self.tablename,low,high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) error_message = "Errors found. You should check consistency of the " \ "%s - %sR tables.\nRunning 'bibindex --repair' is " \ "recommended." % (self.tablename, self.tablename[:-1]) write_message("EMERGENCY: " + error_message, stream=sys.stderr) raise StandardError, error_message def fix_recID_range(self, low, high): """Try to fix reverse index database consistency (e.g. table idxWORD01R) in the low,high doc-id range. Possible states for a recID follow: CUR TMP FUT: very bad things have happened: warn! CUR TMP : very bad things have happened: warn! CUR FUT: delete FUT (crash before flushing) CUR : database is ok TMP FUT: add TMP to memory and del FUT from memory flush (revert to old state) TMP : very bad things have happened: warn! FUT: very bad things have happended: warn! """ state = {} query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s"\ % self.tablename[:-1] res = run_sql(query, (low, high)) for row in res: if not state.has_key(row[0]): state[row[0]]=[] state[row[0]].append(row[1]) ok = 1 # will hold info on whether we will be able to repair for recID in state.keys(): if not 'TEMPORARY' in state[recID]: if 'FUTURE' in state[recID]: if 'CURRENT' not in state[recID]: write_message("EMERGENCY: Index record %d is in inconsistent state. Can't repair it." % recID) ok = 0 else: write_message("EMERGENCY: Inconsistency in index record %d detected" % recID) query = """DELETE FROM %sR WHERE id_bibrec=%%s""" % self.tablename[:-1] run_sql(query, (recID, )) write_message("EMERGENCY: Inconsistency in record %d repaired." % recID) else: if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]: self.recIDs_in_mem.append([recID,recID]) # Get the words file query = """SELECT type,termlist FROM %sR WHERE id_bibrec=%%s""" % self.tablename[:-1] write_message(query, verbose=9) res = run_sql(query, (recID, )) for row in res: wlist = deserialize_via_marshal(row[1]) write_message("Words are %s " % wlist, verbose=9) if row[0] == 'TEMPORARY': sign = 1 else: sign = -1 for word in wlist: self.put(recID, word, sign) else: write_message("EMERGENCY: %s for %d is in inconsistent " "state. Couldn't repair it." % (self.tablename, recID), stream=sys.stderr) ok = 0 if not ok: error_message = "Unrepairable errors found. You should check " \ "consistency of the %s - %sR tables. Deleting affected " \ "TEMPORARY and FUTURE entries from these tables is " \ "recommended; see the BibIndex Admin Guide." 
% \ (self.tablename, self.tablename[:-1]) write_message("EMERGENCY: " + error_message, stream=sys.stderr) raise StandardError, error_message def main(): """Main that construct all the bibtask.""" task_init(authorization_action='runbibindex', authorization_msg="BibIndex Task Submission", description="""Examples: \t%s -a -i 234-250,293,300-500 -u admin@localhost \t%s -a -w author,fulltext -M 8192 -v3 \t%s -d -m +4d -A on --flush=10000\n""" % ((sys.argv[0],) * 3), help_specific_usage=""" Indexing options: -a, --add\t\tadd or update words for selected records -d, --del\t\tdelete words for selected records -i, --id=low[-high]\t\tselect according to doc recID -m, --modified=from[,to]\tselect according to modification date -c, --collection=c1[,c2]\tselect according to collection -R, --reindex\treindex the selected indexes from scratch Repairing options: -k, --check\t\tcheck consistency for all records in the table(s) -r, --repair\t\ttry to repair all records in the table(s) Specific options: -w, --windex=w1[,w2]\tword/phrase indexes to consider (all) -M, --maxmem=XXX\tmaximum memory usage in kB (no limit) -f, --flush=NNN\t\tfull consistent table flush after NNN records (10000) """, version=__revision__, specific_params=("adi:m:c:w:krRM:f:", [ "add", "del", "id=", "modified=", "collection=", "windex=", "check", "repair", "reindex", "maxmem=", "flush=", ]), task_stop_helper_fnc=task_stop_table_close_fnc, task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter, task_run_fnc=task_run_core, task_submit_check_options_fnc=task_submit_check_options) def task_submit_check_options(): """Check for options compatibility.""" if task_get_option("reindex"): if task_get_option("cmd") != "add" or task_get_option('id') or task_get_option('collection'): print >> sys.stderr, "ERROR: You can use --reindex only when adding modified record." return False return True def task_submit_elaborate_specific_parameter(key, value, opts, args): """ Given the string key it checks it's meaning, eventually using the value. Usually it fills some key in the options dict. It must return True if it has elaborated the key, False, if it doesn't know that key. eg: if key in ['-n', '--number']: self.options['number'] = value return True return False """ if key in ("-a", "--add"): task_set_option("cmd", "add") if ("-x","") in opts or ("--del","") in opts: raise StandardError, "Can not have --add and --del at the same time!" elif key in ("-k", "--check"): task_set_option("cmd", "check") elif key in ("-r", "--repair"): task_set_option("cmd", "repair") elif key in ("-d", "--del"): task_set_option("cmd", "del") elif key in ("-i", "--id"): task_set_option('id', task_get_option('id') + split_ranges(value)) elif key in ("-m", "--modified"): task_set_option("modified", get_date_range(value)) elif key in ("-c", "--collection"): task_set_option("collection", value) elif key in ("-R", "--reindex"): task_set_option("reindex", True) elif key in ("-w", "--windex"): task_set_option("windex", value) elif key in ("-M", "--maxmem"): task_set_option("maxmem", int(value)) if task_get_option("maxmem") < base_process_size + 1000: raise StandardError, "Memory usage should be higher than %d kB" % \ (base_process_size + 1000) elif key in ("-f", "--flush"): task_set_option("flush", int(value)) else: return False return True def task_stop_table_close_fnc(): """ Close tables to STOP. 
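    If a word table is still held in memory (_last_word_table), it is flushed
    to the database first, so that no partially indexed words are lost when
    the task stops.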
""" global _last_word_table if _last_word_table: _last_word_table.put_into_db() def task_run_core(): """Runs the task by fetching arguments from the BibSched task queue. This is what BibSched will be invoking via daemon call. The task prints Fibonacci numbers for up to NUM on the stdout, and some messages on stderr. Return 1 in case of success and 0 in case of failure.""" global _last_word_table if task_get_option("cmd") == "check": wordTables = get_word_tables(task_get_option("windex")) for index_id, index_name, index_tags in wordTables: if index_name == 'year' and CFG_INSPIRE_SITE: fnc_get_words_from_phrase = get_words_from_date_tag elif index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_words_from_phrase = get_author_family_name_words_from_phrase else: fnc_get_words_from_phrase = get_words_from_phrase wordTable = WordTable(index_id=index_id, fields_to_index=index_tags, table_name_pattern='idxWORD%02dF', default_get_words_fnc=fnc_get_words_from_phrase, tag_to_words_fnc_map={'8564_u': get_words_from_fulltext}, wash_index_terms=50) _last_word_table = wordTable wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_pairs_from_phrase = get_pairs_from_phrase # FIXME else: fnc_get_pairs_from_phrase = get_pairs_from_phrase wordTable = WordTable(index_id=index_id, fields_to_index=index_tags, table_name_pattern='idxPAIR%02dF', default_get_words_fnc=fnc_get_pairs_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=100) _last_word_table = wordTable wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if index_name in ('author', 'firstauthor'): fnc_get_phrases_from_phrase = get_fuzzy_authors_from_phrase elif index_name == 'exactauthor': fnc_get_phrases_from_phrase = get_exact_authors_from_phrase else: fnc_get_phrases_from_phrase = get_phrases_from_phrase wordTable = WordTable(index_id=index_id, fields_to_index=index_tags, table_name_pattern='idxPHRASE%02dF', default_get_words_fnc=fnc_get_phrases_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=0) _last_word_table = wordTable wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) _last_word_table = None return True # Let's work on single words! 
wordTables = get_word_tables(task_get_option("windex")) for index_id, index_name, index_tags in wordTables: is_fulltext_index = index_name == 'fulltext' reindex_prefix = "" if task_get_option("reindex"): reindex_prefix = "tmp_" init_temporary_reindex_tables(index_id, reindex_prefix) if index_name == 'year' and CFG_INSPIRE_SITE: fnc_get_words_from_phrase = get_words_from_date_tag elif index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_words_from_phrase = get_author_family_name_words_from_phrase else: fnc_get_words_from_phrase = get_words_from_phrase wordTable = WordTable(index_id=index_id, fields_to_index=index_tags, table_name_pattern=reindex_prefix + 'idxWORD%02dF', default_get_words_fnc=fnc_get_words_from_phrase, tag_to_words_fnc_map={'8564_u': get_words_from_fulltext}, is_fulltext_index=is_fulltext_index, wash_index_terms=50) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Missing IDs of records to delete from " \ "index %s." % wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError, error_message elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) ## here we used to update last_updated info, if run via automatic mode; ## but do not update here anymore, since idxPHRASE will be acted upon later task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Invalid command found processing %s" % \ wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError, error_message except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception(alert_admin=True) task_update_status("ERROR") if _last_word_table: _last_word_table.put_into_db() sys.exit(1) wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) # Let's work on pairs now if index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_pairs_from_phrase = get_pairs_from_phrase # FIXME else: fnc_get_pairs_from_phrase = get_pairs_from_phrase wordTable = WordTable(index_id=index_id, fields_to_index=index_tags, table_name_pattern=reindex_prefix + 'idxPAIR%02dF', default_get_words_fnc=fnc_get_pairs_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=100) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if 
task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Missing IDs of records to delete from " \ "index %s." % wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError, error_message elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) # let us update last_updated timestamp info, if run via automatic mode: task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Invalid command found processing %s" % \ wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError, error_message except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") if _last_word_table: _last_word_table.put_into_db() sys.exit(1) wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) # Let's work on phrases now if index_name in ('author', 'firstauthor'): fnc_get_phrases_from_phrase = get_fuzzy_authors_from_phrase elif index_name == 'exactauthor': fnc_get_phrases_from_phrase = get_exact_authors_from_phrase else: fnc_get_phrases_from_phrase = get_phrases_from_phrase wordTable = WordTable(index_id=index_id, fields_to_index=index_tags, table_name_pattern=reindex_prefix + 'idxPHRASE%02dF', default_get_words_fnc=fnc_get_phrases_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=0) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Missing IDs of records to delete from " \ "index %s." 
% wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError, error_message elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) # let us update last_updated timestamp info, if run via automatic mode: update_index_last_updated(index_id, task_get_task_param('task_starting_time')) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Invalid command found processing %s" % \ wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError, error_message except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") if _last_word_table: _last_word_table.put_into_db() sys.exit(1) wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if task_get_option("reindex"): swap_temporary_reindex_tables(index_id, reindex_prefix) update_index_last_updated(index_id, task_get_task_param('task_starting_time')) task_sleep_now_if_required(can_stop_too=True) _last_word_table = None return True -## import optional modules: -try: - import psyco - psyco.bind(get_words_from_phrase) - psyco.bind(WordTable.merge_with_old_recIDs) -except: - pass - - ### okay, here we go: if __name__ == '__main__': main() diff --git a/modules/bibrank/lib/bibrank_record_sorter.py b/modules/bibrank/lib/bibrank_record_sorter.py index a79f29c26..baa311112 100644 --- a/modules/bibrank/lib/bibrank_record_sorter.py +++ b/modules/bibrank/lib/bibrank_record_sorter.py @@ -1,688 +1,677 @@ # -*- coding: utf-8 -*- ## Ranking of records using different parameters and methods on the fly. ## ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
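## A minimal usage sketch (hypothetical values, for orientation only;
## "wrd" is assumed to be a configured word-similarity rank method):
##
##   from invenio.bibrank_record_sorter import rank_records
##   (recids, scores, prefix, postfix, vout) = rank_records(
##       "wrd", 0, hitset, ["ellis"], verbose=0)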
__revision__ = "$Id$" import string import time import math import re import ConfigParser import copy from invenio.config import \ CFG_SITE_LANG, \ CFG_ETCDIR from invenio.dbquery import run_sql, deserialize_via_marshal from invenio.errorlib import register_exception from invenio.webpage import adderrorbox from invenio.bibindex_engine_stemmer import stem from invenio.bibindex_engine_stopwords import is_stopword from invenio.bibrank_citation_searcher import get_cited_by, get_cited_by_weight from invenio.intbitset import intbitset def compare_on_val(first, second): return cmp(second[1], first[1]) def check_term(term, col_size, term_rec, max_occ, min_occ, termlength): """Check if the tem is valid for use term - the term to check col_size - the number of records in database term_rec - the number of records which contains this term max_occ - max frequency of the term allowed min_occ - min frequence of the term allowed termlength - the minimum length of the terms allowed""" try: if is_stopword(term, 1) or (len(term) <= termlength) or ((float(term_rec) / float(col_size)) >= max_occ) or ((float(term_rec) / float(col_size)) <= min_occ): return "" if int(term): return "" except StandardError, e: pass return "true" def create_rnkmethod_cache(): """Create cache with vital information for each rank method.""" global methods bibrank_meths = run_sql("SELECT name from rnkMETHOD") methods = {} global voutput voutput = "" for (rank_method_code,) in bibrank_meths: try: file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() config.readfp(open(file)) except StandardError, e: pass cfg_function = config.get("rank_method", "function") if config.has_section(cfg_function): methods[rank_method_code] = {} methods[rank_method_code]["function"] = cfg_function methods[rank_method_code]["prefix"] = config.get(cfg_function, "relevance_number_output_prologue") methods[rank_method_code]["postfix"] = config.get(cfg_function, "relevance_number_output_epilogue") methods[rank_method_code]["chars_alphanumericseparators"] = r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]" else: raise Exception("Error in configuration file: %s" % (CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg")) i8n_names = run_sql("""SELECT ln,value from rnkMETHODNAME,rnkMETHOD where id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name=%s""", (rank_method_code,)) for (ln, value) in i8n_names: methods[rank_method_code][ln] = value if config.has_option(cfg_function, "table"): methods[rank_method_code]["rnkWORD_table"] = config.get(cfg_function, "table") methods[rank_method_code]["col_size"] = run_sql("SELECT count(*) FROM %sR" % methods[rank_method_code]["rnkWORD_table"][:-1])[0][0] if config.has_option(cfg_function, "stemming") and config.get(cfg_function, "stemming"): try: methods[rank_method_code]["stemmer"] = config.get(cfg_function, "stemming") except Exception,e: pass if config.has_option(cfg_function, "stopword"): methods[rank_method_code]["stopwords"] = config.get(cfg_function, "stopword") if config.has_section("find_similar"): methods[rank_method_code]["max_word_occurence"] = float(config.get("find_similar", "max_word_occurence")) methods[rank_method_code]["min_word_occurence"] = float(config.get("find_similar", "min_word_occurence")) methods[rank_method_code]["min_word_length"] = int(config.get("find_similar", "min_word_length")) methods[rank_method_code]["min_nr_words_docs"] = int(config.get("find_similar", "min_nr_words_docs")) methods[rank_method_code]["max_nr_words_upper"] = 
int(config.get("find_similar", "max_nr_words_upper")) methods[rank_method_code]["max_nr_words_lower"] = int(config.get("find_similar", "max_nr_words_lower")) methods[rank_method_code]["default_min_relevance"] = int(config.get("find_similar", "default_min_relevance")) if config.has_section("combine_method"): i = 1 methods[rank_method_code]["combine_method"] = [] while config.has_option("combine_method", "method%s" % i): methods[rank_method_code]["combine_method"].append(string.split(config.get("combine_method", "method%s" % i), ",")) i += 1 def is_method_valid(colID, rank_method_code): """ Check if RANK_METHOD_CODE method is valid for the collection given. If colID is None, then check for existence regardless of collection. """ if colID is None: return run_sql("SELECT COUNT(*) FROM rnkMETHOD WHERE name=%s", (rank_method_code,))[0][0] enabled_colls = dict(run_sql("SELECT id_collection, score from collection_rnkMETHOD,rnkMETHOD WHERE id_rnkMETHOD=rnkMETHOD.id AND name='%s'" % rank_method_code)) try: colID = int(colID) except TypeError: return 0 if enabled_colls.has_key(colID): return 1 else: while colID: colID = run_sql("SELECT id_dad FROM collection_collection WHERE id_son=%s" % colID) if colID and enabled_colls.has_key(colID[0][0]): return 1 elif colID: colID = colID[0][0] return 0 def get_bibrank_methods(colID, ln=CFG_SITE_LANG): """ Return a list of rank methods enabled for collection colID and the name of them in the language defined by the ln parameter. """ if not globals().has_key('methods'): create_rnkmethod_cache() avail_methods = [] for (rank_method_code, options) in methods.iteritems(): if options.has_key("function") and is_method_valid(colID, rank_method_code): if options.has_key(ln): avail_methods.append((rank_method_code, options[ln])) elif options.has_key(CFG_SITE_LANG): avail_methods.append((rank_method_code, options[CFG_SITE_LANG])) else: avail_methods.append((rank_method_code, rank_method_code)) return avail_methods def rank_records(rank_method_code, rank_limit_relevance, hitset_global, pattern=[], verbose=0): """rank_method_code, e.g. `jif' or `sbr' (word frequency vector model) rank_limit_relevance, e.g. `23' for `nbc' (number of citations) or `0.10' for `vec' hitset, search engine hits; pattern, search engine query or record ID (you check the type) verbose, verbose level output: list of records list of rank values prefix postfix verbose_output""" global voutput voutput = "" configcreated = "" starttime = time.time() afterfind = starttime - time.time() aftermap = starttime - time.time() try: hitset = copy.deepcopy(hitset_global) #we are receiving a global hitset if not globals().has_key('methods'): create_rnkmethod_cache() function = methods[rank_method_code]["function"] #we get 'citation' method correctly here func_object = globals().get(function) if func_object and pattern and pattern[0][0:6] == "recid:" and function == "word_similarity": result = find_similar(rank_method_code, pattern[0][6:], hitset, rank_limit_relevance, verbose) elif rank_method_code == "citation": #we get rank_method_code correctly here. 
#pattern[0] is the search word - not used by find_cit
            p = ""
            if pattern and pattern[0]:
                p = pattern[0][6:]
            result = find_citations(rank_method_code, p, hitset, verbose)
        elif func_object:
            result = func_object(rank_method_code, pattern, hitset, rank_limit_relevance, verbose)
        else:
            result = rank_by_method(rank_method_code, pattern, hitset, rank_limit_relevance, verbose)
    except Exception, e:
        register_exception()
        result = (None, "", adderrorbox("An error occurred when trying to rank the search result "+rank_method_code, ["Unexpected error: %s<br />
" % (e,)]), voutput) afterfind = time.time() - starttime if result[0] and result[1]: #split into two lists for search_engine results_similar_recIDs = map(lambda x: x[0], result[0]) results_similar_relevances = map(lambda x: x[1], result[0]) result = (results_similar_recIDs, results_similar_relevances, result[1], result[2], "%s" % configcreated + result[3]) aftermap = time.time() - starttime; else: result = (None, None, result[1], result[2], result[3]) if verbose > 0: voutput = voutput+"\nElapsed time after finding: "+str(afterfind)+"\nElapsed after mapping: "+str(aftermap) #add stuff from here into voutput from result tmp = result[4]+voutput result = (result[0],result[1],result[2],result[3],tmp) #dbg = string.join(map(str,methods[rank_method_code].items())) #result = (None, "", adderrorbox("Debug ",rank_method_code+" "+dbg),"",voutput); return result def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance,verbose): """combining several methods into one based on methods/percentage in config file""" global voutput result = {} try: for (method, percent) in methods[rank_method_code]["combine_method"]: function = methods[method]["function"] func_object = globals().get(function) percent = int(percent) if func_object: this_result = func_object(method, pattern, hitset, rank_limit_relevance, verbose)[0] else: this_result = rank_by_method(method, pattern, hitset, rank_limit_relevance, verbose)[0] for i in range(0, len(this_result)): (recID, value) = this_result[i] if value > 0: result[recID] = result.get(recID, 0) + int((float(i) / len(this_result)) * float(percent)) result = result.items() result.sort(lambda x, y: cmp(x[1], y[1])) return (result, "(", ")", voutput) except Exception, e: return (None, "Warning: %s method cannot be used for ranking your query." % rank_method_code, "", voutput) def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance,verbose): """Ranking of records based on predetermined values. input: rank_method_code - the code of the method, from the name field in rnkMETHOD, used to get predetermined values from rnkMETHODDATA lwords - a list of words from the query hitset - a list of hits for the query found by search_engine rank_limit_relevance - show only records with a rank value above this verbose - verbose value output: reclist - a list of sorted records, with unsorted added to the end: [[23,34], [344,24], [1,01]] prefix - what to show before the rank value postfix - what to show after the rank value voutput - contains extra information, content dependent on verbose value""" global voutput rnkdict = run_sql("SELECT relevance_data FROM rnkMETHODDATA,rnkMETHOD where rnkMETHOD.id=id_rnkMETHOD and rnkMETHOD.name='%s'" % rank_method_code) if not rnkdict: return (None, "Warning: Could not load ranking data for method %s." % rank_method_code, "", voutput) max_recid = 0 res = run_sql("SELECT max(id) FROM bibrec") if res and res[0][0]: max_recid = int(res[0][0]) lwords_hitset = None for j in range(0, len(lwords)): #find which docs to search based on ranges..should be done in search_engine... 
if lwords[j] and lwords[j][:6] == "recid:": if not lwords_hitset: lwords_hitset = intbitset() lword = lwords[j][6:] if string.find(lword, "->") > -1: lword = string.split(lword, "->") if int(lword[0]) >= max_recid or int(lword[1]) >= max_recid + 1: return (None, "Warning: Given record IDs are out of range.", "", voutput) for i in range(int(lword[0]), int(lword[1])): lwords_hitset.add(int(i)) elif lword < max_recid + 1: lwords_hitset.add(int(lword)) else: return (None, "Warning: Given record IDs are out of range.", "", voutput) rnkdict = deserialize_via_marshal(rnkdict[0][0]) if verbose > 0: voutput += "
<br />Running rank method: %s, using rank_by_method function in bibrank_record_sorter<br />" % rank_method_code
        voutput += "Ranking data loaded, size of structure: %s<br />" % len(rnkdict)
    lrecIDs = list(hitset)
    if verbose > 0:
        voutput += "Number of records to rank: %s<br />" % len(lrecIDs)
    reclist = []
    reclist_addend = []
    if not lwords_hitset: #rank all docs, can this be sped up using something else than a for loop?
        for recID in lrecIDs:
            if rnkdict.has_key(recID):
                reclist.append((recID, rnkdict[recID]))
                del rnkdict[recID]
            else:
                reclist_addend.append((recID, 0))
    else: #rank docs in hitset, can this be sped up using something else than a for loop?
        for recID in lwords_hitset:
            if rnkdict.has_key(recID) and recID in hitset:
                reclist.append((recID, rnkdict[recID]))
                del rnkdict[recID]
            elif recID in hitset:
                reclist_addend.append((recID, 0))
    if verbose > 0:
        voutput += "Number of records ranked: %s<br />" % len(reclist)
        voutput += "Number of records not ranked: %s<br />
" % len(reclist_addend) reclist.sort(lambda x, y: cmp(x[1], y[1])) return (reclist_addend + reclist, methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput) def find_citations(rank_method_code, recID, hitset, verbose): """Rank by the amount of citations.""" #calculate the cited-by values for all the members of the hitset #returns: ((recordid,weight),prefix,postfix,message) global voutput voutput = "" #If the recID is numeric, return only stuff that cites it. Otherwise return #stuff that cites hitset #try to convert to int recisint = True recidint = 0 try: recidint = int(recID) except: recisint = False ret = [] if recisint: myrecords = get_cited_by(recidint) #this is a simple list ret = get_cited_by_weight(myrecords) else: ret = get_cited_by_weight(hitset) ret.sort(lambda x,y:cmp(x[1],y[1])) #ascending by the second member of the tuples if verbose > 0: voutput = voutput+"\nrecID "+str(recID)+" is int: "+str(recisint)+" hitset "+str(hitset)+"\n"+"find_citations retlist "+str(ret) #voutput = voutput + str(ret) if ret: return (ret,"(", ")", "") else: return ((),"", "", "") def find_similar(rank_method_code, recID, hitset, rank_limit_relevance,verbose): """Finding terms to use for calculating similarity. Terms are taken from the recid given, returns a list of recids's and relevance, input: rank_method_code - the code of the method, from the name field in rnkMETHOD recID - records to use for find similar hitset - a list of hits for the query found by search_engine rank_limit_relevance - show only records with a rank value above this verbose - verbose value output: reclist - a list of sorted records: [[23,34], [344,24], [1,01]] prefix - what to show before the rank value postfix - what to show after the rank value voutput - contains extra information, content dependent on verbose value""" startCreate = time.time() global voutput if verbose > 0: voutput += "
<br />Running rank method: %s, using find_similar/word_frequency in bibrank_record_sorter<br />
" % rank_method_code rank_limit_relevance = methods[rank_method_code]["default_min_relevance"] try: recID = int(recID) except Exception,e : return (None, "Warning: Error in record ID, please check that a number is given.", "", voutput) rec_terms = run_sql("""SELECT termlist FROM %sR WHERE id_bibrec=%%s""" % methods[rank_method_code]["rnkWORD_table"][:-1], (recID,)) if not rec_terms: return (None, "Warning: Requested record does not seem to exist.", "", voutput) rec_terms = deserialize_via_marshal(rec_terms[0][0]) #Get all documents using terms from the selected documents if len(rec_terms) == 0: return (None, "Warning: Record specified has no content indexed for use with this method.", "", voutput) else: terms = "%s" % rec_terms.keys() terms_recs = dict(run_sql("""SELECT term, hitlist FROM %s WHERE term IN (%s)""" % (methods[rank_method_code]["rnkWORD_table"], terms[1:len(terms) - 1]))) tf_values = {} #Calculate all term frequencies for (term, tf) in rec_terms.iteritems(): if len(term) >= methods[rank_method_code]["min_word_length"] and terms_recs.has_key(term) and tf[1] != 0: tf_values[term] = int((1 + math.log(tf[0])) * tf[1]) #calculate term weigth tf_values = tf_values.items() tf_values.sort(lambda x, y: cmp(y[1], x[1])) #sort based on weigth lwords = [] stime = time.time() (recdict, rec_termcount) = ({}, {}) for (t, tf) in tf_values: #t=term, tf=term frequency term_recs = deserialize_via_marshal(terms_recs[t]) if len(tf_values) <= methods[rank_method_code]["max_nr_words_lower"] or (len(term_recs) >= methods[rank_method_code]["min_nr_words_docs"] and (((float(len(term_recs)) / float(methods[rank_method_code]["col_size"])) <= methods[rank_method_code]["max_word_occurence"]) and ((float(len(term_recs)) / float(methods[rank_method_code]["col_size"])) >= methods[rank_method_code]["min_word_occurence"]))): #too complicated...something must be done lwords.append((t, methods[rank_method_code]["rnkWORD_table"])) #list of terms used (recdict, rec_termcount) = calculate_record_relevance_findsimilar((t, round(tf, 4)) , term_recs, hitset, recdict, rec_termcount, verbose, "true") #true tells the function to not calculate all unimportant terms if len(tf_values) > methods[rank_method_code]["max_nr_words_lower"] and (len(lwords) == methods[rank_method_code]["max_nr_words_upper"] or tf < 0): break if len(recdict) == 0 or len(lwords) == 0: return (None, "Could not find any similar documents, possibly because of error in ranking data.", "", voutput) else: #sort if we got something to sort (reclist, hitset) = sort_record_relevance_findsimilar(recdict, rec_termcount, hitset, rank_limit_relevance, verbose) if verbose > 0: voutput += "
<br />Number of terms: %s<br />" % run_sql("SELECT count(id) FROM %s" % methods[rank_method_code]["rnkWORD_table"])[0][0]
            voutput += "Number of terms to use for query: %s<br />" % len(lwords)
            voutput += "Terms: %s<br />" % lwords
            voutput += "Current number of recIDs: %s<br />" % (methods[rank_method_code]["col_size"])
            voutput += "Prepare time: %s<br />" % (str(time.time() - startCreate))
            voutput += "Total time used: %s<br />
" % (str(time.time() - startCreate)) rank_method_stat(rank_method_code, reclist, lwords) return (reclist[:len(reclist)], methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput) def word_similarity(rank_method_code, lwords, hitset, rank_limit_relevance, verbose): """Ranking a records containing specified words and returns a sorted list. input: rank_method_code - the code of the method, from the name field in rnkMETHOD lwords - a list of words from the query hitset - a list of hits for the query found by search_engine rank_limit_relevance - show only records with a rank value above this verbose - verbose value output: reclist - a list of sorted records: [[23,34], [344,24], [1,01]] prefix - what to show before the rank value postfix - what to show after the rank value voutput - contains extra information, content dependent on verbose value""" global voutput startCreate = time.time() if verbose > 0: voutput += "
<br />Running rank method: %s, using word_frequency function in bibrank_record_sorter<br />
" % rank_method_code lwords_old = lwords lwords = [] #Check terms, remove non alphanumeric characters. Use both unstemmed and stemmed version of all terms. for i in range(0, len(lwords_old)): term = string.lower(lwords_old[i]) if not methods[rank_method_code]["stopwords"] == "True" or methods[rank_method_code]["stopwords"] and not is_stopword(term, 1): lwords.append((term, methods[rank_method_code]["rnkWORD_table"])) terms = string.split(string.lower(re.sub(methods[rank_method_code]["chars_alphanumericseparators"], ' ', term))) for term in terms: if methods[rank_method_code].has_key("stemmer"): # stem word term = stem(string.replace(term, ' ', ''), methods[rank_method_code]["stemmer"]) if lwords_old[i] != term: #add if stemmed word is different than original word lwords.append((term, methods[rank_method_code]["rnkWORD_table"])) (recdict, rec_termcount, lrecIDs_remove) = ({}, {}, {}) #For each term, if accepted, get a list of the records using the term #calculate then relevance for each term before sorting the list of records for (term, table) in lwords: term_recs = run_sql("""SELECT term, hitlist FROM %s WHERE term=%%s""" % methods[rank_method_code]["rnkWORD_table"], (term,)) if term_recs: #if term exists in database, use for ranking term_recs = deserialize_via_marshal(term_recs[0][1]) (recdict, rec_termcount) = calculate_record_relevance((term, int(term_recs["Gi"][1])) , term_recs, hitset, recdict, rec_termcount, verbose, quick=None) del term_recs if len(recdict) == 0 or (len(lwords) == 1 and lwords[0] == ""): return (None, "Records not ranked. The query is not detailed enough, or not enough records found, for ranking to be possible.", "", voutput) else: #sort if we got something to sort (reclist, hitset) = sort_record_relevance(recdict, rec_termcount, hitset, rank_limit_relevance, verbose) #Add any documents not ranked to the end of the list if hitset: lrecIDs = list(hitset) #using 2-3mb reclist = zip(lrecIDs, [0] * len(lrecIDs)) + reclist #using 6mb if verbose > 0: voutput += "
<br />Current number of recIDs: %s<br />" % (methods[rank_method_code]["col_size"])
        voutput += "Number of terms: %s<br />" % run_sql("SELECT count(id) FROM %s" % methods[rank_method_code]["rnkWORD_table"])[0][0]
        voutput += "Terms: %s<br />" % lwords
        voutput += "Prepare and pre calculate time: %s<br />" % (str(time.time() - startCreate))
        voutput += "Total time used: %s<br />
" % (str(time.time() - startCreate)) rank_method_stat(rank_method_code, reclist, lwords) return (reclist, methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput) def calculate_record_relevance(term, invidx, hitset, recdict, rec_termcount, verbose, quick=None): """Calculating the relevance of the documents based on the input, calculates only one word term - (term, query term factor) the term and its importance in the overall search invidx - {recid: tf, Gi: norm value} The Gi value is used as a idf value hitset - a hitset with records that are allowed to be ranked recdict - contains currently ranked records, is returned with new values rec_termcount - {recid: count} the number of terms in this record that matches the query verbose - verbose value quick - if quick=yes only terms with a positive qtf is used, to limit the number of records to sort""" (t, qtf) = term if invidx.has_key("Gi"):#Gi = weigth for this term, created by bibrank_word_indexer Gi = invidx["Gi"][1] del invidx["Gi"] else: #if not existing, bibrank should be run with -R return (recdict, rec_termcount) if not quick or (qtf >= 0 or (qtf < 0 and len(recdict) == 0)): #Only accept records existing in the hitset received from the search engine for (j, tf) in invidx.iteritems(): if j in hitset:#only include docs found by search_engine based on query try: #calculates rank value recdict[j] = recdict.get(j, 0) + int(math.log(tf[0] * Gi * tf[1] * qtf)) except: return (recdict, rec_termcount) rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document elif quick: #much used term, do not include all records, only use already existing ones for (j, tf) in recdict.iteritems(): #i.e: if doc contains important term, also count unimportant if invidx.has_key(j): tf = invidx[j] recdict[j] = recdict.get(j, 0) + int(math.log(tf[0] * Gi * tf[1] * qtf)) rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document return (recdict, rec_termcount) def calculate_record_relevance_findsimilar(term, invidx, hitset, recdict, rec_termcount, verbose, quick=None): """Calculating the relevance of the documents based on the input, calculates only one word term - (term, query term factor) the term and its importance in the overall search invidx - {recid: tf, Gi: norm value} The Gi value is used as a idf value hitset - a hitset with records that are allowed to be ranked recdict - contains currently ranked records, is returned with new values rec_termcount - {recid: count} the number of terms in this record that matches the query verbose - verbose value quick - if quick=yes only terms with a positive qtf is used, to limit the number of records to sort""" (t, qtf) = term if invidx.has_key("Gi"): #Gi = weigth for this term, created by bibrank_word_indexer Gi = invidx["Gi"][1] del invidx["Gi"] else: #if not existing, bibrank should be run with -R return (recdict, rec_termcount) if not quick or (qtf >= 0 or (qtf < 0 and len(recdict) == 0)): #Only accept records existing in the hitset received from the search engine for (j, tf) in invidx.iteritems(): if j in hitset: #only include docs found by search_engine based on query #calculate rank value recdict[j] = recdict.get(j, 0) + int((1 + math.log(tf[0])) * Gi * tf[1] * qtf) rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document elif quick: #much used term, do not include all records, only use already existing ones for (j, tf) in recdict.iteritems(): #i.e: if doc contains important term, also count unimportant if 
invidx.has_key(j): tf = invidx[j] recdict[j] = recdict[j] + int((1 + math.log(tf[0])) * Gi * tf[1] * qtf) rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document return (recdict, rec_termcount) def sort_record_relevance(recdict, rec_termcount, hitset, rank_limit_relevance, verbose): """Sorts the dictionary and returns records with a relevance higher than the given value. recdict - {recid: value} unsorted rank_limit_relevance - a value > 0 usually verbose - verbose value""" startCreate = time.time() global voutput reclist = [] #remove all ranked documents so that unranked can be added to the end hitset -= recdict.keys() #gives each record a score between 0-100 divideby = max(recdict.values()) for (j, w) in recdict.iteritems(): w = int(w * 100 / divideby) if w >= rank_limit_relevance: reclist.append((j, w)) #sort scores reclist.sort(lambda x, y: cmp(x[1], y[1])) if verbose > 0: voutput += "Number of records sorted: %s
" % len(reclist) voutput += "Sort time: %s
" % (str(time.time() - startCreate)) return (reclist, hitset) def sort_record_relevance_findsimilar(recdict, rec_termcount, hitset, rank_limit_relevance, verbose): """Sorts the dictionary and returns records with a relevance higher than the given value. recdict - {recid: value} unsorted rank_limit_relevance - a value > 0 usually verbose - verbose value""" startCreate = time.time() global voutput reclist = [] #Multiply with the number of terms of the total number of terms in the query existing in the records for j in recdict.keys(): if recdict[j] > 0 and rec_termcount[j] > 1: recdict[j] = math.log((recdict[j] * rec_termcount[j])) else: recdict[j] = 0 hitset -= recdict.keys() #gives each record a score between 0-100 divideby = max(recdict.values()) for (j, w) in recdict.iteritems(): w = int(w * 100 / divideby) if w >= rank_limit_relevance: reclist.append((j, w)) #sort scores reclist.sort(lambda x, y: cmp(x[1], y[1])) if verbose > 0: voutput += "Number of records sorted: %s
" % len(reclist) voutput += "Sort time: %s
" % (str(time.time() - startCreate)) return (reclist, hitset) def rank_method_stat(rank_method_code, reclist, lwords): """Shows some statistics about the searchresult. rank_method_code - name field from rnkMETHOD reclist - a list of sorted and ranked records lwords - the words in the query""" global voutput if len(reclist) > 20: j = 20 else: j = len(reclist) voutput += "
Rank statistics:
" for i in range(1, j + 1): voutput += "%s,Recid:%s,Score:%s
" % (i,reclist[len(reclist) - i][0],reclist[len(reclist) - i][1]) for (term, table) in lwords: term_recs = run_sql("""SELECT hitlist FROM %s WHERE term=%%s""" % table, (term,)) if term_recs: term_recs = deserialize_via_marshal(term_recs[0][0]) if term_recs.has_key(reclist[len(reclist) - i][0]): voutput += "%s-%s / " % (term, term_recs[reclist[len(reclist) - i][0]]) voutput += "
" voutput += "
Score variation:
" count = {} for i in range(0, len(reclist)): count[reclist[i][1]] = count.get(reclist[i][1], 0) + 1 i = 100 while i >= 0: if count.has_key(i): voutput += "%s-%s
" % (i, count[i]) i -= 1 - -try: - import psyco - psyco.bind(find_similar) - psyco.bind(rank_by_method) - psyco.bind(calculate_record_relevance) - psyco.bind(word_similarity) - psyco.bind(sort_record_relevance) -except StandardError, e: - pass - diff --git a/modules/bibrank/lib/bibrank_tag_based_indexer.py b/modules/bibrank/lib/bibrank_tag_based_indexer.py index 57f69b9b8..a17e19168 100644 --- a/modules/bibrank/lib/bibrank_tag_based_indexer.py +++ b/modules/bibrank/lib/bibrank_tag_based_indexer.py @@ -1,478 +1,468 @@ # -*- coding: utf-8 -*- ## Ranking of records using different parameters and methods. ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. __revision__ = "$Id$" import os import sys import time import ConfigParser from invenio.config import \ CFG_SITE_LANG, \ CFG_ETCDIR, \ CFG_PREFIX from invenio.search_engine import perform_request_search, HitSet from invenio.bibrank_citation_indexer import get_citation_weight, print_missing, get_cit_dict, insert_into_cit_db from invenio.bibrank_downloads_indexer import * from invenio.dbquery import run_sql, serialize_via_marshal, deserialize_via_marshal from invenio.errorlib import register_exception from invenio.bibtask import task_get_option, write_message, task_sleep_now_if_required from invenio.bibindex_engine import create_range_list options = {} def remove_auto_cites(dic): """Remove auto-cites and dedupe.""" for key in dic.keys(): new_list = dic.fromkeys(dic[key]).keys() try: new_list.remove(key) except ValueError: pass dic[key] = new_list return dic def citation_repair_exec(): """Repair citation ranking method""" ## repair citations for rowname in ["citationdict","reversedict"]: ## get dic dic = get_cit_dict(rowname) ## repair write_message("Repairing %s" % rowname) dic = remove_auto_cites(dic) ## store healthy citation dic insert_into_cit_db(dic, rowname) return def download_weight_filtering_user_repair_exec (): """Repair download weight filtering user ranking method""" write_message("Repairing for this ranking method is not defined. Skipping.") return def download_weight_total_repair_exec(): """Repair download weight total ranking method""" write_message("Repairing for this ranking method is not defined. Skipping.") return def file_similarity_by_times_downloaded_repair_exec(): """Repair file similarity by times downloaded ranking method""" write_message("Repairing for this ranking method is not defined. Skipping.") return def single_tag_rank_method_repair_exec(): """Repair single tag ranking method""" write_message("Repairing for this ranking method is not defined. 
Skipping.") return def citation_exec(rank_method_code, name, config): """Rank method for citation analysis""" #first check if this is a specific task begin_date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) if task_get_option("cmd") == "print-missing": num = task_get_option("num") print_missing(num) else: dict = get_citation_weight(rank_method_code, config) if dict: if task_get_option("id") or task_get_option("collection") or \ task_get_option("modified"): # user have asked to citation-index specific records # only, so we should not update citation indexer's # last run time stamp information begin_date = None intoDB(dict, begin_date, rank_method_code) else: write_message("No need to update the indexes for citations.") def download_weight_filtering_user(run): return bibrank_engine(run) def download_weight_total(run): return bibrank_engine(run) def file_similarity_by_times_downloaded(run): return bibrank_engine(run) def download_weight_filtering_user_exec (rank_method_code, name, config): """Ranking by number of downloads per User. Only one full Text Download is taken in account for one specific userIP address""" begin_date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) time1 = time.time() dic = fromDB(rank_method_code) last_updated = get_lastupdated(rank_method_code) keys = new_downloads_to_index(last_updated) filter_downloads_per_hour(keys, last_updated) dic = get_download_weight_filtering_user(dic, keys) intoDB(dic, begin_date, rank_method_code) time2 = time.time() return {"time":time2-time1} def download_weight_total_exec(rank_method_code, name, config): """rankink by total number of downloads without check the user ip if users downloads 3 time the same full text document it has to be count as 3 downloads""" begin_date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) time1 = time.time() dic = fromDB(rank_method_code) last_updated = get_lastupdated(rank_method_code) keys = new_downloads_to_index(last_updated) filter_downloads_per_hour(keys, last_updated) dic = get_download_weight_total(dic, keys) intoDB(dic, begin_date, rank_method_code) time2 = time.time() return {"time":time2-time1} def file_similarity_by_times_downloaded_exec(rank_method_code, name, config): """update dictionnary {recid:[(recid, nb page similarity), ()..]}""" begin_date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) time1 = time.time() dic = fromDB(rank_method_code) last_updated = get_lastupdated(rank_method_code) keys = new_downloads_to_index(last_updated) filter_downloads_per_hour(keys, last_updated) dic = get_file_similarity_by_times_downloaded(dic, keys) intoDB(dic, begin_date, rank_method_code) time2 = time.time() return {"time":time2-time1} def single_tag_rank_method_exec(rank_method_code, name, config): """Creating the rank method data""" begin_date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) rnkset = {} rnkset_old = fromDB(rank_method_code) rnkset_new = single_tag_rank(config) rnkset = union_dicts(rnkset_old, rnkset_new) intoDB(rnkset, begin_date, rank_method_code) def single_tag_rank(config): """Connect the given tag with the data from the kb file given""" write_message("Loading knowledgebase file", verbose=9) kb_data = {} records = [] write_message("Reading knowledgebase file: %s" % \ config.get(config.get("rank_method", "function"), "kb_src")) input = open(config.get(config.get("rank_method", "function"), "kb_src"), 'r') data = input.readlines() for line in data: if not line[0:1] == "#": kb_data[string.strip((string.split(string.strip(line), "---"))[0])] = 
(string.split(string.strip(line), "---"))[1] write_message("Number of lines read from knowledgebase file: %s" % len(kb_data)) tag = config.get(config.get("rank_method", "function"), "tag") tags = config.get(config.get("rank_method", "function"), "check_mandatory_tags").split(", ") if tags == ['']: tags = "" records = [] for (recids, recide) in options["recid_range"]: task_sleep_now_if_required(can_stop_too=True) write_message("......Processing records #%s-%s" % (recids, recide)) recs = run_sql("SELECT id_bibrec, value FROM bib%sx, bibrec_bib%sx WHERE tag=%%s AND id_bibxxx=id and id_bibrec >=%%s and id_bibrec<=%%s" % (tag[0:2], tag[0:2]), (tag, recids, recide)) valid = HitSet(trailing_bits=1) valid.discard(0) for key in tags: newset = HitSet() newset += [recid[0] for recid in (run_sql("SELECT id_bibrec FROM bib%sx, bibrec_bib%sx WHERE id_bibxxx=id AND tag=%%s AND id_bibxxx=id and id_bibrec >=%%s and id_bibrec<=%%s" % (tag[0:2], tag[0:2]), (key, recids, recide)))] valid.intersection_update(newset) if tags: recs = filter(lambda x: x[0] in valid, recs) records = records + list(recs) write_message("Number of records found with the necessary tags: %s" % len(records)) records = filter(lambda x: x[0] in options["validset"], records) rnkset = {} for key, value in records: if kb_data.has_key(value): if not rnkset.has_key(key): rnkset[key] = float(kb_data[value]) else: if kb_data.has_key(rnkset[key]) and float(kb_data[value]) > float((rnkset[key])[1]): rnkset[key] = float(kb_data[value]) else: rnkset[key] = 0 write_message("Number of records available in rank method: %s" % len(rnkset)) return rnkset def get_lastupdated(rank_method_code): """Get the last time the rank method was updated""" res = run_sql("SELECT rnkMETHOD.last_updated FROM rnkMETHOD WHERE name=%s", (rank_method_code, )) if res: return res[0][0] else: raise Exception("Is this the first run? 
Please do a complete update.") def intoDB(dict, date, rank_method_code): """Insert the rank method data into the database""" mid = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) del_rank_method_codeDATA(rank_method_code) serdata = serialize_via_marshal(dict); midstr = str(mid[0][0]); run_sql("INSERT INTO rnkMETHODDATA(id_rnkMETHOD, relevance_data) VALUES (%s,%s)", (midstr, serdata,)) if date: run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s", (date, rank_method_code)) def fromDB(rank_method_code): """Get the data for a rank method""" id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], )) if res: return deserialize_via_marshal(res[0][0]) else: return {} def del_rank_method_codeDATA(rank_method_code): """Delete the data for a rank method""" id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) run_sql("DELETE FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], )) def del_recids(rank_method_code, range_rec): """Delete some records from the rank method""" id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, )) res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], )) if res: rec_dict = deserialize_via_marshal(res[0][0]) write_message("Old size: %s" % len(rec_dict)) for (recids, recide) in range_rec: for i in range(int(recids), int(recide)): if rec_dict.has_key(i): del rec_dict[i] write_message("New size: %s" % len(rec_dict)) intoDB(rec_dict, begin_date, rank_method_code) else: write_message("Create before deleting!") def union_dicts(dict1, dict2): "Returns union of the two dicts." union_dict = {} for (key, value) in dict1.iteritems(): union_dict[key] = value for (key, value) in dict2.iteritems(): union_dict[key] = value return union_dict def rank_method_code_statistics(rank_method_code): """Print statistics""" method = fromDB(rank_method_code) max = ('', -999999) maxcount = 0 min = ('', 999999) mincount = 0 for (recID, value) in method.iteritems(): if value < min and value > 0: min = value if value > max: max = value for (recID, value) in method.iteritems(): if value == min: mincount += 1 if value == max: maxcount += 1 write_message("Showing statistic for selected method") write_message("Method name: %s" % getName(rank_method_code)) write_message("Short name: %s" % rank_method_code) write_message("Last run: %s" % get_lastupdated(rank_method_code)) write_message("Number of records: %s" % len(method)) write_message("Lowest value: %s - Number of records: %s" % (min, mincount)) write_message("Highest value: %s - Number of records: %s" % (max, maxcount)) write_message("Divided into 10 sets:") for i in range(1, 11): setcount = 0 distinct_values = {} lower = -1.0 + ((float(max + 1) / 10)) * (i - 1) upper = -1.0 + ((float(max + 1) / 10)) * i for (recID, value) in method.iteritems(): if value >= lower and value <= upper: setcount += 1 distinct_values[value] = 1 write_message("Set %s (%s-%s) %s Distinct values: %s" % (i, lower, upper, len(distinct_values), setcount)) def check_method(rank_method_code): write_message("Checking rank method...") if len(fromDB(rank_method_code)) == 0: write_message("Rank method not yet executed, please run it to create the necessary data.") else: if len(add_recIDs_by_date(rank_method_code)) > 0: write_message("Records modified, update recommended") else: write_message("No records modified, update not necessary") def bibrank_engine(run): """Run 
the indexing task. Return 1 in case of success and 0 in case of failure. """ - - try: - import psyco - psyco.bind(single_tag_rank) - psyco.bind(single_tag_rank_method_exec) - psyco.bind(serialize_via_marshal) - psyco.bind(deserialize_via_marshal) - except StandardError, e: - pass - startCreate = time.time() try: options["run"] = [] options["run"].append(run) for rank_method_code in options["run"]: task_sleep_now_if_required(can_stop_too=True) cfg_name = getName(rank_method_code) write_message("Running rank method: %s." % cfg_name) file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() try: config.readfp(open(file)) except StandardError, e: write_message("Cannot find configurationfile: %s" % file, sys.stderr) raise StandardError cfg_short = rank_method_code cfg_function = config.get("rank_method", "function") + "_exec" cfg_repair_function = config.get("rank_method", "function") + "_repair_exec" cfg_name = getName(cfg_short) options["validset"] = get_valid_range(rank_method_code) if task_get_option("collection"): l_of_colls = string.split(task_get_option("collection"), ", ") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) options["recid_range"] = recIDs_range elif task_get_option("id"): options["recid_range"] = task_get_option("id") elif task_get_option("modified"): options["recid_range"] = add_recIDs_by_date(rank_method_code, task_get_option("modified")) elif task_get_option("last_updated"): options["recid_range"] = add_recIDs_by_date(rank_method_code) else: write_message("No records specified, updating all", verbose=2) min_id = run_sql("SELECT min(id) from bibrec")[0][0] max_id = run_sql("SELECT max(id) from bibrec")[0][0] options["recid_range"] = [[min_id, max_id]] if task_get_option("quick") == "no": write_message("Recalculate parameter not used, parameter ignored.", verbose=9) if task_get_option("cmd") == "del": del_recids(cfg_short, options["recid_range"]) elif task_get_option("cmd") == "add": func_object = globals().get(cfg_function) func_object(rank_method_code, cfg_name, config) elif task_get_option("cmd") == "stat": rank_method_code_statistics(rank_method_code) elif task_get_option("cmd") == "check": check_method(rank_method_code) elif task_get_option("cmd") == "print-missing": func_object = globals().get(cfg_function) func_object(rank_method_code, cfg_name, config) elif task_get_option("cmd") == "repair": func_object = globals().get(cfg_repair_function) func_object() else: write_message("Invalid command found processing %s" % rank_method_code, sys.stderr) raise StandardError except StandardError, e: write_message("\nException caught: %s" % e, sys.stderr) register_exception() raise StandardError if task_get_option("verbose"): showtime((time.time() - startCreate)) return 1 def get_valid_range(rank_method_code): """Return a range of records""" write_message("Getting records from collections enabled for rank method.", verbose=9) res = run_sql("SELECT collection.name FROM collection, collection_rnkMETHOD, rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name=%s", (rank_method_code, )) l_of_colls = [] for coll in res: l_of_colls.append(coll[0]) if len(l_of_colls) > 0: recIDs = perform_request_search(c=l_of_colls) else: recIDs = [] valid = HitSet() valid += recIDs return valid def add_recIDs_by_date(rank_method_code, dates=""): """Return recID range from records modified between DATES[0] and DATES[1]. 
If DATES is not set, then add records modified since the last run of the ranking method RANK_METHOD_CODE. """ if not dates: try: dates = (get_lastupdated(rank_method_code), '') except Exception: dates = ("0000-00-00 00:00:00", '') if dates[0] is None: dates = ("0000-00-00 00:00:00", '') query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s""" if dates[1]: query += " and b.modification_date <= %s" query += " ORDER BY b.id ASC""" if dates[1]: res = run_sql(query, (dates[0], dates[1])) else: res = run_sql(query, (dates[0], )) alist = create_range_list([row[0] for row in res]) if not alist: write_message("No new records added since last time method was run") return alist def getName(rank_method_code, ln=CFG_SITE_LANG, type='ln'): """Returns the name of the method if it exists""" try: rnkid = run_sql("SELECT id FROM rnkMETHOD where name=%s", (rank_method_code, )) if rnkid: rnkid = str(rnkid[0][0]) res = run_sql("SELECT value FROM rnkMETHODNAME where type=%s and ln=%s and id_rnkMETHOD=%s", (type, ln, rnkid)) if not res: res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln=%s and id_rnkMETHOD=%s and type=%s", (CFG_SITE_LANG, rnkid, type)) if not res: return rank_method_code return res[0][0] else: raise Exception except Exception: write_message("Cannot run rank method, either given code for method is wrong, or it has not been added using the webinterface.") raise Exception def single_tag_rank_method(run): return bibrank_engine(run) def showtime(timeused): """Show time used for method""" write_message("Time used: %d second(s)." % timeused, verbose=9) def citation(run): return bibrank_engine(run) diff --git a/modules/bibrank/lib/bibrank_word_indexer.py b/modules/bibrank/lib/bibrank_word_indexer.py index 27c4f65ff..76af88031 100644 --- a/modules/bibrank/lib/bibrank_word_indexer.py +++ b/modules/bibrank/lib/bibrank_word_indexer.py @@ -1,1206 +1,1194 @@ ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
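# Note on the storage used by intoDB()/fromDB() above: each rank method keeps
# its whole {recid: score} dictionary as a single serialized blob in
# rnkMETHODDATA.relevance_data. A minimal, self-contained sketch of that
# round-trip -- assuming, as the marshal and zlib imports in dbquery.py
# suggest, that serialize_via_marshal() is marshal.dumps() followed by zlib
# compression; the 'scores' dict is a made-up example:
#
#   import marshal
#   from zlib import compress, decompress
#
#   def serialize_sketch(obj):
#       # marshal the dict, then compress the resulting byte string
#       return compress(marshal.dumps(obj))
#
#   def deserialize_sketch(blob):
#       return marshal.loads(decompress(blob))
#
#   scores = {1: 100, 2: 87, 3: 15}  # hypothetical {recid: score} map
#   assert deserialize_sketch(serialize_sketch(scores)) == scores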
__revision__ = "$Id$" import sys import time import urllib import math import re import ConfigParser from invenio.config import \ CFG_SITE_LANG, \ CFG_ETCDIR from invenio.search_engine import perform_request_search, strip_accents, wash_index_term from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, deserialize_via_marshal from invenio.bibindex_engine_stemmer import is_stemmer_available_for_language, stem from invenio.bibindex_engine_stopwords import is_stopword from invenio.bibindex_engine import beautify_range_list, \ kill_sleepy_mysql_threads, create_range_list from invenio.bibtask import write_message, task_get_option, task_update_progress, \ task_update_status, task_sleep_now_if_required from invenio.intbitset import intbitset from invenio.errorlib import register_exception options = {} # global variable to hold task options ## safety parameters concerning DB thread-multiplication problem: CFG_CHECK_MYSQL_THREADS = 0 # to check or not to check the problem? CFG_MAX_MYSQL_THREADS = 50 # how many threads (connections) we consider as still safe CFG_MYSQL_THREAD_TIMEOUT = 20 # we'll kill threads that were sleeping for more than X seconds ## override urllib's default password-asking behaviour: class MyFancyURLopener(urllib.FancyURLopener): def prompt_user_passwd(self, host, realm): # supply some dummy credentials by default return ("mysuperuser", "mysuperpass") def http_error_401(self, url, fp, errcode, errmsg, headers): # do not bother with protected pages raise IOError, (999, 'unauthorized access') return None #urllib._urlopener = MyFancyURLopener() nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by base_process_size = 4500 # process base size ## Dictionary merging functions def dict_union(list1, list2): "Returns union of the two dictionaries." union_dict = {} for (e, count) in list1.iteritems(): union_dict[e] = count for (e, count) in list2.iteritems(): if not union_dict.has_key(e): union_dict[e] = count else: union_dict[e] = (union_dict[e][0] + count[0], count[1]) #for (e, count) in list2.iteritems(): # list1[e] = (list1.get(e, (0, 0))[0] + count[0], count[1]) #return list1 return union_dict # tagToFunctions mapping. It offers an indirection level necesary for # indexing fulltext. The default is get_words_from_phrase tagToWordsFunctions = {} def get_words_from_phrase(phrase, weight, lang="", chars_punctuation=r"[\.\,\:\;\?\!\"]", chars_alphanumericseparators=r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]", split=str.split): "Returns list of words from phrase 'phrase'." words = {} phrase = strip_accents(phrase) phrase = phrase.lower() #Getting rid of strange characters phrase = re.sub("é", 'e', phrase) phrase = re.sub("è", 'e', phrase) phrase = re.sub("à", 'a', phrase) phrase = re.sub(" ", ' ', phrase) phrase = re.sub("«", ' ', phrase) phrase = re.sub("»", ' ', phrase) phrase = re.sub("ê", ' ', phrase) phrase = re.sub("&", ' ', phrase) if phrase.find(" -1: #Most likely html, remove html code phrase = re.sub("(?s)<[^>]*>|&#?\w+;", ' ', phrase) #removes http links phrase = re.sub("(?s)http://[^( )]*", '', phrase) phrase = re.sub(chars_punctuation, ' ', phrase) #By doing this like below, characters standing alone, like c a b is not added to the inedx, but when they are together with characters like c++ or c$ they are added. 
for word in split(phrase): if options["remove_stopword"] == "True" and not is_stopword(word, 1) and check_term(word, 0): if lang and lang !="none" and options["use_stemming"]: word = stem(word, lang) if not words.has_key(word): words[word] = (0, 0) else: if not words.has_key(word): words[word] = (0, 0) words[word] = (words[word][0] + weight, 0) elif options["remove_stopword"] == "True" and not is_stopword(word, 1): phrase = re.sub(chars_alphanumericseparators, ' ', word) for word_ in split(phrase): if lang and lang !="none" and options["use_stemming"]: word_ = stem(word_, lang) if word_: if not words.has_key(word_): words[word_] = (0,0) words[word_] = (words[word_][0] + weight, 0) return words class WordTable: "A class to hold the words table." def __init__(self, tablename, fields_to_index, separators="[^\s]"): "Creates words table instance." self.tablename = tablename self.recIDs_in_mem = [] self.fields_to_index = fields_to_index self.separators = separators self.value = {} def get_field(self, recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" out = [] bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag); res = run_sql(query) for row in res: out.append(row[0]) return out def clean(self): "Cleans the words table." self.value={} def put_into_db(self, mode="normal"): """Updates the current words table in the corresponding DB rnkWORD table. Mode 'normal' means normal execution, mode 'emergency' means words index reverting to old state. """ write_message("%s %s wordtable flush started" % (self.tablename,mode)) write_message('...updating %d words into %sR started' % \ (len(self.value), self.tablename[:-1])) task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value))) self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) nb_words_total = len(self.value) nb_words_report = int(nb_words_total/10) nb_words_done = 0 for word in self.value.keys(): self.put_word_into_db(word, self.value[word]) nb_words_done += 1 if nb_words_report!=0 and ((nb_words_done % nb_words_report) == 0): write_message('......processed %d/%d words' % (nb_words_done, nb_words_total)) task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total)) write_message('...updating %d words into %s ended' % \ (nb_words_total, self.tablename), verbose=9) #if options["verbose"]: # write_message('...updating reverse table %sR started' % self.tablename[:-1]) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) write_message('End of updating wordTable into %s' % self.tablename, verbose=9) elif mode == "emergency": write_message("emergency") for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND 
type='TEMPORARY'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ (self.tablename[:-1], group[0], group[1]) write_message(query, verbose=9) run_sql(query) write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9) #if options["verbose"]: # write_message('...updating reverse table %sR ended' % self.tablename[:-1]) self.clean() self.recIDs_in_mem = [] write_message("%s %s wordtable flush ended" % (self.tablename, mode)) task_update_progress("%s flush ended" % (self.tablename)) def load_old_recIDs(self,word): """Load existing hitlist for the word from the database index files.""" query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename res = run_sql(query, (word,)) if res: return deserialize_via_marshal(res[0][0]) else: return None def merge_with_old_recIDs(self,word,recIDs, set): """Merge the system numbers stored in memory (hash of recIDs with value[0] > 0 or -1 according to whether to add/delete them) with those stored in the database index and received in set universe of recIDs for the given word. Return 0 in case no change was done to SET, return 1 in case SET was changed. """ set_changed_p = 0 for recID,sign in recIDs.iteritems(): if sign[0] == -1 and set.has_key(recID): # delete recID if existent in set and if marked as to be deleted del set[recID] set_changed_p = 1 elif sign[0] > -1 and not set.has_key(recID): # add recID if not existent in set and if marked as to be added set[recID] = sign set_changed_p = 1 elif sign[0] > -1 and sign[0] != set[recID][0]: set[recID] = sign set_changed_p = 1 return set_changed_p def put_word_into_db(self, word, recIDs, split=str.split): """Flush a single word to the database and delete it from memory""" set = self.load_old_recIDs(word) #write_message("%s %s" % (word, self.value[word])) if set is not None: # merge the word recIDs found in memory: options["modified_words"][word] = 1 if not self.merge_with_old_recIDs(word, recIDs, set): # nothing to update: write_message("......... unchanged hitlist for ``%s''" % word, verbose=9) pass else: # yes there were some new words: write_message("......... updating hitlist for ``%s''" % word, verbose=9) run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename, (serialize_via_marshal(set), word)) else: # the word is new, will create new set: write_message("......... inserting hitlist for ``%s''" % word, verbose=9) set = self.value[word] if len(set) > 0: #new word, add to list options["modified_words"][word] = 1 try: run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename, (word, serialize_via_marshal(set))) except Exception, e: ## FIXME: This is for debugging encoding errors register_exception(prefix="Error when putting the term '%s' into db (hitlist=%s): %s\n" % (repr(word), set, e), alert_admin=True) if not set: # never store empty words run_sql("DELETE from %s WHERE term=%%s" % self.tablename, (word,)) del self.value[word] def display(self): "Displays the word table." keys = self.value.keys() keys.sort() for k in keys: write_message("%s: %s" % (k, self.value[k])) def count(self): "Returns the number of words in the table." return len(self.value) def info(self): "Prints some information on the words table." write_message("The words table contains %d words." % self.count()) def lookup_words(self, word=""): "Lookup word from the words table." 
if not word: done = 0 while not done: try: word = raw_input("Enter word: ") done = 1 except (EOFError, KeyboardInterrupt): return if self.value.has_key(word): write_message("The word '%s' is found %d times." \ % (word, len(self.value[word]))) else: write_message("The word '%s' does not exist in the word file."\ % word) def update_last_updated(self, rank_method_code, starting_time=None): """Update last_updated column of the index table in the database. Puts starting time there so that if the task was interrupted for record download, the records will be reindexed next time.""" if starting_time is None: return None write_message("updating last_updated to %s..." % starting_time, verbose=9) return run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s", (starting_time, rank_method_code,)) def add_recIDs(self, recIDs): """Fetches records which id in the recIDs arange list and adds them to the wordTable. The recIDs arange list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ global chunksize flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+task_get_option("flush")-flush_count-1,arange[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.chk_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") sys.exit(1) write_message("%s adding records #%d-#%d started" % \ (self.tablename, i_low, i_high)) if CFG_CHECK_MYSQL_THREADS: kill_sleepy_mysql_threads() task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high)) self.del_recID_range(i_low, i_high) just_processed = self.add_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + just_processed write_message("%s adding records #%d-#%d ended " % \ (self.tablename, i_low, i_high)) if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= task_get_option("flush"): self.put_into_db() self.clean() write_message("%s backing up" % (self.tablename)) flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db() self.log_progress(time_started,records_done,records_to_go) def add_recIDs_by_date(self, dates=""): """Add recIDs modified between DATES[0] and DATES[1]. If DATES is not set, then add records modified since the last run of the ranking method. """ if not dates: write_message("Using the last update time for the rank method") query = """SELECT last_updated FROM rnkMETHOD WHERE name='%s' """ % options["current_run"] res = run_sql(query) if not res: return if not res[0][0]: dates = ("0000-00-00",'') else: dates = (res[0][0],'') query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= '%s'""" % dates[0] if dates[1]: query += "and b.modification_date <= '%s'" % dates[1] query += " ORDER BY b.id ASC""" res = run_sql(query) alist = create_range_list([row[0] for row in res]) if not alist: write_message( "No new records added. 
%s is up to date" % self.tablename) else: self.add_recIDs(alist) return alist def add_recID_range(self, recID1, recID2): """Add records from RECID1 to RECID2.""" wlist = {} normalize = {} self.recIDs_in_mem.append([recID1,recID2]) # secondly fetch all needed tags: for (tag, weight, lang) in self.fields_to_index: if tag in tagToWordsFunctions.keys(): get_words_function = tagToWordsFunctions[tag] else: get_words_function = get_words_from_phrase bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec BETWEEN %d AND %d AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag) res = run_sql(query) nb_total_to_read = len(res) verbose_idx = 0 # for verbose pretty printing for row in res: recID, phrase = row if recID in options["validset"]: if not wlist.has_key(recID): wlist[recID] = {} new_words = get_words_function(phrase, weight, lang) # ,self.separators wlist[recID] = dict_union(new_words,wlist[recID]) # were there some words for these recIDs found? if len(wlist) == 0: return 0 recIDs = wlist.keys() for recID in recIDs: # was this record marked as deleted? if "DELETED" in self.get_field(recID, "980__c"): wlist[recID] = {} write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9) write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9) # put words into reverse index table with FUTURE status: for recID in recIDs: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % self.tablename[:-1], (recID, serialize_via_marshal(wlist[recID]))) # ... and, for new records, enter the CURRENT status as empty: try: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % self.tablename[:-1], (recID, serialize_via_marshal([]))) except DatabaseError: # okay, it's an already existing record, no problem pass # put words into memory word list: put = self.put for recID in recIDs: for (w, count) in wlist[recID].iteritems(): put(recID, w, count) return len(recIDs) def log_progress(self, start, done, todo): """Calculate progress and store it. start: start time, done: records processed, todo: total number of records""" time_elapsed = time.time() - start # consistency check if time_elapsed == 0 or done > todo: return time_recs_per_min = done/(time_elapsed/60.0) write_message("%d records took %.1f seconds to complete.(%1.f recs/min)"\ % (done, time_elapsed, time_recs_per_min)) if time_recs_per_min: write_message("Estimated runtime: %.1f minutes" % \ ((todo-done)/time_recs_per_min)) def put(self, recID, word, sign): "Adds/deletes a word to the word list." try: word = wash_index_term(word) if self.value.has_key(word): # the word 'word' exist already: update sign self.value[word][recID] = sign # PROBLEM ? else: self.value[word] = {recID: sign} except: write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID)) def del_recIDs(self, recIDs): """Fetches records which id in the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. 
""" count = 0 for range in recIDs: self.del_recID_range(range[0],range[1]) count = count + range[1] - range[0] self.put_into_db() def del_recID_range(self, low, high): """Deletes records with 'recID' system number between low and high from memory words index table.""" write_message("%s fetching existing words for records #%d-#%d started" % \ (self.tablename, low, high), verbose=3) self.recIDs_in_mem.append([low,high]) query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) recID_rows = run_sql(query) for recID_row in recID_rows: recID = recID_row[0] wlist = deserialize_via_marshal(recID_row[1]) for word in wlist: self.put(recID, word, (-1, 0)) write_message("%s fetching existing words for records #%d-#%d ended" % \ (self.tablename, low, high), verbose=3) def report_on_table_consistency(self): """Check reverse words index tables (e.g. rnkWORD01R) for interesting states such as 'TEMPORARY' state. Prints small report (no of words, no of bad words). """ # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d index records" % \ (self.tablename, nb_bad_records, nb_records)) else: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_records = res[0][0] else: nb_records = 0 if nb_bad_records == 0: return query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \ % (self.tablename[:-1]) res = run_sql(query) recIDs = create_range_list([row[0] for row in res]) flush_count = 0 records_done = 0 records_to_go = 0 for range in recIDs: records_to_go = records_to_go + range[1] - range[0] + 1 time_started = time.time() # will measure profile time for range in recIDs: i_low = range[0] chunksize_count = 0 while i_low <= range[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+task_get_option("flush")-flush_count-1,range[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.fix_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() task_update_status("ERROR") sys.exit(1) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= task_get_option("flush"): 
self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started,records_done,records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) res = run_sql(query, None, 1) if res[0][0]==0: write_message("%s for %d-%d is in consistent state"%(self.tablename,low,high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) write_message("""EMERGENCY: Errors found. You should check consistency of the %s - %sR tables.\nRunning 'bibrank --repair' is recommended.""" \ % (self.tablename, self.tablename[:-1])) raise StandardError def fix_recID_range(self, low, high): """Try to fix reverse index database consistency (e.g. table rnkWORD01R) in the low,high doc-id range. Possible states for a recID follow: CUR TMP FUT: very bad things have happened: warn! CUR TMP : very bad things have happened: warn! CUR FUT: delete FUT (crash before flushing) CUR : database is ok TMP FUT: add TMP to memory and del FUT from memory flush (revert to old state) TMP : very bad things have happened: warn! FUT: very bad things have happended: warn! """ state = {} query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\ % (self.tablename[:-1], low, high) res = run_sql(query) for row in res: if not state.has_key(row[0]): state[row[0]]=[] state[row[0]].append(row[1]) ok = 1 # will hold info on whether we will be able to repair for recID in state.keys(): if not 'TEMPORARY' in state[recID]: if 'FUTURE' in state[recID]: if 'CURRENT' not in state[recID]: write_message("EMERGENCY: Index record %d is in inconsistent state. Can't repair it" % recID) ok = 0 else: write_message("EMERGENCY: Inconsistency in index record %d detected" % recID) query = """DELETE FROM %sR WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID) run_sql(query) write_message("EMERGENCY: Inconsistency in index record %d repaired." % recID) else: if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]: self.recIDs_in_mem.append([recID,recID]) # Get the words file query = """SELECT type,termlist FROM %sR WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID) write_message(query, verbose=9) res = run_sql(query) for row in res: wlist = deserialize_via_marshal(row[1]) write_message("Words are %s " % wlist, verbose=9) if row[0] == 'TEMPORARY': sign = 1 else: sign = -1 for word in wlist: self.put(recID, word, wlist[word]) else: write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID)) ok = 0 if not ok: write_message("""EMERGENCY: Unrepairable errors found. You should check consistency of the %s - %sR tables. Deleting affected TEMPORARY and FUTURE entries from these tables is recommended; see the BibIndex Admin Guide. (The repairing procedure is similar for bibrank word indexes.)""" % (self.tablename, self.tablename[:-1])) raise StandardError def word_index(run): """Run the indexing task. The row argument is the BibSched task queue row, containing if, arguments, etc. Return 1 in case of success and 0 in case of failure. 
""" - - ## import optional modules: - try: - import psyco - psyco.bind(get_words_from_phrase) - psyco.bind(WordTable.merge_with_old_recIDs) - psyco.bind(update_rnkWORD) - psyco.bind(check_rnkWORD) - except StandardError,e: - print "Warning: Psyco", e - pass - global languages max_recid = 0 res = run_sql("SELECT max(id) FROM bibrec") if res and res[0][0]: max_recid = int(res[0][0]) options["run"] = [] options["run"].append(run) for rank_method_code in options["run"]: task_sleep_now_if_required(can_stop_too=True) method_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) write_message("Running rank method: %s" % getName(rank_method_code)) try: file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() config.readfp(open(file)) except StandardError, e: write_message("Cannot find configurationfile: %s" % file, sys.stderr) raise StandardError options["current_run"] = rank_method_code options["modified_words"] = {} options["table"] = config.get(config.get("rank_method", "function"), "table") options["use_stemming"] = config.get(config.get("rank_method","function"),"stemming") options["remove_stopword"] = config.get(config.get("rank_method","function"),"stopword") tags = get_tags(config) #get the tags to include options["validset"] = get_valid_range(rank_method_code) #get the records from the collections the method is enabled for function = config.get("rank_method","function") wordTable = WordTable(options["table"], tags) wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: write_message("Missing IDs of records to delete from index %s.", wordTable.tablename, sys.stderr) raise StandardError elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("last_updated"): wordTable.add_recIDs_by_date("") # only update last_updated if run via automatic mode: wordTable.update_last_updated(rank_method_code, method_starting_time) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("modified"): wordTable.add_recIDs_by_date(task_get_option("modified")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs([[0,max_recid]]) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair() check_rnkWORD(options["table"]) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "check": check_rnkWORD(options["table"]) options["modified_words"] = {} task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "stat": rank_method_code_statistics(options["table"]) task_sleep_now_if_required(can_stop_too=True) else: write_message("Invalid command found processing %s" % \ wordTable.tablename, sys.stderr) raise StandardError 
update_rnkWORD(options["table"], options["modified_words"]) task_sleep_now_if_required(can_stop_too=True) except StandardError, e: register_exception(alert_admin=True) write_message("Exception caught: %s" % e, sys.stderr) sys.exit(1) wordTable.report_on_table_consistency() # We are done. State it in the database, close and quit return 1 def get_tags(config): """Get the tags that should be used creating the index and each tag's parameter""" tags = [] function = config.get("rank_method","function") i = 1 shown_error = 0 #try: if 1: while config.has_option(function,"tag%s"% i): tag = config.get(function, "tag%s" % i) tag = tag.split(",") tag[1] = int(tag[1].strip()) tag[2] = tag[2].strip() #check if stemmer for language is available if config.get(function, "stemming") and stem("information", "en") != "inform": if shown_error == 0: write_message("Warning: Stemming not working. Please check it out!") shown_error = 1 elif tag[2] and tag[2] != "none" and config.get(function,"stemming") and not is_stemmer_available_for_language(tag[2]): write_message("Warning: Stemming not available for language '%s'." % tag[2]) tags.append(tag) i += 1 #except Exception: # write_message("Could not read data from configuration file, please check for errors") # raise StandardError return tags def get_valid_range(rank_method_code): """Returns which records are valid for this rank method, according to which collections it is enabled for.""" #if options["verbose"] >=9: # write_message("Getting records from collections enabled for rank method.") #res = run_sql("SELECT collection.name FROM collection,collection_rnkMETHOD,rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name='%s'" % rank_method_code) #l_of_colls = [] #for coll in res: # l_of_colls.append(coll[0]) #if len(l_of_colls) > 0: # recIDs = perform_request_search(c=l_of_colls) #else: # recIDs = [] valid = intbitset(trailing_bits=1) valid.discard(0) #valid.addlist(recIDs) return valid def check_term(term, termlength): """Check if term contains not allowed characters, or for any other reasons for not using this term.""" try: if len(term) <= termlength: return False reg = re.compile(r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]") if re.search(reg, term): return False term = str.replace(term, "-", "") term = str.replace(term, ".", "") term = str.replace(term, ",", "") if int(term): return False except StandardError, e: pass return True def check_rnkWORD(table): """Checks for any problems in rnkWORD tables.""" i = 0 errors = {} termslist = run_sql("SELECT term FROM %s" % table) N = run_sql("select max(id_bibrec) from %sR" % table[:-1])[0][0] write_message("Checking integrity of rank values in %s" % table) terms = map(lambda x: x[0], termslist) while i < len(terms): query_params = () for j in range(i, ((i+5000)< len(terms) and (i+5000) or len(terms))): query_params += (terms[j],) terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term IN (%s)" % (table, (len(query_params)*"%s,")[:-1]), query_params) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if (term_docs.has_key("Gi") and term_docs["Gi"][1] == 0) or not term_docs.has_key("Gi"): write_message("ERROR: Missing value for term: %s (%s) in %s: %s" % (t, repr(t), table, len(term_docs))) errors[t] = 1 i += 5000 write_message("Checking integrity of rank values in %sR" % table[:-1]) i = 0 while i < N: docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec>=%s and id_bibrec<=%s" % (table[:-1], i, 
i+5000)) for (j, termlist) in docs_terms: termlist = deserialize_via_marshal(termlist) for (t, tf) in termlist.iteritems(): if tf[1] == 0 and not errors.has_key(t): errors[t] = 1 write_message("ERROR: Gi missing for record %s and term: %s (%s) in %s" % (j,t,repr(t), table)) terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term=%%s" % table, (t,)) termlist = deserialize_via_marshal(terms_docs[0][1]) i += 5000 if len(errors) == 0: write_message("No direct errors found, but nonconsistent data may exist.") else: write_message("%s errors found during integrity check, repair and rebalancing recommended." % len(errors)) options["modified_words"] = errors def rank_method_code_statistics(table): """Shows some statistics about this rank method.""" maxID = run_sql("select max(id) from %s" % table) maxID = maxID[0][0] terms = {} Gi = {} write_message("Showing statistics of terms in index:") write_message("Important: For the 'Least used terms', the number of terms is shown first, and the number of occurences second.") write_message("Least used terms---Most important terms---Least important terms") i = 0 while i < maxID: terms_docs=run_sql("SELECT term, hitlist FROM %s WHERE id>= %s and id < %s" % (table, i, i + 10000)) for (t, hitlist) in terms_docs: term_docs=deserialize_via_marshal(hitlist) terms[len(term_docs)] = terms.get(len(term_docs), 0) + 1 if term_docs.has_key("Gi"): Gi[t] = term_docs["Gi"] i=i + 10000 terms=terms.items() terms.sort(lambda x, y: cmp(y[1], x[1])) Gi=Gi.items() Gi.sort(lambda x, y: cmp(y[1], x[1])) for i in range(0, 20): write_message("%s/%s---%s---%s" % (terms[i][0],terms[i][1], Gi[i][0],Gi[len(Gi) - i - 1][0])) def update_rnkWORD(table, terms): """Updates rnkWORDF and rnkWORDR with Gi and Nj values. For each term in rnkWORDF, a Gi value for the term is added. And for each term in each document, the Nj value for that document is added. In rnkWORDR, the Gi value for each term in each document is added. For description on how things are computed, look in the hacking docs. 
table - name of forward index to update terms - modified terms""" stime = time.time() Gi = {} Nj = {} N = run_sql("select count(id_bibrec) from %sR" % table[:-1])[0][0] if len(terms) == 0 and task_get_option("quick") == "yes": write_message("No terms to process, ending...") return "" elif task_get_option("quick") == "yes": #not used -R option, fast calculation (not accurate) write_message("Beginning post-processing of %s terms" % len(terms)) #Locating all documents related to the modified/new/deleted terms, if fast update, #only take into account new/modified occurences write_message("Phase 1: Finding records containing modified terms") terms = terms.keys() i = 0 while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): if (task_get_option("quick") == "yes" and tf[1] == 0) or task_get_option("quick") == "no": Nj[j] = 0 write_message("Phase 1: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 1: Finished finding records containing modified terms") #Find all terms in the records found in last phase write_message("Phase 2: Finding all terms in affected records") records = Nj.keys() i = 0 while i < len(records): docs_terms = get_from_reverse_index(records, i, (i + 5000), table) for (j, termlist) in docs_terms: doc_terms = deserialize_via_marshal(termlist) for (t, tf) in doc_terms.iteritems(): Gi[t] = 0 write_message("Phase 2: ......processed %s/%s records " % ((i+5000>len(records) and len(records) or (i+5000)), len(records))) i += 5000 write_message("Phase 2: Finished finding all terms in affected records") else: #recalculate max_id = run_sql("SELECT MAX(id) FROM %s" % table) max_id = max_id[0][0] write_message("Beginning recalculation of %s terms" % max_id) terms = [] i = 0 while i < max_id: terms_docs = get_from_forward_index_with_id(i, (i+5000), table) for (t, hitlist) in terms_docs: Gi[t] = 0 term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): Nj[j] = 0 write_message("Phase 1: ......processed %s/%s terms" % ((i+5000)>max_id and max_id or (i+5000), max_id)) i += 5000 write_message("Phase 1: Finished finding which records contains which terms") write_message("Phase 2: Jumping over..already done in phase 1 because of -R option") terms = Gi.keys() Gi = {} i = 0 if task_get_option("quick") == "no": #Calculating Fi and Gi value for each term write_message("Phase 3: Calculating importance of all affected terms") while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] Fi = 0 Gi[t] = 1 for (j, tf) in term_docs.iteritems(): Fi += tf[0] for (j, tf) in term_docs.iteritems(): if tf[0] != Fi: Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N) write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 3: Finished calculating importance of all affected terms") else: #Using existing Gi value instead of calculating a new one. Missing some accurancy. 
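# Gi is an entropy-style global weight over the term's frequency
# distribution: with Fi the term's total number of occurrences and
# p_j = tf_j / Fi its share in record j, every record contributes
# p_j * log(p_j) (a negative quantity), scaled by the collection size N.
# A term spread evenly over many records is thus pushed towards 0, while a
# term confined to a single record keeps Gi = 1 (its tf equals Fi, so the
# loop adds nothing). This is why Gi can stand in for an idf value at
# search time, as noted in the ranking code above.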
write_message("Phase 3: Getting approximate importance of all affected terms") while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): Gi[t] = term_docs["Gi"][1] elif len(term_docs) == 1: Gi[t] = 1 else: Fi = 0 Gi[t] = 1 for (j, tf) in term_docs.iteritems(): Fi += tf[0] for (j, tf) in term_docs.iteritems(): if tf[0] != Fi: Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N) write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 3: Finished getting approximate importance of all affected terms") write_message("Phase 4: Calculating normalization value for all affected records and updating %sR" % table[:-1]) records = Nj.keys() i = 0 while i < len(records): #Calculating the normalization value for each document, and adding the Gi value to each term in each document. docs_terms = get_from_reverse_index(records, i, (i + 5000), table) for (j, termlist) in docs_terms: doc_terms = deserialize_via_marshal(termlist) try: for (t, tf) in doc_terms.iteritems(): if Gi.has_key(t): Nj[j] = Nj.get(j, 0) + math.pow(Gi[t] * (1 + math.log(tf[0])), 2) Git = int(math.floor(Gi[t]*100)) if Git >= 0: Git += 1 doc_terms[t] = (tf[0], Git) else: Nj[j] = Nj.get(j, 0) + math.pow(tf[1] * (1 + math.log(tf[0])), 2) Nj[j] = 1.0 / math.sqrt(Nj[j]) Nj[j] = int(Nj[j] * 100) if Nj[j] >= 0: Nj[j] += 1 run_sql("UPDATE %sR SET termlist=%%s WHERE id_bibrec=%%s" % table[:-1], (serialize_via_marshal(doc_terms), j)) except (ZeroDivisionError, OverflowError), e: ## This is to try to isolate division by zero errors. register_exception(prefix="Error when analysing the record %s (%s): %s\n" % (j, repr(docs_terms), e), alert_admin=True) write_message("Phase 4: ......processed %s/%s records" % ((i+5000>len(records) and len(records) or (i+5000)), len(records))) i += 5000 write_message("Phase 4: Finished calculating normalization value for all affected records and updating %sR" % table[:-1]) write_message("Phase 5: Updating %s with new normalization values" % table) i = 0 terms = Gi.keys() while i < len(terms): #Adding the Gi value to each term, and adding the normalization value to each term in each document. 
        terms_docs = get_from_forward_index(terms, i, (i+5000), table)
        for (t, hitlist) in terms_docs:
            try:
                term_docs = deserialize_via_marshal(hitlist)
                if term_docs.has_key("Gi"):
                    del term_docs["Gi"]
                for (j, tf) in term_docs.iteritems():
                    if Nj.has_key(j):
                        term_docs[j] = (tf[0], Nj[j])
                Git = int(math.floor(Gi[t]*100))
                if Git >= 0:
                    Git += 1
                term_docs["Gi"] = (0, Git)
                run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % table,
                        (serialize_via_marshal(term_docs), t))
            except (ZeroDivisionError, OverflowError), e:
                register_exception(prefix="Error when analysing the term %s (%s): %s\n" % (t, repr(terms_docs), e), alert_admin=True)
        write_message("Phase 5: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms)))
        i += 5000
    write_message("Phase 5: Finished updating %s with new normalization values" % table)
    write_message("Time used for post-processing: %.1fmin" % ((time.time() - stime) / 60))
    write_message("Finished post-processing")

def get_from_forward_index(terms, start, stop, table):
    terms_docs = ()
    for j in range(start, (stop < len(terms) and stop or len(terms))):
        terms_docs += run_sql("SELECT term, hitlist FROM %s WHERE term=%%s" % table,
                              (terms[j],))
    return terms_docs

def get_from_forward_index_with_id(start, stop, table):
    terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE id BETWEEN %s AND %s" % (table, start, stop))
    return terms_docs

def get_from_reverse_index(records, start, stop, table):
    current_recs = "%s" % records[start:stop]
    current_recs = current_recs[1:-1]
    docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec IN (%s)" % (table[:-1], current_recs))
    return docs_terms

#def test_word_separators(phrase="hep-th/0101001"):
    #"""Tests word separating policy on various input."""
    #print "%s:" % phrase
    #gwfp = get_words_from_phrase(phrase)
    #for (word, count) in gwfp.iteritems():
        #print "\t-> %s - %s" % (word, count)

def getName(methname, ln=CFG_SITE_LANG, type='ln'):
    """Returns the name of the rank method, either in default language or
    given language.
    methname = short name of the method
    ln - the language to get the name in
    type - which name "type" to get."""
    try:
        rnkid = run_sql("SELECT id FROM rnkMETHOD where name='%s'" % methname)
        if rnkid:
            rnkid = str(rnkid[0][0])
            res = run_sql("SELECT value FROM rnkMETHODNAME where type='%s' and ln='%s' and id_rnkMETHOD=%s" % (type, ln, rnkid))
            if not res:
                res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln='%s' and id_rnkMETHOD=%s and type='%s'" % (CFG_SITE_LANG, rnkid, type))
            if not res:
                return methname
            return res[0][0]
        else:
            raise Exception
    except Exception, e:
        write_message("Cannot run rank method: either the given code for the method is wrong, or it has not been added using the web interface.")
        raise Exception

def word_similarity(run):
    """Call correct method"""
    return word_index(run)
diff --git a/modules/miscutil/lib/dbquery.py b/modules/miscutil/lib/dbquery.py
index 24581ffb7..eeed53256 100644
--- a/modules/miscutil/lib/dbquery.py
+++ b/modules/miscutil/lib/dbquery.py
@@ -1,359 +1,352 @@
## This file is part of Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""
Invenio utilities to run SQL queries.

The main API functions are:
    - run_sql()
    - run_sql_many()
but see the others as well.
"""

__revision__ = "$Id$"

# dbquery clients can import these from here:
# pylint: disable=W0611
from MySQLdb import Warning, Error, InterfaceError, DataError, \
                    DatabaseError, OperationalError, IntegrityError, \
                    InternalError, NotSupportedError, \
                    ProgrammingError
import string
import time
import marshal
import re
from zlib import compress, decompress
from thread import get_ident
from invenio.config import CFG_ACCESS_CONTROL_LEVEL_SITE, \
     CFG_MISCUTIL_SQL_USE_SQLALCHEMY, \
     CFG_MISCUTIL_SQL_RUN_SQL_MANY_LIMIT

if CFG_MISCUTIL_SQL_USE_SQLALCHEMY:
    try:
        import sqlalchemy.pool as pool
        import MySQLdb as mysqldb
        mysqldb = pool.manage(mysqldb, use_threadlocal=True)
        connect = mysqldb.connect
    except ImportError:
        CFG_MISCUTIL_SQL_USE_SQLALCHEMY = False
        from MySQLdb import connect
else:
    from MySQLdb import connect

## DB config variables.  These variables are to be set in
## invenio-local.conf by admins and then replaced in situ in this file
## by calling "inveniocfg --update-dbexec".
## Note that they are defined here and not in config.py in order to
## prevent them from being exported accidentally elsewhere, as no-one
## should know DB credentials but this file.
## FIXME: this is more of a blast-from-the-past that should be fixed
## both here and in inveniocfg when the time permits.
CFG_DATABASE_HOST = 'localhost'
CFG_DATABASE_PORT = '3306'
CFG_DATABASE_NAME = 'invenio'
CFG_DATABASE_USER = 'invenio'
CFG_DATABASE_PASS = 'my123p$ss'

_DB_CONN = {}

def _db_login(relogin = 0):
    """Login to the database."""

    ## Note: we are using "use_unicode=False", because we want to
    ## receive strings from MySQL as Python UTF-8 binary string
    ## objects, not as Python Unicode string objects, as of yet.

    ## Note: "charset='utf8'" is needed for recent MySQLdb versions
    ## (such as 1.2.1_p2 and above).  For older MySQLdb versions such
    ## as 1.2.0, an explicit "init_command='SET NAMES utf8'" parameter
    ## would constitute an equivalent.  But we are not bothering with
    ## older MySQLdb versions here, since we are recommending to
    ## upgrade to more recent versions anyway.
    if CFG_MISCUTIL_SQL_USE_SQLALCHEMY:
        return connect(host=CFG_DATABASE_HOST, port=int(CFG_DATABASE_PORT),
                       db=CFG_DATABASE_NAME, user=CFG_DATABASE_USER,
                       passwd=CFG_DATABASE_PASS, use_unicode=False,
                       charset='utf8')
    else:
        thread_ident = get_ident()
    if relogin:
        _DB_CONN[thread_ident] = connect(host=CFG_DATABASE_HOST,
                                         port=int(CFG_DATABASE_PORT),
                                         db=CFG_DATABASE_NAME,
                                         user=CFG_DATABASE_USER,
                                         passwd=CFG_DATABASE_PASS,
                                         use_unicode=False, charset='utf8')
        return _DB_CONN[thread_ident]
    else:
        if _DB_CONN.has_key(thread_ident):
            return _DB_CONN[thread_ident]
        else:
            _DB_CONN[thread_ident] = connect(host=CFG_DATABASE_HOST,
                                             port=int(CFG_DATABASE_PORT),
                                             db=CFG_DATABASE_NAME,
                                             user=CFG_DATABASE_USER,
                                             passwd=CFG_DATABASE_PASS,
                                             use_unicode=False, charset='utf8')
            return _DB_CONN[thread_ident]

def _db_logout():
    """Close a connection."""
    try:
        del _DB_CONN[get_ident()]
    except KeyError:
        pass

def run_sql(sql, param=None, n=0, with_desc=0):
    """Run SQL on the server with PARAM and return result.

    @param param: tuple of string params to insert in the query
    (see notes below)
    @param n: number of tuples in result (0 for unbounded)
    @param with_desc: if True, will return a DB API 7-tuple describing
    columns in query.

    @return: If SELECT, SHOW, DESCRIBE statements, return tuples of data,
    followed by description if parameter with_desc is provided.
    If INSERT, return last row id.
    Otherwise return SQL result as provided by database.

    @note: When the site is closed for maintenance (as governed by the
    config variable CFG_ACCESS_CONTROL_LEVEL_SITE), do not attempt to run
    any SQL queries but return empty list immediately.  Useful to be
    able to have the website up while the MySQL database is down for
    maintenance, hot copies, table repairs, etc.

    @note: In case of problems, exceptions are returned according to
    the Python DB API 2.0.  The client code can import them from this
    file and catch them.
    """

    if CFG_ACCESS_CONTROL_LEVEL_SITE == 3:
        # do not connect to the database as the site is closed for maintenance:
        return []

    ### log_sql_query(sql, param) ### UNCOMMENT ONLY IF you REALLY want to log all queries

    if param:
        param = tuple(param)

    try:
        db = _db_login()
        cur = db.cursor()
        rc = cur.execute(sql, param)
    except OperationalError: # unexpected disconnect, bad malloc error, etc
        # FIXME: now reconnect is always forced, we may perhaps want to ping() first?
        try:
            db = _db_login(relogin=1)
            cur = db.cursor()
            rc = cur.execute(sql, param)
        except OperationalError: # again an unexpected disconnect, bad malloc error, etc
            raise

    if string.upper(string.split(sql)[0]) in ("SELECT", "SHOW", "DESC", "DESCRIBE"):
        if n:
            recset = cur.fetchmany(n)
        else:
            recset = cur.fetchall()
        if with_desc:
            return recset, cur.description
        else:
            return recset
    else:
        if string.upper(string.split(sql)[0]) == "INSERT":
            rc = cur.lastrowid
        return rc

def run_sql_many(query, params, limit=CFG_MISCUTIL_SQL_RUN_SQL_MANY_LIMIT):
    """Run SQL on the server with PARAM.
    This method does executemany and is therefore more efficient than
    execute, but it makes sense only with queries that affect the state
    of the database (INSERT, UPDATE).
    That is why the results just count the number of affected rows.

    @param params: tuple of tuple of string params to insert in the query

    @param limit: query will be executed in parts when number of
         parameters is greater than limit (each iteration runs at most
         `limit' parameters)

    @return: SQL result as provided by database
    """
    i = 0
    r = None
    while i < len(params):
        ## make partial query safely (mimicking procedure from run_sql())
        try:
            db = _db_login()
            cur = db.cursor()
            rc = cur.executemany(query, params[i:i+limit])
        except OperationalError:
            try:
                db = _db_login(relogin=1)
                cur = db.cursor()
                rc = cur.executemany(query, params[i:i+limit])
            except OperationalError:
                raise
        ## collect its result:
        if r is None:
            r = rc
        else:
            r += rc
        i += limit
    return r

def blob_to_string(ablob):
    """Return string representation of ABLOB.
    Useful to treat MySQL BLOBs in the same way for both recent and
    old MySQLdb versions.
    """
    if ablob:
        if type(ablob) is str:
            # BLOB is already a string in MySQLdb 0.9.2
            return ablob
        else:
            # BLOB is array.array in MySQLdb 1.0.0 and later
            return ablob.tostring()
    else:
        return ablob

def log_sql_query(sql, param=None):
    """Log SQL query into prefix/var/log/dbquery.log log file.
    In order to enable logging of all SQL queries, please uncomment
    one line in run_sql() above.  Useful for fine-level debugging only!
    """
    from invenio.config import CFG_LOGDIR
    from invenio.dateutils import convert_datestruct_to_datetext
    from invenio.textutils import indent_text
    log_path = CFG_LOGDIR + '/dbquery.log'
    date_of_log = convert_datestruct_to_datetext(time.localtime())
    message = date_of_log + '-->\n'
    message += indent_text('Query:\n' + indent_text(str(sql), 2, wrap=True), 2)
    message += indent_text('Params:\n' + indent_text(str(param), 2, wrap=True), 2)
    message += '-----------------------------\n\n'
    try:
        log_file = open(log_path, 'a+')
        log_file.writelines(message)
        log_file.close()
    except:
        pass

def get_table_update_time(tablename):
    """Return update time of TABLENAME.  TABLENAME can contain
    wildcard `%' in which case we return the maximum update time
    value.
    """
    # Note: in order to work with all of MySQL 4.0, 4.1, 5.0, this
    # function uses SHOW TABLE STATUS technique with a dirty column
    # position lookup to return the correct value.  (Making use of
    # Index_Length column that is either of type long (when there are
    # some indexes defined) or of type None (when there are no indexes
    # defined, e.g. table is empty).  When we shall use solely
    # MySQL-5.0, we can employ a much cleaner technique of using
    # SELECT UPDATE_TIME FROM INFORMATION_SCHEMA.TABLES WHERE
    # table_name='collection'.
    res = run_sql("SHOW TABLE STATUS LIKE %s", (tablename, ))
    update_times = [] # store all update times
    for row in res:
        if type(row[10]) is long or \
           row[10] is None:
            # MySQL-4.1 and 5.0 have creation_time in 11th position,
            # so return next column:
            update_times.append(str(row[12]))
        else:
            # MySQL-4.0 has creation_time in 10th position, which is
            # of type datetime.datetime or str (depending on the
            # version of MySQLdb), so return next column:
            update_times.append(str(row[11]))
    return max(update_times)

def get_table_status_info(tablename):
    """Return table status information on TABLENAME.  Returned is a
    dict with keys like Name, Rows, Data_length, Max_data_length,
    etc.  If TABLENAME does not exist, return empty dict.
""" # Note: again a hack so that it works on all MySQL 4.0, 4.1, 5.0 res = run_sql("SHOW TABLE STATUS LIKE %s", (tablename, )) table_status_info = {} # store all update times for row in res: if type(row[10]) is long or \ row[10] is None: # MySQL-4.1 and 5.0 have creation time in 11th position: table_status_info['Name'] = row[0] table_status_info['Rows'] = row[4] table_status_info['Data_length'] = row[6] table_status_info['Max_data_length'] = row[8] table_status_info['Create_time'] = row[11] table_status_info['Update_time'] = row[12] else: # MySQL-4.0 has creation_time in 10th position, which is # of type datetime.datetime or str (depending on the # version of MySQLdb): table_status_info['Name'] = row[0] table_status_info['Rows'] = row[3] table_status_info['Data_length'] = row[5] table_status_info['Max_data_length'] = row[7] table_status_info['Create_time'] = row[10] table_status_info['Update_time'] = row[11] return table_status_info def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return compress(marshal.dumps(obj)) def deserialize_via_marshal(astring): """Decompress and deserialize string into a Python object via marshal.""" return marshal.loads(decompress(astring)) -try: - import psyco - psyco.bind(serialize_via_marshal) - psyco.bind(deserialize_via_marshal) -except StandardError, e: - pass - def wash_table_column_name(colname): """ Evaluate table-column name to see if it is clean. This function accepts only names containing [a-zA-Z0-9_]. @param colname: The string to be checked @type colname: str @return: colname if test passed @rtype: str @raise Exception: Raises an exception if colname is invalid. """ if re.search('[^\w]', colname): raise Exception('The table column %s is not valid.' % repr(colname)) return colname def real_escape_string(unescaped_string): """ Escapes special characters in the unescaped string for use in a DB query. @param unescaped_string: The string to be escaped @type unescaped_string: str @return: Returns the escaped string @rtype: str """ connection_object = _db_login() escaped_string = connection_object.escape_string(unescaped_string) return escaped_string diff --git a/modules/webhelp/web/hacking/coding-style.webdoc b/modules/webhelp/web/hacking/coding-style.webdoc index 6361818a0..c1eebfb87 100644 --- a/modules/webhelp/web/hacking/coding-style.webdoc +++ b/modules/webhelp/web/hacking/coding-style.webdoc @@ -1,229 +1,228 @@ ## -*- mode: html; coding: utf-8; -*- ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

A brief description of things we strive for, more or less unsuccessfully.

1. Packaging

We use the classical GNU Autoconf/Automake approach; for a tutorial, see e.g. Learning the GNU development tools or the AutoBook.

2. Modules

Invenio started as a set of pretty independent modules developed by independent people with independent styles. This was made even more pronounced by the original use of many different languages (e.g. Python, PHP, Perl). Now the Invenio code base is striving to use Python everywhere, except in speed-critical parts, where a compiled language such as Common Lisp may come to the rescue in the near future.

When modifying an existing module, we propose to strictly continue using whatever coding style the module was originally written in. When writing new modules, we propose to stick to the below-mentioned standards.

Code integration across modules is happening, but slowly. Therefore, don't be surprised to find that there is a lot of room for refactoring.

3. Python

We aim at following recommendations from PEP 8, although the existing code surely does not fulfil them here and there. The code indentation is done via spaces only; please do not use tabs. One indentation level counts as four spaces. Emacs users can look into our Emacs Tips wiki page for inspiration.

All the Python code should be extensively documented via docstrings, so you can always run pydoc file.py to peruse the file's documentation in one simple go. We follow the epytext docstring markup, from which epydoc generates nice HTML source code documentation.
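
For example, a minimal docstring in the epytext markup could look like this (the function and its parameters are invented for illustration):

    def count_records_in_collection(collection_name):
        """Return the number of records belonging to COLLECTION_NAME.

        @param collection_name: the name of the collection to inspect
        @type collection_name: str
        @return: the number of records found (0 if the collection is empty)
        @rtype: int
        """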

Do not forget to run pylint on your code to check for errors like uninitialized variables and to improve its quality and conformance to the coding standard. If you develop in Emacs, run M-x pylint RET on your buffers frequently. Read and implement pylint suggestions. (Note that using lambda and friends may lead to false pylint warnings. You can switch them off by putting block comments of the form ``# pylint: disable=C0301''.)
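
For instance, a deliberately long line can be excluded from the line-length check like this (the symbol name and URL are made up for the example):

    # pylint: disable=C0301
    CFG_EXAMPLE_HELP_URL = "http://example.org/some/very/long/documentation/url/that/cannot/reasonably/be/wrapped"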

Do not forget to run pychecker on your code either. It is another source code checker that catches some situations better and some situations worse than pylint. If you develop in Emacs, run C-c C-w
-(M-x py-pychecker-run RET) on your buffers frequently. (Note that
-using psyco on classes may lead to false pychecker warnings.)
+(M-x py-pychecker-run RET) on your buffers frequently.

You can check the kwalitee of your code by running ``python modules/miscutil/lib/kwalitee.py --check-all *.py'' on your files. This will run some basic error checking, warning checking, and indentation checking, and also check compliance with PEP 8. You can also check the code kwalitee stats across all the modules by running ``make kwalitee-check'' in the main source directory.

Do not hardcode magic constants in your code. Every magic string or number should be put into the accompanying file_config.py, with the symbol name beginning with cfg_modulename_*.
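
A hypothetical sketch of this convention (the module and symbol names are invented for the example):

    ## in bibfoo_file_config.py:
    cfg_bibfoo_max_hits = 100
    cfg_bibfoo_welcome_message = "Welcome to BibFoo!"

    ## in the bibfoo code, instead of hardcoding 100:
    from bibfoo_file_config import cfg_bibfoo_max_hits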

Clearly separate interfaces from implementation. Document your interfaces. Do not expose to other modules anything that does not have to be exposed. Apply the principle of least information.

Create as few new library files as possible. Do not create many nested files in nested modules; rather, put all the lib files in one directory, using names such as bibindex_foo and bibindex_bar.

Use the imperative/functional paradigm rather than OO. If you do use OO, then stick to as simple a class hierarchy as possible. Recall that method calls and exception handling in Python are quite expensive.

Prefer the good old foo_bar naming convention for symbols (both variable and function names) to the fooBar CaMelCaSe convention. (Except for class names, where UppercaseSymbolNames are to be used.)
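
To illustrate the convention (all names are invented):

    def create_user_basket(user_id, basket_name):   # good: foo_bar style
        pass

    class BasketCreator:                            # good: UppercaseSymbolNames for classes
        pass

    def createUserBasket(userId, basketName):       # avoid: CaMelCaSe for functions
        pass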

Pay special attention to naming your symbols descriptively. Your code is going to be read and worked with by others, and its symbols should be self-explanatory without any comments and without studying other parts of the code. For example, use proper English words, not abbreviations that can be misspelled in many ways; use words that go in pairs (e.g. create/destroy, start/stop; never create/stop); use self-explanatory symbol names (e.g. list_of_file_extensions rather than list2); never misname symbols (e.g. score_list should hold the list of scores and nothing else -- if in the course of development you change the semantics of what the symbol holds, then change the symbol name too). Do not be afraid to use long descriptive names; good editors such as Emacs can tab-complete symbols for you.

When hacking module A, pay close attention to following the existing coding conventions in A, even if they are legacy-weird and even if we use a different technique elsewhere. (Unless the whole module A is going to be refactored, of course.)

Speed-critical parts should be profiled with pyprof or our built-in web profiler (&profile=t).
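
For a quick start, the standard profile/cProfile module can also be used from within the code; this is only an illustrative sketch (the profiled call is invented), and the pyprof invocation mentioned above may differ:

    import cProfile  # Python 2.5+; use the "profile" module on Python 2.4

    # Run a hypothetical speed-critical call under the profiler and
    # print the hot spots sorted by cumulative time:
    cProfile.run('rank_records_by_word_similarity(recids)', sort='cumulative')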

The code should be well tested before being committed. Testing is an integral part of the development process. Test along as you program. The testing process should be automated via our unit test and regression test suite infrastructures. Please read the test suite strategy to know more.
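
As a minimal illustration, here is what an automated unit test for the wash_table_column_name() helper from dbquery.py (shown earlier in this patch) could look like, using the standard unittest module; real tests should be hooked into the test suite infrastructure mentioned above:

    import unittest

    from invenio.dbquery import wash_table_column_name

    class WashTableColumnNameTest(unittest.TestCase):
        """Check the washing of table column names."""

        def test_accepts_clean_name(self):
            # names made only of [a-zA-Z0-9_] pass through unchanged:
            self.assertEqual(wash_table_column_name("id_bibrec"), "id_bibrec")

        def test_rejects_dirty_name(self):
            # anything else raises an exception:
            self.assertRaises(Exception,
                              wash_table_column_name, "id; DROP TABLE user")

    if __name__ == "__main__":
        unittest.main()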

Python promotes writing clear, readable, easily maintainable code. Write it as such. Recall Albert Einstein's ``Everything should be made as simple as possible, but not simpler''. Things should be neither overengineered nor oversimplified.

Recall the principles Unix is built upon, as summarized by Eric S. Raymond's TAOUP:

  • Rule of Modularity: Write simple parts connected by clean interfaces.
  • Rule of Clarity: Clarity is better than cleverness.
  • Rule of Composition: Design programs to be connected with other programs.
  • Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
  • Rule of Simplicity: Design for simplicity; add complexity only where you must.
  • Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
  • Rule of Transparency: Design for visibility to make inspection and debugging easier.
  • Rule of Robustness: Robustness is the child of transparency and simplicity.
  • Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
  • Rule of Least Surprise: In interface design, always do the least surprising thing.
  • Rule of Silence: When a program has nothing surprising to say, it should say nothing.
  • Rule of Repair: Repair what you can -- but when you must fail, fail noisily and as soon as possible.
  • Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
  • Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
  • Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
  • Rule of Diversity: Distrust all claims for one true way.
  • Rule of Extensibility: Design for the future, because it will be here sooner than you think.
or the golden rule that says it all: ``keep it simple''.

Think of security and robustness from the start. Follow secure programming guidelines.

For more hints, thoughts, and other ruminations on programming, see our CDS Invenio wiki, notably Git Workflow and Invenio QA.

4. MySQL

The table naming policy is, roughly and briefly:

  • "foo": table names in lowercase, without prefix, used by me for WebSearch
  • "foo_bar": underscores represent M:N relationship between "foo" and "bar", to tie the two tables together
  • "bib*": many tables to hold the metadata and relationships between them
  • "idx*": idx is the table name prefix used by BibIndex
  • "rnk*": rnk is the table name prefix used by BibRank
  • "fmt*": fmt is the table name prefix used by BibFormat
  • "sbm*": sbm is the table name prefix used by WebSubmit
  • "sch*": sch is the table name prefix used by BibSched
  • "acc*": acc is the table name prefix used by WebAccess
  • "bsk*": acc is the table name prefix used by WebBasket
  • "msg*": acc is the table name prefix used by WebMessage
  • "cls*": acc is the table name prefix used by BibClassify
  • "sta*": acc is the table name prefix used by WebStat
  • "jrn*": acc is the table name prefix used by WebJournal
  • "collection*": many tables to describe collections and search interface pages
  • "user*" : many tables to describe personal features (baskets, alerts)
  • "hst*": tables related to historical versions of metadata and fulltext files