diff --git a/INSTALL b/INSTALL index b1dc674e2..0ed827133 100644 --- a/INSTALL +++ b/INSTALL @@ -1,824 +1,825 @@ Invenio INSTALLATION ==================== About ===== This document specifies how to build, customize, and install Invenio v1.1.1 for the first time. See RELEASE-NOTES if you are upgrading from a previous Invenio release. Contents ======== 0. Prerequisites 1. Quick instructions for the impatient Invenio admin 2. Detailed instructions for the patient Invenio admin 0. Prerequisites ================ Here is the software you need to have around before you start installing Invenio: a) Unix-like operating system. The main development and production platforms for Invenio at CERN are GNU/Linux distributions Debian, Gentoo, Scientific Linux (aka RHEL), Ubuntu, but we also develop on Mac OS X. Basically any Unix system supporting the software listed below should do. If you are using Debian GNU/Linux ``Lenny'' or later, then you can install most of the below-mentioned prerequisites and recommendations by running: $ sudo aptitude install python-dev apache2-mpm-prefork \ mysql-server mysql-client python-mysqldb \ python-4suite-xml python-simplejson python-xml \ python-libxml2 python-libxslt1 gnuplot poppler-utils \ gs-common clisp gettext libapache2-mod-wsgi unzip \ python-dateutil python-rdflib \ python-gnuplot python-magic pdftk html2text giflib-tools \ pstotext netpbm python-pypdf python-chardet python-lxml You may also want to install some of the following packages, if you have them available on your concrete architecture: $ sudo aptitude install sbcl cmucl pylint pychecker pyflakes \ python-profiler python-epydoc libapache2-mod-xsendfile \ openoffice.org python-utidylib python-beautifulsoup Moreover, you should install some Message Transfer Agent (MTA) such as Postfix so that Invenio can email notification alerts or registration information to the end users, contact moderators and reviewers of submitted documents, inform administrators about various runtime system information, etc: $ sudo aptitude install postfix After running the above-quoted aptitude command(s), you can proceed to configuring your MySQL server instance (max_allowed_packet in my.cnf, see item 0b below) and then to installing the Invenio software package in the section 1 below. If you are using another operating system, then please continue reading the rest of this prerequisites section, and please consult our wiki pages for any concrete hints for your specific operating system. b) MySQL server (may be on a remote machine), and MySQL client (must be available locally too). MySQL versions 4.1 or 5.0 are supported. Please set the variable "max_allowed_packet" in your "my.cnf" init file to at least 4M. (For sites such as INSPIRE, having 1M records with 10M citer-citee pairs in its citation map, you may need to increase max_allowed_packet to 1G.) You may perhaps also want to run your MySQL server natively in UTF-8 mode by setting "default-character-set=utf8" in various parts of your "my.cnf" file, such as in the "[mysql]" part and elsewhere; but this is not really required. c) Apache 2 server, with support for loading DSO modules, and optionally with SSL support for HTTPS-secure user authentication, and mod_xsendfile for off-loading file downloads away from Invenio processes to Apache. 
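For item 0b above, the MySQL settings typically end up in a my.cnf fragment along these lines (a sketch only; the file location, section layout and suitable values differ between distributions and sites, so adapt as needed):

    [mysqld]
    max_allowed_packet = 4M
    default-character-set = utf8

    [mysql]
    default-character-set = utf8

You can verify the value picked up by a running server with:

    $ mysql -u root -p -e "SHOW VARIABLES LIKE 'max_allowed_packet'"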
d) Python v2.4 or above, as well as the following Python modules: - (mandatory) MySQLdb (version >= 1.2.1_p2; see below) - (recommended) python-dateutil, for complex date processing: - (recommended) PyXML, for XML processing: - (recommended) PyRXP, for very fast XML MARC processing: - (recommended) lxml, for XML/XSLT processing: - (recommended) libxml2-python, for XML/XSLT processing: - (recommended) simplejson, for AJAX apps: Note that if you are using Python-2.6, you don't need to install simplejson, because the module is already included in the main Python distribution. - (recommended) Gnuplot.Py, for producing graphs: - (recommended) Snowball Stemmer, for stemming: - (recommended) py-editdist, for record merging: - (recommended) numpy, for citerank methods: - (recommended) magic, for full-text file handling: - (optional) chardet, for character encoding detection: - (optional) 4suite, slower alternative to PyRXP and libxml2-python: - (optional) feedparser, for web journal creation: - (optional) RDFLib, to use RDF ontologies and thesauri: - (optional) mechanize, to run regression web test suite: - (optional) python-mock, mocking library for the test suite: - (optional) hashlib, needed only for Python-2.4 and only if you would like to use AWS connectivity: - (optional) utidylib, for HTML washing: - (optional) Beautiful Soup, for HTML washing: - (optional) Python Twitter (and its dependencies) if you want to use the Twitter Fetcher bibtasklet: Note: MySQLdb version 1.2.1_p2 or higher is recommended. If you are using an older version of MySQLdb, you may get into problems with character encoding. e) mod_wsgi Apache module. Versions 3.x and above are recommended. Note: if you are using Python 2.4 or earlier, then you should also install the wsgiref Python module, available from: (As of Python 2.5 this module is included in the standard Python distribution.) f) If you want to be able to extract references from PDF fulltext files, then you need to install pdftotext version 3 or later. g) If you want to be able to search for words in the fulltext files (i.e. to have fulltext indexing) or to stamp submitted files, then you also need to install some of the following tools: - for Microsoft Office/OpenOffice.org document conversion: OpenOffice.org - for PDF file stamping: pdftk, pdf2ps - for PDF files: pdftotext or pstotext - for PostScript files: pstotext or ps2ascii - for DjVu creation and manipulation: DjVuLibre - to perform OCR: OCRopus (tested only with release 0.3.1) - to perform various image manipulations: ImageMagick - - to generate PDF after OCR: netpbm, ReportLab and pyPdf + - to generate PDF after OCR: netpbm, ReportLab and pyPdf or pyPdf2 + h) If you have chosen to install fast XML MARC Python processors in step d) above, then you have to install the parsers themselves: - (optional) 4suite: i) (recommended) Gnuplot, the command-line driven interactive plotting program. It is used to display download and citation history graphs on the Detailed record pages on the web interface. Note that Gnuplot must be compiled with PNG output support, that is, with the GD library. Note also that Gnuplot is not required, only recommended. j) (recommended) A Common Lisp implementation, such as CLISP, SBCL or CMUCL. It is used for the web server log analysing tool and the metadata checking program. Note that any of the three implementations CLISP, SBCL, or CMUCL will do. CMUCL produces the fastest machine code, but it does not support UTF-8 yet. Pick CLISP if you are unsure which to choose. Note that a Common Lisp implementation is not required, only recommended.
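Before moving on, a quick way to check which MySQLdb version (item d above) the Python interpreter you intend to use will pick up is a one-liner such as the following (a simple sketch, not an Invenio tool):

    $ python -c "import MySQLdb; print MySQLdb.__version__"

If this prints a version older than 1.2.1_p2, or fails with an ImportError, install or upgrade the module before continuing.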
k) GNU gettext, a set of tools that makes it possible to translate the application into multiple languages. This is available by default on many systems. l) (recommended) xlwt 0.7.2, a library to create spreadsheet files compatible with MS Excel 97/2000/XP/2003 XLS files, on any platform, with Python 2.3 to 2.6. m) (recommended) matplotlib 1.0.0, a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shells (à la MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits. It is used to generate pie graphs in the custom summary query (WebStat). n) (optional) FFmpeg, an open-source collection of tools and libraries to convert video and audio files. It makes use of both internal and external libraries to generate videos for the web, such as Theora, WebM and H.264, out of almost any conceivable video input. FFmpeg is needed to run video-related modules and submission workflows in Invenio. The minimal configuration of ffmpeg for the Invenio demo site requires a number of external libraries. It is highly recommended to remove all installed versions and packages that come with various Linux distributions and to install the latest versions from source. Additionally, you will need the Mediainfo Library for multimedia metadata handling. Minimum libraries for the demo site: - the ffmpeg multimedia encoder tools - a library for JPEG images, needed for thumbnail extraction - a library for the Ogg container format, needed for Vorbis and Theora - the Ogg Vorbis audio codec library - the Ogg Theora video codec library - the WebM video codec library - the mediainfo library for multimedia metadata Recommended for H.264 video (be aware of licensing issues!): - a library for H.264 video encoding - a library for Advanced Audio Coding - a library for MP3 encoding Note that the configure script checks whether you have all the prerequisite software installed and it won't let you continue unless everything is in order. It also warns you if it cannot find some optional but recommended software. 1. Quick instructions for the impatient Invenio admin ========================================================= 1a. Installation ---------------- $ cd $HOME/src/ $ wget http://invenio-software.org/download/invenio-1.1.1.tar.gz $ wget http://invenio-software.org/download/invenio-1.1.1.tar.gz.md5 $ wget http://invenio-software.org/download/invenio-1.1.1.tar.gz.sig $ md5sum -c invenio-1.1.1.tar.gz.md5 $ gpg --verify invenio-1.1.1.tar.gz.sig invenio-1.1.1.tar.gz $ tar xvfz invenio-1.1.1.tar.gz $ cd invenio-1.1.1 $ ./configure $ make $ make install $ make install-mathjax-plugin ## optional $ make install-jquery-plugins ## optional $ make install-ckeditor-plugin ## optional $ make install-pdfa-helper-files ## optional $ make install-mediaelement ## optional $ make install-solrutils ## optional $ make install-js-test-driver ## optional 1b.
Configuration ----------------- $ sudo chown -R www-data.www-data /opt/invenio $ sudo -u www-data emacs /opt/invenio/etc/invenio-local.conf $ sudo -u www-data /opt/invenio/bin/inveniocfg --update-all $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-tables $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-webstat-conf $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-apache-conf $ sudo /etc/init.d/apache2 restart $ sudo -u www-data /opt/invenio/bin/inveniocfg --check-openoffice $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-demo-site $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-demo-records $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-unit-tests $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-regression-tests $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-web-tests $ sudo -u www-data /opt/invenio/bin/inveniocfg --remove-demo-records $ sudo -u www-data /opt/invenio/bin/inveniocfg --drop-demo-site $ firefox http://your.site.com/help/admin/howto-run 2. Detailed instructions for the patient Invenio admin ========================================================== 2a. Installation ---------------- Invenio uses the standard GNU autoconf method to build and install its files. This means that you proceed as follows: $ cd $HOME/src/ Change to a directory where we will build the Invenio sources. (The built files will be installed into different "target" directories later.) $ wget http://invenio-software.org/download/invenio-1.1.1.tar.gz $ wget http://invenio-software.org/download/invenio-1.1.1.tar.gz.md5 $ wget http://invenio-software.org/download/invenio-1.1.1.tar.gz.sig Fetch the Invenio source tarball from the distribution server, together with the MD5 checksum and GnuPG cryptographic signature files useful for verifying the integrity of the tarball. $ md5sum -c invenio-1.1.1.tar.gz.md5 Verify the MD5 checksum. $ gpg --verify invenio-1.1.1.tar.gz.sig invenio-1.1.1.tar.gz Verify the GnuPG cryptographic signature. Note that you may first have to import my public key into your keyring, if you haven't done that already: $ gpg --keyserver wwwkeys.eu.pgp.net --recv-keys 0xBA5A2B67 The output of the gpg --verify command should then read: Good signature from "Tibor Simko " You can safely ignore any trusted signature certification warning that may follow after the signature has been successfully verified. $ tar xvfz invenio-1.1.1.tar.gz Untar the distribution tarball. $ cd invenio-1.1.1 Go to the source directory. $ ./configure Configure Invenio software for building on this specific platform. You can use the following optional parameters: --prefix=/opt/invenio Optionally, specify the Invenio general installation directory (default is /opt/invenio). It will contain command-line binaries and program libraries containing the core Invenio functionality, but also store web pages, runtime log and cache information, document data files, etc. Several subdirs like `bin', `etc', `lib', or `var' will be created inside the prefix directory to this effect. Note that the prefix directory should be chosen outside of the Apache htdocs tree, since only one of its subdirectories (prefix/var/www) is meant to be accessible directly via the Web (see below). Note that Invenio won't install to any other directory but the prefix mentioned in this configuration line. --with-python=/opt/python/bin/python2.4 Optionally, specify a path to some specific Python binary. This is useful if you have more than one Python installation on your system.
If you don't set this option, then the first Python that will be found in your PATH will be chosen for running Invenio. --with-mysql=/opt/mysql/bin/mysql Optionally, specify a path to some specific MySQL client binary. This is useful if you have more than one MySQL installation on your system. If you don't set this option, then the first MySQL client executable that will be found in your PATH will be chosen for running Invenio. --with-clisp=/opt/clisp/bin/clisp Optionally, specify a path to the CLISP executable. This is useful if you have more than one CLISP installation on your system. If you don't set this option, then the first executable that will be found in your PATH will be chosen for running Invenio. --with-cmucl=/opt/cmucl/bin/lisp Optionally, specify a path to the CMUCL executable. This is useful if you have more than one CMUCL installation on your system. If you don't set this option, then the first executable that will be found in your PATH will be chosen for running Invenio. --with-sbcl=/opt/sbcl/bin/sbcl Optionally, specify a path to the SBCL executable. This is useful if you have more than one SBCL installation on your system. If you don't set this option, then the first executable that will be found in your PATH will be chosen for running Invenio. --with-openoffice-python Optionally, specify the path to the Python interpreter embedded with OpenOffice.org. This is normally not on the default PATH. If you don't specify this, it won't be possible to use OpenOffice.org to convert to and from Microsoft Office and OpenOffice.org documents. This configuration step is mandatory. Usually, you do this step only once. (Note that if you are building Invenio not from a released tarball, but from the Git sources, then you have to generate the configure file via autotools: $ sudo aptitude install automake1.9 autoconf $ aclocal-1.9 $ automake-1.9 -a $ autoconf after which you proceed with the usual configure command.) $ make Launch the Invenio build. Since many messages are printed during the build process, you may want to run it in a fast-scrolling terminal such as rxvt or in a detached screen session. During this step all the pages and scripts will be pre-created and customized based on the config you have edited in the previous step. Note that on systems such as FreeBSD or Mac OS X you have to use GNU make ("gmake") instead of "make". $ make install Install the web pages, scripts, utilities and everything needed for the Invenio runtime into the respective installation directories, as specified earlier by the configure command. Note that if you are installing Invenio for the first time, you will be asked to create symbolic link(s) from Python's site-packages system-wide directory(ies) to the installation location. This is in order to instruct Python where to find Invenio's Python files. You will be hinted as to the exact command to use based on the parameters you have used in the configure command. $ make install-mathjax-plugin ## optional This will automatically download and install in the proper place MathJax, a JavaScript library to render LaTeX formulas in the client browser. Note that in order to enable the rendering you will have to set the variable CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS in invenio-local.conf to a suitable list of output format codes. For example: CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS = hd,hb $ make install-jquery-plugins ## optional This will automatically download and install in the proper place jQuery and related plugins.
They are used for AJAX applications such as the record editor. Note that `unzip' is needed when installing the jQuery plugins. $ make install-ckeditor-plugin ## optional This will automatically download and install in the proper place CKeditor, a WYSIWYG JavaScript-based editor (e.g. for the WebComment module). Note that in order to enable the editor you have to set CFG_WEBCOMMENT_USE_RICH_TEXT_EDITOR to True. $ make install-pdfa-helper-files ## optional This will automatically download and install in the proper place the helper files needed to create PDF/A files out of existing PDF files. $ make install-mediaelement ## optional This will automatically download and install the MediaElementJS HTML5 video player that is needed for videos on the DEMO site. $ make install-solrutils ## optional This will automatically download and install a Solr instance which can be used for full-text searching. See the CFG_SOLR_URL variable in invenio.conf. Note that the admin later has to take care of providing init.d scripts that start the Solr instance automatically. $ make install-js-test-driver ## optional This will automatically download and install JsTestDriver, which is needed to run JS unit tests. Recommended for developers. 2b. Configuration ----------------- Once the basic software installation is done, we proceed to configuring your Invenio system. $ sudo chown -R www-data.www-data /opt/invenio For the sake of simplicity, let us assume that your Invenio installation will run under the `www-data' user process identity. The above command changes ownership of installed files to www-data, so that we shall run everything under this user identity from now on. For production purposes, you would typically allow the Apache server to read all files from the installation place but to write only to the `var' subdirectory of your installation place. You could achieve this by configuring Unix directory group permissions, for example. $ sudo -u www-data emacs /opt/invenio/etc/invenio-local.conf Customize your Invenio installation. Please read the 'invenio.conf' file, located in the same directory, which contains the vanilla default configuration parameters of your Invenio installation. If you want to customize some of these parameters, you should create a file named 'invenio-local.conf' in the same directory where 'invenio.conf' lives and you should write there only the customizations that you want to be different from the vanilla defaults.
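To see the full list of configuration variables that you may override, together with their default values and inline documentation, you can simply browse the vanilla file, for example (plain grep and less, nothing Invenio-specific):

    $ less /opt/invenio/etc/invenio.conf
    $ grep '^CFG_' /opt/invenio/etc/invenio.conf | less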
Here is a realistic, minimalist, yet production-ready example of what you would typically put there: $ cat /opt/invenio/etc/invenio-local.conf [Invenio] CFG_SITE_NAME = John Doe's Document Server CFG_SITE_NAME_INTL_fr = Serveur des Documents de John Doe CFG_SITE_URL = http://your.site.com CFG_SITE_SECURE_URL = https://your.site.com CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com CFG_WEBALERT_ALERT_ENGINE_EMAIL = john.doe@your.site.com CFG_WEBCOMMENT_ALERT_ENGINE_EMAIL = john.doe@your.site.com CFG_WEBCOMMENT_DEFAULT_MODERATOR = john.doe@your.site.com CFG_DATABASE_HOST = localhost CFG_DATABASE_NAME = invenio CFG_DATABASE_USER = invenio CFG_DATABASE_PASS = my123p$ss CFG_BIBDOCFILE_ENABLE_BIBDOCFSINFO_CACHE = 1 You should override at least the parameters mentioned above in order to define some very essential runtime parameters such as the name of your document server (CFG_SITE_NAME and CFG_SITE_NAME_INTL_*), the visible URL of your document server (CFG_SITE_URL and CFG_SITE_SECURE_URL), the email address of the local Invenio administrator, comment moderator, and alert engine (CFG_SITE_SUPPORT_EMAIL, CFG_SITE_ADMIN_EMAIL, etc), and last but not least your database credentials (CFG_DATABASE_*). If this is a first installation of Invenio, it is recommended that you set the CFG_BIBDOCFILE_ENABLE_BIBDOCFSINFO_CACHE variable to 1. If this is instead an upgrade from an existing installation, don't add it until you have run: $ bibdocfile --fix-bibdocfsinfo-cache . The Invenio system will then read both the default invenio.conf file and your customized invenio-local.conf file and it will override any default options with the ones you have specified in your local file. This cascading of configuration parameters will ease your future upgrades. If you want to have multiple Invenio instances for distributed video encoding, you need to share the same configuration among them and make some of the folders of the Invenio installation available to all nodes. Configure the allowed tasks for every node: CFG_BIBSCHED_NODE_TASKS = { "hostname_machine1" : ["bibindex", "bibupload", "bibreformat","webcoll", "bibtaskex", "bibrank", "oaiharvest", "oairepositoryupdater", "inveniogc", "webstatadmin", "bibclassify", "bibexport", "dbdump", "batchuploader", "bibauthorid", "bibtasklet"], "hostname_machine2" : ['bibencode',] } Share the following directories among Invenio instances: /var/tmp-shared hosts video uploads in a temporary form /var/tmp-shared/bibencode/jobs hosts new job files for the video encoding daemon /var/tmp-shared/bibencode/jobs/done hosts job files that have been processed by the daemon /var/data/files hosts fulltext and media files associated with records /var/data/submit hosts files created during submissions $ sudo -u www-data /opt/invenio/bin/inveniocfg --update-all Make the rest of the Invenio system aware of your invenio-local.conf changes. This step is mandatory each time you edit your conf files. $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-tables If you are installing Invenio for the first time, you have to create database tables. Note that this step checks for potential problems such as the database connection rights and may ask you to perform some more administrative steps in case it detects a problem. Notably, it may ask you to set up database access permissions, based on your configure values. If you are installing Invenio for the first time, you have to create a dedicated database on your MySQL server that Invenio can use for its purposes.
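In practice, the commands proposed by this step boil down to creating the database and granting access to it, along these lines (a sketch only; the exact statements printed by inveniocfg depend on your CFG_DATABASE_* values, here taken from the example above):

    $ mysql -h localhost -u root -p
    mysql> CREATE DATABASE invenio DEFAULT CHARACTER SET utf8;
    mysql> GRANT ALL PRIVILEGES ON invenio.* TO 'invenio'@'localhost' IDENTIFIED BY 'my123p$ss';
    mysql> FLUSH PRIVILEGES;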
Please contact your MySQL administrator and ask them to execute the commands this step proposes. At this point you should have successfully completed the "make install" process. We continue by setting up the Apache web server. $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-webstat-conf Load the configuration file of the webstat module. It will create the database tables needed for registering custom events, such as basket hits. $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-apache-conf Running this command will generate Apache virtual host configurations matching your installation. You will be instructed to check the created files (usually they are located under /opt/invenio/etc/apache/) and to edit your httpd.conf to activate the Invenio virtual hosts. If you are using Debian GNU/Linux ``Lenny'' or later, then you can do the following to create your SSL certificate and to activate your Invenio vhosts: ## make SSL certificate: $ sudo aptitude install ssl-cert $ sudo mkdir /etc/apache2/ssl $ sudo /usr/sbin/make-ssl-cert /usr/share/ssl-cert/ssleay.cnf \ /etc/apache2/ssl/apache.pem ## add Invenio web sites: $ sudo ln -s /opt/invenio/etc/apache/invenio-apache-vhost.conf \ /etc/apache2/sites-available/invenio $ sudo ln -s /opt/invenio/etc/apache/invenio-apache-vhost-ssl.conf \ /etc/apache2/sites-available/invenio-ssl ## disable Debian's default web site: $ sudo /usr/sbin/a2dissite default ## enable Invenio web sites: $ sudo /usr/sbin/a2ensite invenio $ sudo /usr/sbin/a2ensite invenio-ssl ## enable SSL module: $ sudo /usr/sbin/a2enmod ssl ## if you are using xsendfile module, enable it too: $ sudo /usr/sbin/a2enmod xsendfile If you are using another operating system, you should do the equivalent, for example edit your system-wide httpd.conf and add the following include statements: Include /opt/invenio/etc/apache/invenio-apache-vhost.conf Include /opt/invenio/etc/apache/invenio-apache-vhost-ssl.conf Note that you may need to adapt the generated vhost file snippets to match your concrete operating system specifics. For example, the generated configuration snippet will preload the Invenio WSGI daemon application upon Apache start-up for faster site response. The generated configuration assumes that you are using mod_wsgi version 3 or later. If you are using the old legacy mod_wsgi version 2, then you would need to comment out the WSGIImportScript directive from the generated snippet, or else move the WSGI daemon setup to the top level, outside of the VirtualHost section. Note also that you may want to tweak the generated Apache vhost snippet for performance reasons, especially with respect to WSGIDaemonProcess parameters. For example, you can increase the number of processes from the default value `processes=5' if you have lots of RAM and if many concurrent users may access your site in parallel. However, note that you must use `threads=1' there, because Invenio WSGI daemon processes are not fully thread safe yet. This may change in the future. $ sudo /etc/init.d/apache2 restart Please ask your webserver administrator to restart the Apache server after the above "httpd.conf" changes. $ sudo -u www-data /opt/invenio/bin/inveniocfg --check-openoffice If you plan to support MS Office or Open Document Format files in your installation, you should check whether LibreOffice or OpenOffice.org is well integrated with Invenio by running the above command. You may be asked to create a temporary directory for converting office files with special ownership (typically as user nobody) and permissions.
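If the check reports a problem, the remedy it suggests usually amounts to creating that directory by hand, along the lines of the following sketch (the exact path, ownership and permissions to use are printed by the check itself and may differ on your system):

    $ sudo mkdir -p /opt/invenio/var/tmp/ooffice-tmp-files
    $ sudo chown -R nobody /opt/invenio/var/tmp/ooffice-tmp-files
    $ sudo chmod -R 755 /opt/invenio/var/tmp/ooffice-tmp-files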
Note that you can do this step later. $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-demo-site This step is recommended to test your local Invenio installation. It should give you our "Atlantis Institute of Science" demo installation, exactly as you see it at . $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-demo-records Optionally, load some demo records to be able to test indexing and searching of your local Invenio demo installation. $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-unit-tests Optionally, you can run the unit test suite to verify the unit behaviour of your local Invenio installation. Note that this command should be run only after you have installed the whole system via `make install'. $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-regression-tests Optionally, you can run the full regression test suite to verify the functional behaviour of your local Invenio installation. Note that this command requires the demo site to have been created and the demo records loaded. Note also that running the regression test suite may alter the database content with junk data, so rebuilding the demo site is strongly recommended afterwards. $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-web-tests Optionally, you can run additional automated web tests running in a real browser. This requires Firefox with the Selenium IDE extension to be installed. $ sudo -u www-data /opt/invenio/bin/inveniocfg --remove-demo-records Optionally, remove the demo records loaded in the previous step, while otherwise keeping the demo collection, submission, format, and other configurations that you may reuse and modify for your own production purposes. $ sudo -u www-data /opt/invenio/bin/inveniocfg --drop-demo-site Optionally, also drop all the demo configuration so that you'll end up with a completely blank Invenio system. However, you may find it more practical not to drop the demo site configuration but to start customizing from there. $ firefox http://your.site.com/help/admin/howto-run In order to start using your Invenio installation, you can start indexing, formatting and other daemons as indicated in the "HOWTO Run" guide at the above URL. You can also use the Admin Area web interfaces to perform further runtime configurations such as the definition of data collections, document types, document formats, word indexes, etc. $ sudo ln -s /opt/invenio/etc/bash_completion.d/inveniocfg \ /etc/bash_completion.d/inveniocfg Optionally, if you are using Bash shell completion, then you may want to create the above symlink in order to configure completion for the inveniocfg command. Good luck, and thanks for choosing Invenio. - Invenio Development Team diff --git a/Makefile.am b/Makefile.am index 672600c48..9ef70fd1b 100644 --- a/Makefile.am +++ b/Makefile.am @@ -1,455 +1,455 @@ ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details.
## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. confignicedir = $(sysconfdir)/build confignice_SCRIPTS=config.nice SUBDIRS = po config modules EXTRA_DIST = UNINSTALL THANKS RELEASE-NOTES configure-tests.py config.nice.in \ config.rpath # current MathJax version and packages # See also modules/miscutil/lib/htmlutils.py (get_mathjax_header) MJV = 2.1 MATHJAX = http://invenio-software.org/download/mathjax/MathJax-v$(MJV).zip # current CKeditor version CKV = 3.6.6 CKEDITOR = ckeditor_$(CKV).zip # current MediaElement.js version MEV = master MEDIAELEMENT = http://github.com/johndyer/mediaelement/zipball/$(MEV) #for solrutils INVENIO_JAVA_PATH = org/invenio_software/solr solrdirname = apache-solr-3.1.0 solrdir = $(prefix)/lib/$(solrdirname) solrutils_dir=$(CURDIR)/modules/miscutil/lib/solrutils CLASSPATH=.:${solrdir}/dist/solrj-lib/commons-io-1.4.jar:${solrdir}/dist/apache-solr-core-*jar:${solrdir}/contrib/jzlib-1.0.7.jar:${solrdir}/dist/apache-solr-solrj-3.1.0.jar:${solrdir}/dist/solrj-lib/slf4j-api-1.5.5.jar:${solrdir}/dist/*:${solrdir}/contrib/basic-lucene-libs/*:${solrdir}/contrib/analysis-extras/lucene-libs/*:${solrdir}/dist/solrj-lib/* # git-version-get stuff: BUILT_SOURCES = $(top_srcdir)/.version $(top_srcdir)/.version: echo $(VERSION) > $@-t && mv $@-t $@ dist-hook: echo $(VERSION) > $(distdir)/.tarball-version check-upgrade: $(PYTHON) $(top_srcdir)/modules/miscutil/lib/inveniocfg_upgrader.py $(top_srcdir) --upgrade-check kwalitee-check: @$(PYTHON) $(top_srcdir)/modules/miscutil/lib/kwalitee.py --stats $(top_srcdir) kwalitee-check-errors-only: @$(PYTHON) $(top_srcdir)/modules/miscutil/lib/kwalitee.py --check-errors $(top_srcdir) kwalitee-check-variables: @$(PYTHON) $(top_srcdir)/modules/miscutil/lib/kwalitee.py --check-variables $(top_srcdir) kwalitee-check-indentation: @$(PYTHON) $(top_srcdir)/modules/miscutil/lib/kwalitee.py --check-indentation $(top_srcdir) kwalitee-check-sql-queries: @$(PYTHON) $(top_srcdir)/modules/miscutil/lib/kwalitee.py --check-sql $(top_srcdir) etags: \rm -f $(top_srcdir)/TAGS (cd $(top_srcdir) && find $(top_srcdir) -name "*.py" -print | xargs etags) install-data-local: for d in / /cache /cache/RTdata /log /tmp /tmp-shared /data /run /tmp-shared/bibencode/jobs/done /tmp-shared/bibedit-cache; do \ mkdir -p $(localstatedir)$$d ; \ done @echo "************************************************************" @echo "** Invenio software has been successfully installed! **" @echo "** **" @echo "** You may proceed to customizing your installation now. **" @echo "************************************************************" install-mathjax-plugin: @echo "***********************************************************" @echo "** Installing MathJax plugin, please wait... **" @echo "***********************************************************" rm -rf /tmp/invenio-mathjax-plugin mkdir /tmp/invenio-mathjax-plugin rm -fr ${prefix}/var/www/MathJax mkdir -p ${prefix}/var/www/MathJax (cd /tmp/invenio-mathjax-plugin && \ wget '$(MATHJAX)' -O mathjax.zip && \ - unzip -q mathjax.zip && cd mathjax-MathJax-* && cp -ur * \ + unzip -q mathjax.zip && cd mathjax-MathJax-* && cp -r * \ ${prefix}/var/www/MathJax) rm -fr /tmp/invenio-mathjax-plugin @echo "************************************************************" @echo "** The MathJax plugin was successfully installed. 
**" @echo "** Please do not forget to properly set the option **" @echo "** CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS and **" @echo "** CFG_WEBSUBMIT_USE_MATHJAX in invenio.conf. **" @echo "************************************************************" uninstall-mathjax-plugin: @rm -rvf ${prefix}/var/www/MathJax @echo "***********************************************************" @echo "** The MathJax plugin was successfully uninstalled. **" @echo "***********************************************************" install-jscalendar-plugin: @echo "***********************************************************" @echo "** Installing jsCalendar plugin, please wait... **" @echo "***********************************************************" rm -rf /tmp/invenio-jscalendar-plugin mkdir /tmp/invenio-jscalendar-plugin (cd /tmp/invenio-jscalendar-plugin && \ wget 'http://www.dynarch.com/static/jscalendar-1.0.zip' && \ unzip -u jscalendar-1.0.zip && \ mkdir -p ${prefix}/var/www/jsCalendar && \ cp jscalendar-1.0/img.gif ${prefix}/var/www/jsCalendar/jsCalendar.gif && \ cp jscalendar-1.0/calendar.js ${prefix}/var/www/jsCalendar/ && \ cp jscalendar-1.0/calendar-setup.js ${prefix}/var/www/jsCalendar/ && \ cp jscalendar-1.0/lang/calendar-en.js ${prefix}/var/www/jsCalendar/ && \ cp jscalendar-1.0/calendar-blue.css ${prefix}/var/www/jsCalendar/) rm -fr /tmp/invenio-jscalendar-plugin @echo "***********************************************************" @echo "** The jsCalendar plugin was successfully installed. **" @echo "***********************************************************" uninstall-jscalendar-plugin: @rm -rvf ${prefix}/var/www/jsCalendar @echo "***********************************************************" @echo "** The jsCalendar plugin was successfully uninstalled. **" @echo "***********************************************************" install-js-test-driver: @echo "*******************************************************" @echo "** Installing js-test-driver, please wait... **" @echo "*******************************************************" mkdir -p $(prefix)/lib/java/js-test-driver && \ cd $(prefix)/lib/java/js-test-driver && \ wget http://invenio-software.org/download/js-test-driver/JsTestDriver-1.3.5.jar -O JsTestDriver.jar uninstall-js-test-driver: @rm -rvf ${prefix}/lib/java/js-test-driver @echo "*********************************************************" @echo "** The js-test-driver was successfully uninstalled. **" @echo "*********************************************************" install-jquery-plugins: @echo "***********************************************************" @echo "** Installing various jQuery plugins, please wait... 
**" @echo "***********************************************************" mkdir -p ${prefix}/var/www/js mkdir -p $(prefix)/var/www/css (cd ${prefix}/var/www/js && \ wget http://code.jquery.com/jquery-1.7.1.min.js && \ mv jquery-1.7.1.min.js jquery.min.js && \ wget http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.17/jquery-ui.min.js && \ wget http://invenio-software.org/download/jquery/v1.5/js/jquery.jeditable.mini.js && \ wget https://raw.github.com/malsup/form/master/jquery.form.js --no-check-certificate && \ wget http://jquery-multifile-plugin.googlecode.com/svn/trunk/jquery.MultiFile.pack.js && \ wget -O jquery.tablesorter.zip http://invenio-software.org/download/jquery/jquery.tablesorter.20111208.zip && \ wget http://invenio-software.org/download/jquery/uploadify-v2.1.4.zip -O uploadify.zip && \ wget http://www.datatables.net/download/build/jquery.dataTables.min.js && \ wget http://invenio-software.org/download/jquery/jquery.bookmark.package-1.4.0.zip && \ unzip jquery.tablesorter.zip -d tablesorter && \ rm jquery.tablesorter.zip && \ rm -rf uploadify && \ unzip -u uploadify.zip -d uploadify && \ wget http://flot.googlecode.com/files/flot-0.6.zip && \ wget -O jquery-ui-timepicker-addon.js http://invenio-software.org/download/jquery/jquery-ui-timepicker-addon-1.0.3.js && \ unzip -u flot-0.6.zip && \ mv flot/jquery.flot.selection.min.js flot/jquery.flot.min.js flot/excanvas.min.js ./ && \ rm flot-0.6.zip && rm -r flot && \ mv uploadify/swfobject.js ./ && \ mv uploadify/cancel.png uploadify/uploadify.css uploadify/uploadify.allglyphs.swf uploadify/uploadify.fla uploadify/uploadify.swf ../img/ && \ mv uploadify/jquery.uploadify.v2.1.4.min.js ./jquery.uploadify.min.js && \ rm uploadify.zip && rm -r uploadify && \ wget --no-check-certificate https://github.com/douglascrockford/JSON-js/raw/master/json2.js && \ wget https://raw.github.com/jeresig/jquery.hotkeys/master/jquery.hotkeys.js --no-check-certificate && \ wget http://jquery.bassistance.de/treeview/jquery.treeview.zip && \ unzip jquery.treeview.zip -d jquery-treeview && \ rm jquery.treeview.zip && \ wget http://invenio-software.org/download/jquery/v1.5/js/jquery.ajaxPager.js && \ unzip jquery.bookmark.package-1.4.0.zip && \ rm -f jquery.bookmark.ext.* bookmarks-big.png bookmarkBasic.html jquery.bookmark.js jquery.bookmark.pack.js && \ mv bookmarks.png ../img/ && \ mv jquery.bookmark.css ../css/ && \ rm -f jquery.bookmark.package-1.4.0.zip && \ mkdir -p ${prefix}/var/www/img && \ cd ${prefix}/var/www/img && \ wget -r -np -nH --cut-dirs=4 -A "png,css" -P jquery-ui/themes http://jquery-ui.googlecode.com/svn/tags/1.8.17/themes/base/ && \ wget -r -np -nH --cut-dirs=4 -A "png,css" -P jquery-ui/themes http://jquery-ui.googlecode.com/svn/tags/1.8.17/themes/smoothness/ && \ wget -r -np -nH --cut-dirs=4 -A "png,css" -P jquery-ui/themes http://jquery-ui.googlecode.com/svn/tags/1.8.17/themes/redmond/ && \ wget --no-check-certificate -O datatables_jquery-ui.css https://github.com/DataTables/DataTables/raw/master/media/css/demo_table_jui.css && \ wget http://jquery-ui.googlecode.com/svn/tags/1.8.17/themes/redmond/jquery-ui.css && \ wget http://jquery-ui.googlecode.com/svn/tags/1.8.17/demos/images/calendar.gif && \ wget -r -np -nH --cut-dirs=5 -A "png" http://jquery-ui.googlecode.com/svn/tags/1.8.17/themes/redmond/images/) @echo "***********************************************************" @echo "** The jQuery plugins were successfully installed. 
**" @echo "***********************************************************" uninstall-jquery-plugins: (cd ${prefix}/var/www/js && \ rm -f jquery.min.js && \ rm -f jquery.MultiFile.pack.js && \ rm -f jquery.jeditable.mini.js && \ rm -f jquery.flot.selection.min.js && \ rm -f jquery.flot.min.js && \ rm -f excanvas.min.js && \ rm -f jquery-ui-timepicker-addon.min.js && \ rm -f json2.js && \ rm -f jquery.uploadify.min.js && \ rm -rf tablesorter && \ rm -rf jquery-treeview && \ rm -f jquery.ajaxPager.js && \ rm -f jquery.form.js && \ rm -f jquery.dataTables.min.js && \ rm -f ui.core.js && \ rm -f jquery.bookmark.min.js && \ rm -f jquery.hotkeys.js && \ rm -f jquery.tablesorter.min.js && \ rm -f jquery-ui-1.7.3.custom.min.js && \ rm -f jquery.metadata.js && \ rm -f jquery-latest.js && \ rm -f jquery-ui.min.js) (cd ${prefix}/var/www/img && \ rm -f cancel.png uploadify.css uploadify.swf uploadify.allglyphs.swf uploadify.fla && \ rm -f datatables_jquery-ui.css \ rm -f bookmarks.png) && \ (cd ${prefix}/var/www/css && \ rm -f jquery.bookmark.css) @echo "***********************************************************" @echo "** The jquery plugins were successfully uninstalled. **" @echo "***********************************************************" install-ckeditor-plugin: @echo "***********************************************************" @echo "** Installing CKeditor plugin, please wait... **" @echo "***********************************************************" rm -rf ${prefix}/lib/python/invenio/ckeditor/ rm -rf /tmp/invenio-ckeditor-plugin mkdir /tmp/invenio-ckeditor-plugin (cd /tmp/invenio-ckeditor-plugin && \ wget 'http://invenio-software.org/download/ckeditor/$(CKEDITOR)' && \ unzip -u -d ${prefix}/var/www $(CKEDITOR)) && \ find ${prefix}/var/www/ckeditor/ -depth -name '_*' -exec rm -rf {} \; && \ find ${prefix}/var/www/ckeditor/ckeditor* -maxdepth 0 ! -name "ckeditor.js" -exec rm -r {} \; && \ rm -fr /tmp/invenio-ckeditor-plugin @echo "* Installing Invenio-specific CKeditor config..." (cd $(top_srcdir)/modules/webstyle/etc && make install) @echo "***********************************************************" @echo "** The CKeditor plugin was successfully installed. **" @echo "** Please do not forget to properly set the option **" @echo "** CFG_WEBCOMMENT_USE_RICH_TEXT_EDITOR in invenio.conf. **" @echo "***********************************************************" uninstall-ckeditor-plugin: @rm -rvf ${prefix}/var/www/ckeditor @rm -rvf ${prefix}/lib/python/invenio/ckeditor @echo "***********************************************************" @echo "** The CKeditor plugin was successfully uninstalled. **" @echo "***********************************************************" install-pdfa-helper-files: @echo "***********************************************************" @echo "** Installing PDF/A helper files, please wait... **" @echo "***********************************************************" wget 'http://invenio-software.org/download/invenio-demo-site-files/ISOCoatedsb.icc' -O ${prefix}/etc/websubmit/file_converter_templates/ISOCoatedsb.icc @echo "***********************************************************" @echo "** The PDF/A helper files were successfully installed. **" @echo "***********************************************************" install-mediaelement: @echo "***********************************************************" @echo "** MediaElement.js, please wait... 
**" @echo "***********************************************************" rm -rf /tmp/mediaelement mkdir /tmp/mediaelement wget 'http://github.com/johndyer/mediaelement/zipball/master' -O '/tmp/mediaelement/mediaelement.zip' --no-check-certificate unzip -u -d '/tmp/mediaelement' '/tmp/mediaelement/mediaelement.zip' rm -rf ${prefix}/var/www/mediaelement mkdir ${prefix}/var/www/mediaelement mv /tmp/mediaelement/johndyer-mediaelement-*/build/* ${prefix}/var/www/mediaelement rm -rf /tmp/mediaelement @echo "***********************************************************" @echo "** MediaElement.js was successfully installed. **" @echo "***********************************************************" uninstall-pdfa-helper-files: rm -f ${prefix}/etc/websubmit/file_converter_templates/ISOCoatedsb.icc @echo "***********************************************************" @echo "** The PDF/A helper files were successfully uninstalled. **" @echo "***********************************************************" #Solrutils allows automatic installation, running and searching of an external Solr index. install-solrutils: @echo "***********************************************************" @echo "** Installing Solrutils and solr, please wait... **" @echo "***********************************************************" cd $(prefix)/lib && \ if test -d apache-solr*; then echo A solr directory already exists in `pwd` . \ Please remove it manually, if you are sure it is not needed; exit 2; fi ; \ if test -f apache-solr*; then echo solr tarball already exists in `pwd` . \ Please remove it manually.; exit 2; fi ; \ wget http://archive.apache.org/dist/lucene/solr/3.1.0/apache-solr-3.1.0.tgz && \ tar -xzf apache-solr-3.1.0.tgz && \ rm apache-solr-3.1.0.tgz cd $(solrdir)/contrib/ ;\ wget http://mirrors.ibiblio.org/pub/mirrors/maven2/com/jcraft/jzlib/1.0.7/jzlib-1.0.7.jar && \ cd $(solrdir)/contrib/ ;\ jar -xf ../example/webapps/solr.war WEB-INF/lib/lucene-core-3.1.0.jar ; \ if test -d basic-lucene-libs; then rm -rf basic-lucene-libs; fi ; \ mv WEB-INF/lib/ basic-lucene-libs ; \ cp $(solrutils_dir)/schema.xml $(solrdir)/example/solr/conf/ cp $(solrutils_dir)/solrconfig.xml $(solrdir)/example/solr/conf/ cd $(solrutils_dir) && \ javac -classpath $(CLASSPATH) -d $(solrdir)/contrib @$(solrutils_dir)/java_sources.txt && \ cd $(solrdir)/contrib/ && \ jar -cf invenio-solr.jar org/invenio_software/solr/*class update-v0.99.0-tables: cat $(top_srcdir)/modules/miscutil/sql/tabcreate.sql | grep -v 'INSERT INTO upgrade' | ${prefix}/bin/dbexec echo "DROP TABLE IF EXISTS oaiREPOSITORY;" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc ADD COLUMN more_info mediumblob NULL default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK ADD COLUMN priority tinyint(4) NOT NULL default 0;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK ADD KEY priority (priority);" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA DROP PRIMARY KEY;" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA ADD PRIMARY KEY (id);" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA CHANGE id id mediumint(8) unsigned NOT NULL auto_increment;" | ${prefix}/bin/dbexec echo "ALTER TABLE rnkCITATIONDATA ADD UNIQUE KEY object_name (object_name);" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmPARAMETERS CHANGE value value text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmAPPROVAL ADD note text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE hstDOCUMENT CHANGE docsize docsize bigint(15) unsigned NOT NULL;" | ${prefix}/bin/dbexec echo 
"ALTER TABLE cmtACTIONHISTORY CHANGE client_host client_host int(10) unsigned default NULL;" | ${prefix}/bin/dbexec update-v0.99.1-tables: @echo "Nothing to do; table structure did not change between v0.99.1 and v0.99.2." update-v0.99.2-tables: @echo "Nothing to do; table structure did not change between v0.99.2 and v0.99.3." update-v0.99.3-tables: @echo "Nothing to do; table structure did not change between v0.99.3 and v0.99.4." update-v0.99.4-tables: @echo "Nothing to do; table structure did not change between v0.99.4 and v0.99.5." update-v0.99.5-tables: @echo "Nothing to do; table structure did not change between v0.99.5 and v0.99.6." update-v0.99.6-tables: @echo "Nothing to do; table structure did not change between v0.99.6 and v0.99.7." update-v0.99.7-tables: # from v0.99.7 to v1.0.0-rc0 echo "RENAME TABLE oaiARCHIVE TO oaiREPOSITORY;" | ${prefix}/bin/dbexec cat $(top_srcdir)/modules/miscutil/sql/tabcreate.sql | grep -v 'INSERT INTO upgrade' | ${prefix}/bin/dbexec echo "INSERT INTO knwKB (id,name,description,kbtype) SELECT id,name,description,'' FROM fmtKNOWLEDGEBASES;" | ${prefix}/bin/dbexec echo "INSERT INTO knwKBRVAL (id,m_key,m_value,id_knwKB) SELECT id,m_key,m_value,id_fmtKNOWLEDGEBASES FROM fmtKNOWLEDGEBASEMAPPINGS;" | ${prefix}/bin/dbexec echo "ALTER TABLE sbmPARAMETERS CHANGE name name varchar(40) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc CHANGE docname docname varchar(250) COLLATE utf8_bin NOT NULL default 'file';" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc CHANGE status status text NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE bibdoc ADD COLUMN text_extraction_date datetime NOT NULL default '0000-00-00';" | ${prefix}/bin/dbexec echo "ALTER TABLE collection DROP COLUMN restricted;" | ${prefix}/bin/dbexec echo "ALTER TABLE schTASK CHANGE host host varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE hstTASK CHANGE host host varchar(255) NOT NULL default '';" | ${prefix}/bin/dbexec echo "ALTER TABLE bib85x DROP INDEX kv, ADD INDEX kv (value(100));" | ${prefix}/bin/dbexec echo "UPDATE clsMETHOD SET location='http://invenio-software.org/download/invenio-demo-site-files/HEP.rdf' WHERE name='HEP' AND location='';" | ${prefix}/bin/dbexec echo "UPDATE clsMETHOD SET location='http://invenio-software.org/download/invenio-demo-site-files/NASA-subjects.rdf' WHERE name='NASA-subjects' AND location='';" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET name='runoairepository', description='run oairepositoryupdater task' WHERE name='runoaiarchive';" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET name='cfgoaiharvest', description='configure OAI Harvest' WHERE name='cfgbibharvest';" | ${prefix}/bin/dbexec echo "ALTER TABLE accARGUMENT CHANGE value value varchar(255);" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET allowedkeywords='doctype,act,categ' WHERE name='submit';" | ${prefix}/bin/dbexec echo "INSERT INTO accARGUMENT(keyword,value) VALUES ('categ','*');" | ${prefix}/bin/dbexec echo "INSERT INTO accROLE_accACTION_accARGUMENT(id_accROLE,id_accACTION,id_accARGUMENT,argumentlistid) SELECT DISTINCT raa.id_accROLE,raa.id_accACTION,accARGUMENT.id,raa.argumentlistid FROM accROLE_accACTION_accARGUMENT as raa JOIN accACTION on id_accACTION=accACTION.id,accARGUMENT WHERE accACTION.name='submit' and accARGUMENT.keyword='categ' and accARGUMENT.value='*';" | ${prefix}/bin/dbexec echo "UPDATE accACTION SET allowedkeywords='name,with_editor_rights' WHERE name='cfgwebjournal';" | ${prefix}/bin/dbexec echo "INSERT INTO 
accARGUMENT(keyword,value) VALUES ('with_editor_rights','yes');" | ${prefix}/bin/dbexec echo "INSERT INTO accROLE_accACTION_accARGUMENT(id_accROLE,id_accACTION,id_accARGUMENT,argumentlistid) SELECT DISTINCT raa.id_accROLE,raa.id_accACTION,accARGUMENT.id,raa.argumentlistid FROM accROLE_accACTION_accARGUMENT as raa JOIN accACTION on id_accACTION=accACTION.id,accARGUMENT WHERE accACTION.name='cfgwebjournal' and accARGUMENT.keyword='with_editor_rights' and accARGUMENT.value='yes';" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC CHANGE id id int(15) unsigned NOT NULL auto_increment;" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC ADD external_id int(15) NOT NULL default '0';" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC ADD collection_id int(15) unsigned NOT NULL default '0';" | ${prefix}/bin/dbexec echo "ALTER TABLE bskEXTREC ADD original_url text;" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD status char(2) NOT NULL default 'ok';" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD KEY status (status);" | ${prefix}/bin/dbexec echo "INSERT INTO sbmALLFUNCDESCR VALUES ('Move_Photos_to_Storage','Attach/edit the pictures uploaded with the \"create_photos_manager_interface()\" function');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFIELDDESC VALUES ('Upload_Photos',NULL,'','R',NULL,NULL,NULL,NULL,NULL,'\"\"\"\r\nThis is an example of element that creates a photos upload interface.\r\nClone it, customize it and integrate it into your submission. Then add function \r\n\'Move_Photos_to_Storage\' to your submission functions list, in order for files \r\nuploaded with this interface to be attached to the record. More information in \r\nthe WebSubmit admin guide.\r\n\"\"\"\r\n\r\nfrom invenio.websubmit_functions.ParamFile import ParamFromFile\r\nfrom invenio.websubmit_functions.Move_Photos_to_Storage import read_param_file, create_photos_manager_interface, get_session_id\r\n\r\n# Retrieve session id\r\ntry:\r\n # User info is defined only in MBI/MPI actions...\r\n session_id = get_session_id(None, uid, user_info) \r\nexcept:\r\n session_id = get_session_id(req, uid, {})\r\n\r\n# Retrieve context\r\nindir = curdir.split(\'/\')[-3]\r\ndoctype = curdir.split(\'/\')[-2]\r\naccess = curdir.split(\'/\')[-1]\r\n\r\n# Get the record ID, if any\r\nsysno = ParamFromFile(\"%s/%s\" % (curdir,\'SN\')).strip()\r\n\r\n\"\"\"\r\nModify below the configuration of the photos manager interface.\r\nNote: \'can_reorder_photos\' parameter is not yet fully taken into consideration\r\n\r\nDocumentation of the function is available by running:\r\necho -e \'from invenio.websubmit_functions.Move_Photos_to_Storage import create_photos_manager_interface as f\\nprint f.__doc__\' | python\r\n\"\"\"\r\ntext += create_photos_manager_interface(sysno, session_id, uid,\r\n doctype, indir, curdir, access,\r\n can_delete_photos=True,\r\n can_reorder_photos=True,\r\n can_upload_photos=True,\r\n editor_width=700,\r\n editor_height=400,\r\n initial_slider_value=100,\r\n max_slider_value=200,\r\n min_slider_value=80)','0000-00-00','0000-00-00',NULL,NULL,0);" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Photos_to_Storage','iconsize');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFIELDDESC VALUES ('Upload_Files',NULL,'','R',NULL,NULL,NULL,NULL,NULL,'\"\"\"\r\nThis is an example of element that creates a file upload interface.\r\nClone it, customize it and integrate it into your submission. 
Then add function \r\n\'Move_Uploaded_Files_to_Storage\' to your submission functions list, in order for files \r\nuploaded with this interface to be attached to the record. More information in \r\nthe WebSubmit admin guide.\r\n\"\"\"\r\nfrom invenio.websubmit_managedocfiles import create_file_upload_interface\r\nfrom invenio.websubmit_functions.Shared_Functions import ParamFromFile\r\n\r\nindir = ParamFromFile(os.path.join(curdir, \'indir\'))\r\ndoctype = ParamFromFile(os.path.join(curdir, \'doctype\'))\r\naccess = ParamFromFile(os.path.join(curdir, \'access\'))\r\ntry:\r\n sysno = int(ParamFromFile(os.path.join(curdir, \'SN\')).strip())\r\nexcept:\r\n sysno = -1\r\nln = ParamFromFile(os.path.join(curdir, \'ln\'))\r\n\r\n\"\"\"\r\nRun the following to get the list of parameters of function \'create_file_upload_interface\':\r\necho -e \'from invenio.websubmit_managedocfiles import create_file_upload_interface as f\\nprint f.__doc__\' | python\r\n\"\"\"\r\ntext = create_file_upload_interface(recid=sysno,\r\n print_outside_form_tag=False,\r\n include_headers=True,\r\n ln=ln,\r\n doctypes_and_desc=[(\'main\',\'Main document\'),\r\n (\'additional\',\'Figure, schema, etc.\')],\r\n can_revise_doctypes=[\'*\'],\r\n can_describe_doctypes=[\'main\'],\r\n can_delete_doctypes=[\'additional\'],\r\n can_rename_doctypes=[\'main\'],\r\n sbm_indir=indir, sbm_doctype=doctype, sbm_access=access)[1]\r\n','0000-00-00','0000-00-00',NULL,NULL,0);" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Uploaded_Files_to_Storage','forceFileRevision');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmALLFUNCDESCR VALUES ('Create_Upload_Files_Interface','Display generic interface to add/revise/delete files. To be used before function \"Move_Uploaded_Files_to_Storage\"');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmALLFUNCDESCR VALUES ('Move_Uploaded_Files_to_Storage','Attach files uploaded with \"Create_Upload_Files_Interface\"')" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Revised_Files_to_Storage','elementNameToDoctype');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Revised_Files_to_Storage','createIconDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Revised_Files_to_Storage','createRelatedFormats');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Revised_Files_to_Storage','iconsize');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Revised_Files_to_Storage','keepPreviousVersionDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmALLFUNCDESCR VALUES ('Move_Revised_Files_to_Storage','Revise files initially uploaded with \"Move_Files_to_Storage\"')" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','maxsize');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','minsize');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','doctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','restrictions');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canDeleteDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canReviseDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canDescribeDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES 
('Create_Upload_Files_Interface','canCommentDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canKeepDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canAddFormatDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canRestrictDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canRenameDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','canNameNewFiles');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','createRelatedFormats');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','keepDefault');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','showLinks');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','fileLabel');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','filenameLabel');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','descriptionLabel');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','commentLabel');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','restrictionLabel');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','startDoc');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','endDoc');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','defaultFilenameDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Create_Upload_Files_Interface','maxFilesDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Uploaded_Files_to_Storage','iconsize');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Uploaded_Files_to_Storage','createIconDoctypes');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Report_Number_Generation','nblength');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Second_Report_Number_Generation','2nd_nb_length');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Get_Recid','record_search_pattern');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmALLFUNCDESCR VALUES ('Move_FCKeditor_Files_to_Storage','Transfer files attached to the record with the FCKeditor');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_FCKeditor_Files_to_Storage','input_fields');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Stamp_Uploaded_Files','layer');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','layer');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Stamp_Replace_Single_File_Approval','switch_file');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Stamp_Uploaded_Files','switch_file');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Files_to_Storage','paths_and_restrictions');" | ${prefix}/bin/dbexec echo "INSERT INTO sbmFUNDESC VALUES ('Move_Files_to_Storage','paths_and_doctypes');" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD round_name varchar(255) NOT NULL default ''" | ${prefix}/bin/dbexec echo 
"ALTER TABLE cmtRECORDCOMMENT ADD restriction varchar(50) NOT NULL default ''" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD in_reply_to_id_cmtRECORDCOMMENT int(15) unsigned NOT NULL default '0'" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD KEY in_reply_to_id_cmtRECORDCOMMENT (in_reply_to_id_cmtRECORDCOMMENT);" | ${prefix}/bin/dbexec echo "ALTER TABLE bskRECORDCOMMENT ADD in_reply_to_id_bskRECORDCOMMENT int(15) unsigned NOT NULL default '0'" | ${prefix}/bin/dbexec echo "ALTER TABLE bskRECORDCOMMENT ADD KEY in_reply_to_id_bskRECORDCOMMENT (in_reply_to_id_bskRECORDCOMMENT);" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD reply_order_cached_data blob NULL default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE bskRECORDCOMMENT ADD reply_order_cached_data blob NULL default NULL;" | ${prefix}/bin/dbexec echo "ALTER TABLE cmtRECORDCOMMENT ADD INDEX (reply_order_cached_data(40));" | ${prefix}/bin/dbexec echo "ALTER TABLE bskRECORDCOMMENT ADD INDEX (reply_order_cached_data(40));" | ${prefix}/bin/dbexec echo -e 'from invenio.webcommentadminlib import migrate_comments_populate_threads_index;\ migrate_comments_populate_threads_index()' | $(PYTHON) echo -e 'from invenio.access_control_firerole import repair_role_definitions;\ repair_role_definitions()' | $(PYTHON) CLEANFILES = *~ *.pyc *.tmp diff --git a/configure-tests.py b/configure-tests.py index 0c729575a..1d686f9ce 100644 --- a/configure-tests.py +++ b/configure-tests.py @@ -1,470 +1,473 @@ ## This file is part of Invenio. ## Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Test the suitability of Python core and the availability of various Python modules for running Invenio. Warn the user if there are eventual troubles. Exit status: 0 if okay, 1 if not okay. Useful for running from configure.ac. """ ## minimally recommended/required versions: cfg_min_python_version = "2.4" cfg_max_python_version = "2.9.9999" cfg_min_mysqldb_version = "1.2.1_p2" ## 0) import modules needed for this testing: import string import sys import getpass import subprocess import re error_messages = [] warning_messages = [] def wait_for_user(msg): """Print MSG and prompt user for confirmation.""" try: raw_input(msg) except KeyboardInterrupt: print "\n\nInstallation aborted." sys.exit(1) except EOFError: print " (continuing in batch mode)" return ## 1) check Python version: if sys.version < cfg_min_python_version: error_messages.append( """ ******************************************************* ** ERROR: TOO OLD PYTHON DETECTED: %s ******************************************************* ** You seem to be using a too old version of Python. ** ** You must use at least Python %s. 
** ** ** ** Note that if you have more than one Python ** ** installed on your system, you can specify the ** ** --with-python configuration option to choose ** ** a specific (e.g. non system wide) Python binary. ** ** ** ** Please upgrade your Python before continuing. ** ******************************************************* """ % (string.replace(sys.version, "\n", ""), cfg_min_python_version) ) if sys.version > cfg_max_python_version: error_messages.append( """ ******************************************************* ** ERROR: TOO NEW PYTHON DETECTED: %s ******************************************************* ** You seem to be using a too new version of Python. ** ** You must use at most Python %s. ** ** ** ** Perhaps you have downloaded and are installing an ** ** old Invenio version? Please look for more recent ** ** Invenio version or please contact the development ** ** team at about this ** ** problem. ** ** ** ** Installation aborted. ** ******************************************************* """ % (string.replace(sys.version, "\n", ""), cfg_max_python_version) ) ## 2) check for required modules: try: import MySQLdb import base64 import cPickle import cStringIO import cgi import copy import fileinput import getopt import sys if sys.hexversion < 0x2060000: import md5 else: import hashlib import marshal import os import signal import tempfile import time import traceback import unicodedata import urllib import zlib import wsgiref except ImportError, msg: error_messages.append(""" ************************************************* ** IMPORT ERROR %s ************************************************* ** Perhaps you forgot to install some of the ** ** prerequisite Python modules? Please look ** ** at our INSTALL file for more details and ** ** fix the problem before continuing! ** ************************************************* """ % msg ) ## 3) check for recommended modules: try: import rdflib except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that rdflib is needed only if you plan ** ** to work with the automatic classification of ** ** documents based on RDF-based taxonomies. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import pyRXP except ImportError, msg: warning_messages.append(""" ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that PyRXP is not really required but ** ** we recommend it for fast XML MARC parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import dateutil except ImportError, msg: warning_messages.append(""" ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that dateutil is not really required but ** ** we recommend it for user-friendly date ** ** parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) 
** ***************************************************** """ % msg ) try: import libxml2 except ImportError, msg: warning_messages.append(""" ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxml2 is not really required but ** ** we recommend it for XML metadata conversions ** ** and for fast XML parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import libxslt except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxslt is not really required but ** ** we recommend it for XML metadata conversions. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import Gnuplot except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that Gnuplot.py is not really required but ** ** we recommend it in order to have nice download ** ** and citation history graphs on Detailed record ** ** pages. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import magic if not hasattr(magic, "open"): raise StandardError except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that magic module is not really required ** ** but we recommend it in order to have detailed ** ** content information about fulltext files. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) except StandardError: warning_messages.append( """ ***************************************************** ** IMPORT WARNING python-magic ***************************************************** ** The python-magic package you installed is not ** ** the one supported by Invenio. Please refer to ** ** the INSTALL file for more details. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ ) try: import reportlab except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that reportlab module is not really ** ** required, but we recommend it you want to ** ** enrich PDF with OCR information. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. 
** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: - import pyPdf + try: + import PyPDF2 + except ImportError: + import pyPdf except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** - ** Note that pyPdf module is not really ** + ** Note that pyPdf or pyPdf2 module is not really ** ** required, but we recommend it you want to ** ** enrich PDF with OCR information. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) ## 4) check for versions of some important modules: if MySQLdb.__version__ < cfg_min_mysqldb_version: error_messages.append( """ ***************************************************** ** ERROR: PYTHON MODULE MYSQLDB %s DETECTED ***************************************************** ** You have to upgrade your MySQLdb to at least ** ** version %s. You must fix this problem ** ** before continuing. Please see the INSTALL file ** ** for more details. ** ***************************************************** """ % (MySQLdb.__version__, cfg_min_mysqldb_version) ) try: import Stemmer try: from Stemmer import algorithms except ImportError, msg: error_messages.append( """ ***************************************************** ** ERROR: STEMMER MODULE PROBLEM %s ***************************************************** ** Perhaps you are using an old Stemmer version? ** ** You must either remove your old Stemmer or else ** ** upgrade to Snowball Stemmer ** ** before continuing. Please see the INSTALL file ** ** for more details. ** ***************************************************** """ % (msg) ) except ImportError: pass # no prob, Stemmer is optional ## 5) check for Python.h (needed for intbitset): try: from distutils.sysconfig import get_python_inc path_to_python_h = get_python_inc() + os.sep + 'Python.h' if not os.path.exists(path_to_python_h): raise StandardError, "Cannot find %s" % path_to_python_h except StandardError, msg: error_messages.append( """ ***************************************************** ** ERROR: PYTHON HEADER FILE ERROR %s ***************************************************** ** You do not seem to have Python developer files ** ** installed (such as Python.h). Some operating ** ** systems provide these in a separate Python ** ** package called python-dev or python-devel. ** ** You must install such a package before ** ** continuing the installation process. ** ***************************************************** """ % (msg) ) ## Check if ffmpeg is installed and if so, with the minimum configuration for bibencode try: try: process = subprocess.Popen('ffprobe', stderr=subprocess.PIPE, stdout=subprocess.PIPE) except OSError: raise StandardError, "FFMPEG/FFPROBE does not seem to be installed!" 
returncode = process.wait() output = process.communicate()[1] RE_CONFIGURATION = re.compile("(--enable-[a-z0-9\-]*)") CONFIGURATION_REQUIRED = ( '--enable-gpl', '--enable-version3', '--enable-nonfree', '--enable-libtheora', '--enable-libvorbis', '--enable-libvpx', '--enable-libopenjpeg' ) options = RE_CONFIGURATION.findall(output) if sys.version_info < (2, 6): import sets s = sets.Set(CONFIGURATION_REQUIRED) if not s.issubset(options): raise StandardError, options.difference(s) else: if not set(CONFIGURATION_REQUIRED).issubset(options): raise StandardError, set(CONFIGURATION_REQUIRED).difference(options) except StandardError, msg: warning_messages.append( """ ***************************************************** ** WARNING: FFMPEG CONFIGURATION MISSING %s ***************************************************** ** You do not seem to have FFmpeg configured with ** ** the minimum video codecs to run the demo site. ** ** Please install the necessary libraries and ** ** re-install FFmpeg according to the Invenio ** ** installation manual (INSTALL). ** ***************************************************** """ % (msg) ) if warning_messages: print """ ****************************************************** ** WARNING MESSAGES ** ****************************************************** """ for warning in warning_messages: print warning if error_messages: print """ ****************************************************** ** ERROR MESSAGES ** ****************************************************** """ for error in error_messages: print error if warning_messages and error_messages: print """ There were %(n_err)s error(s) found that you need to solve. Please see above, solve them, and re-run configure. Note that there are also %(n_wrn)s warnings you may want to look into. Aborting the installation. """ % {'n_wrn': len(warning_messages), 'n_err': len(error_messages)} sys.exit(1) elif error_messages: print """ There were %(n_err)s error(s) found that you need to solve. Please see above, solve them, and re-run configure. Aborting the installation. """ % {'n_err': len(error_messages)} sys.exit(1) elif warning_messages: print """ There were %(n_wrn)s warnings found that you may want to look into, solve, and re-run configure before you continue the installation. However, you can also continue the installation now and solve these issues later, if you wish. """ % {'n_wrn': len(warning_messages)} wait_for_user("Press ENTER to continue the installation...") diff --git a/modules/bibindex/lib/bibindex_engine.py b/modules/bibindex/lib/bibindex_engine.py index eda31580c..48f3110b1 100644 --- a/modules/bibindex/lib/bibindex_engine.py +++ b/modules/bibindex/lib/bibindex_engine.py @@ -1,1719 +1,1732 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. -## Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 CERN. +## Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex indexing engine implementation. See bibindex executable for entry point. """ __revision__ = "$Id$" import os import re import sys import time import urllib2 import logging from invenio.config import \ CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS, \ CFG_BIBINDEX_CHARS_PUNCTUATION, \ CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY, \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES, \ CFG_BIBINDEX_SYNONYM_KBRS, \ CFG_CERN_SITE, CFG_INSPIRE_SITE, \ CFG_BIBINDEX_SPLASH_PAGES, \ CFG_SOLR_URL, \ CFG_XAPIAN_ENABLED from invenio.bibindex_engine_config import CFG_MAX_MYSQL_THREADS, \ CFG_MYSQL_THREAD_TIMEOUT, \ CFG_CHECK_MYSQL_THREADS from invenio.bibindex_engine_tokenizer import \ BibIndexFuzzyNameTokenizer, BibIndexExactNameTokenizer, \ BibIndexPairTokenizer, BibIndexWordTokenizer, \ BibIndexPhraseTokenizer from invenio.bibindexadminlib import get_idx_indexer from invenio.bibdocfile import bibdocfile_url_p, \ bibdocfile_url_to_bibdoc, normalize_format, \ download_url, guess_format_from_url, BibRecDocs, \ decompose_bibdocfile_url from invenio.websubmit_file_converter import convert_file, get_file_converter_logger from invenio.search_engine import perform_request_search, \ get_index_stemming_language, \ get_synonym_terms from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, \ deserialize_via_marshal, wash_table_column_name from invenio.bibindex_engine_washer import wash_index_term from invenio.bibtask import task_init, write_message, get_datetime, \ task_set_option, task_get_option, task_get_task_param, \ task_update_progress, task_sleep_now_if_required from invenio.intbitset import intbitset from invenio.errorlib import register_exception from invenio.htmlutils import get_links_in_html_page from invenio.search_engine_utils import get_fieldvalues from invenio.solrutils_bibindex_indexer import solr_add_fulltext, solr_commit from invenio.xapianutils_bibindex_indexer import xapian_add +from invenio.bibrankadminlib import get_def_name if sys.hexversion < 0x2040000: # pylint: disable=W0622 from sets import Set as set # pylint: enable=W0622 # FIXME: journal tag and journal pubinfo standard format are defined here: if CFG_CERN_SITE: CFG_JOURNAL_TAG = '773__%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "773__p 773__v (773__y) 773__c" CFG_JOURNAL_PUBINFO_STANDARD_FORM_REGEXP_CHECK = r'^\w.*\s\w.*\s\(\d+\)\s\w.*$' elif CFG_INSPIRE_SITE: CFG_JOURNAL_TAG = '773__%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "773__p,773__v,773__c" CFG_JOURNAL_PUBINFO_STANDARD_FORM_REGEXP_CHECK = r'^\w.*,\w.*,\w.*$' else: CFG_JOURNAL_TAG = '909C4%' CFG_JOURNAL_PUBINFO_STANDARD_FORM = "909C4p 909C4v (909C4y) 909C4c" CFG_JOURNAL_PUBINFO_STANDARD_FORM_REGEXP_CHECK = r'^\w.*\s\w.*\s\(\d+\)\s\w.*$' ## precompile some often-used regexp for speed reasons: re_subfields = re.compile('\$\$\w') re_block_punctuation_begin = re.compile(r"^" + CFG_BIBINDEX_CHARS_PUNCTUATION + "+") re_block_punctuation_end = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION + "+$") re_punctuation = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION) re_separators = re.compile(CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS) re_datetime_shift = re.compile("([-\+]{0,1})([\d]+)([dhms])") re_arxiv = re.compile(r'^arxiv:\d\d\d\d\.\d\d\d\d') nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by 
base_process_size = 4500 # process base size _last_word_table = None fulltext_added = intbitset() # stores ids of records whose fulltexts have been added def list_union(list1, list2): "Returns union of the two lists." union_dict = {} for e in list1: union_dict[e] = 1 for e in list2: union_dict[e] = 1 return union_dict.keys() ## safety function for killing slow DB threads: def kill_sleepy_mysql_threads(max_threads=CFG_MAX_MYSQL_THREADS, thread_timeout=CFG_MYSQL_THREAD_TIMEOUT): """Check the number of DB threads and if there are more than MAX_THREADS of them, kill all threads that are in a sleeping state for more than THREAD_TIMEOUT seconds. (This is useful for working around the max_connection problem that appears during indexation in some not-yet-understood cases.) If some threads are to be killed, write info into the log file. """ res = run_sql("SHOW FULL PROCESSLIST") if len(res) > max_threads: for row in res: r_id, dummy, dummy, dummy, r_command, r_time, dummy, dummy = row if r_command == "Sleep" and int(r_time) > thread_timeout: run_sql("KILL %s", (r_id,)) write_message("WARNING: too many DB threads, killing thread %s" % r_id, verbose=1) return def get_associated_subfield_value(recID, tag, value, associated_subfield_code): """Return list of ASSOCIATED_SUBFIELD_CODE, if exists, for record RECID and TAG of value VALUE. Used by fulltext indexer only. Note: TAG must be 6 characters long (tag+ind1+ind2+sfcode), otherwise an empty string is returned. FIXME: what if many tag values have the same value but different associated_subfield_code? Better use bibrecord library for this. """ out = "" if len(tag) != 6: return out bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.field_number, b.tag, b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s%%""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag[:-1])) field_number = -1 for row in res: if row[1] == tag and row[2] == value: field_number = row[0] if field_number > 0: for row in res: if row[0] == field_number and row[1] == tag[:-1] + associated_subfield_code: out = row[2] break return out def get_field_tags(field): """Returns a list of MARC tags for the field code 'field'. Returns empty list in case of error. Example: field='author', output=['100__%','700__%'].""" out = [] query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag ORDER BY ft.score DESC""" res = run_sql(query, (field,)) return [row[0] for row in res] def get_words_from_journal_tag(recID, tag): """ Special procedure to extract words from journal tags. Joins title/volume/year/page into a standard form that is also used for citations.
""" # get all journal tags/subfields: bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.field_number,b.tag,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag)) # construct journal pubinfo: dpubinfos = {} for row in res: nb_instance, subfield, value = row if subfield.endswith("c"): # delete pageend if value is pagestart-pageend # FIXME: pages may not be in 'c' subfield value = value.split('-', 1)[0] if dpubinfos.has_key(nb_instance): dpubinfos[nb_instance][subfield] = value else: dpubinfos[nb_instance] = {subfield: value} # construct standard format: lwords = [] for dpubinfo in dpubinfos.values(): # index all journal subfields separately for tag, val in dpubinfo.items(): lwords.append(val) # index journal standard format: pubinfo = CFG_JOURNAL_PUBINFO_STANDARD_FORM for tag, val in dpubinfo.items(): pubinfo = pubinfo.replace(tag, val) if CFG_JOURNAL_TAG[:-1] in pubinfo: # some subfield was missing, do nothing pass else: lwords.append(pubinfo) # return list of words and pubinfos: return lwords def get_field_count(recID, tags): """ Return number of field instances having TAGS in record RECID. @param recID: record ID @type recID: int @param tags: list of tags to count, e.g. ['100__a', '700__a'] @type tags: list @return: number of tags present in record @rtype: int @note: Works internally via getting field values, which may not be very efficient. Could use counts only, or else retrieve stored recstruct format of the record and walk through it. """ out = 0 for tag in tags: out += len(get_fieldvalues(recID, tag)) return out def get_author_canonical_ids_for_recid(recID): """ Return list of author canonical IDs (e.g. `J.Ellis.1') for the given record. Done by consulting BibAuthorID module. """ from invenio.bibauthorid_dbinterface import get_persons_from_recids lwords = [] res = get_persons_from_recids([recID]) if res is None: ## BibAuthorID is not enabled return lwords else: dpersons, dpersoninfos = res for aid in dpersoninfos.keys(): author_canonical_id = dpersoninfos[aid].get('canonical_id', '') if author_canonical_id: lwords.append(author_canonical_id) return lwords def get_words_from_date_tag(datestring, stemming_language=None): """ Special procedure to index words from tags storing date-like information in format YYYY or YYYY-MM or YYYY-MM-DD. Namely, we are indexing word-terms YYYY, YYYY-MM, YYYY-MM-DD, but never standalone MM or DD. """ out = [] for dateword in datestring.split(): # maybe there are whitespaces, so break these too out.append(dateword) parts = dateword.split('-') for nb in range(1, len(parts)): out.append("-".join(parts[:nb])) return out def get_words_from_fulltext(url_direct_or_indirect, stemming_language=None): """Returns all the words contained in the document specified by URL_DIRECT_OR_INDIRECT with the words being split by various SRE_SEPARATORS regexp set earlier. If FORCE_FILE_EXTENSION is set (e.g. to "pdf", then treat URL_DIRECT_OR_INDIRECT as a PDF file. (This is interesting to index Indico for example.) Note also that URL_DIRECT_OR_INDIRECT may be either a direct URL to the fulltext file or an URL to a setlink-like page body that presents the links to be indexed. In the latter case the URL_DIRECT_OR_INDIRECT is parsed to extract actual direct URLs to fulltext documents, for all knows file extensions as specified by global CONV_PROGRAMS config variable. """ write_message("... 
reading fulltext files from %s started" % url_direct_or_indirect, verbose=2) try: if bibdocfile_url_p(url_direct_or_indirect): write_message("... %s is an internal document" % url_direct_or_indirect, verbose=2) bibdoc = bibdocfile_url_to_bibdoc(url_direct_or_indirect) indexer = get_idx_indexer('fulltext') if indexer != 'native': # A document might belong to multiple records for rec_link in bibdoc.bibrec_links: recid = rec_link["recid"] # Adds fulltexts of all files once per records if not recid in fulltext_added: bibrecdocs = BibRecDocs(recid) text = bibrecdocs.get_text() if indexer == 'SOLR' and CFG_SOLR_URL: solr_add_fulltext(recid, text) elif indexer == 'XAPIAN' and CFG_XAPIAN_ENABLED: xapian_add(recid, 'fulltext', text) fulltext_added.add(recid) # we are relying on an external information retrieval system # to provide full-text indexing, so dispatch text to it and # return nothing here: return [] else: text = "" if hasattr(bibdoc, "get_text"): text = bibdoc.get_text() return get_words_from_phrase(text, stemming_language) else: if CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY: write_message("... %s is external URL but indexing only local files" % url_direct_or_indirect, verbose=2) return [] write_message("... %s is an external URL" % url_direct_or_indirect, verbose=2) urls_to_index = set() for splash_re, url_re in CFG_BIBINDEX_SPLASH_PAGES.iteritems(): if re.match(splash_re, url_direct_or_indirect): write_message("... %s is a splash page (%s)" % (url_direct_or_indirect, splash_re), verbose=2) html = urllib2.urlopen(url_direct_or_indirect).read() urls = get_links_in_html_page(html) write_message("... found these URLs in %s splash page: %s" % (url_direct_or_indirect, ", ".join(urls)), verbose=3) for url in urls: if re.match(url_re, url): write_message("... will index %s (matched by %s)" % (url, url_re), verbose=2) urls_to_index.add(url) if not urls_to_index: urls_to_index.add(url_direct_or_indirect) write_message("... 
will extract words from %s" % ', '.join(urls_to_index), verbose=2) words = {} for url in urls_to_index: tmpdoc = download_url(url) file_converter_logger = get_file_converter_logger() old_logging_level = file_converter_logger.getEffectiveLevel() if task_get_task_param("verbose") > 3: file_converter_logger.setLevel(logging.DEBUG) try: try: tmptext = convert_file(tmpdoc, output_format='.txt') text = open(tmptext).read() os.remove(tmptext) indexer = get_idx_indexer('fulltext') if indexer != 'native': if indexer == 'SOLR' and CFG_SOLR_URL: solr_add_fulltext(None, text) # FIXME: use real record ID if indexer == 'XAPIAN' and CFG_XAPIAN_ENABLED: #xapian_add(None, 'fulltext', text) # FIXME: use real record ID pass # we are relying on an external information retrieval system # to provide full-text indexing, so dispatch text to it and # return nothing here: tmpwords = [] else: tmpwords = get_words_from_phrase(text, stemming_language) words.update(dict(map(lambda x: (x, 1), tmpwords))) except Exception, e: message = 'ERROR: it\'s impossible to correctly extract words from %s referenced by %s: %s' % (url, url_direct_or_indirect, e) register_exception(prefix=message, alert_admin=True) write_message(message, stream=sys.stderr) finally: os.remove(tmpdoc) if task_get_task_param("verbose") > 3: file_converter_logger.setLevel(old_logging_level) return words.keys() except Exception, e: message = 'ERROR: it\'s impossible to correctly extract words from %s: %s' % (url_direct_or_indirect, e) register_exception(prefix=message, alert_admin=True) write_message(message, stream=sys.stderr) return [] def get_nothing_from_phrase(phrase, stemming_language=None): """ A dummy implementation of get_words_from_phrase to be used when a tag should not be indexed (such as when trying to extract phrases from 8564_u).""" return [] def swap_temporary_reindex_tables(index_id, reindex_prefix="tmp_"): """Atomically swap reindexed temporary table with the original one.
Delete the now-old one.""" write_message("Putting new tmp index tables for id %s into production" % index_id) run_sql( "RENAME TABLE " + "idxWORD%02dR TO old_idxWORD%02dR," % (index_id, index_id) + "%sidxWORD%02dR TO idxWORD%02dR," % (reindex_prefix, index_id, index_id) + "idxWORD%02dF TO old_idxWORD%02dF," % (index_id, index_id) + "%sidxWORD%02dF TO idxWORD%02dF," % (reindex_prefix, index_id, index_id) + "idxPAIR%02dR TO old_idxPAIR%02dR," % (index_id, index_id) + "%sidxPAIR%02dR TO idxPAIR%02dR," % (reindex_prefix, index_id, index_id) + "idxPAIR%02dF TO old_idxPAIR%02dF," % (index_id, index_id) + "%sidxPAIR%02dF TO idxPAIR%02dF," % (reindex_prefix, index_id, index_id) + "idxPHRASE%02dR TO old_idxPHRASE%02dR," % (index_id, index_id) + "%sidxPHRASE%02dR TO idxPHRASE%02dR," % (reindex_prefix, index_id, index_id) + "idxPHRASE%02dF TO old_idxPHRASE%02dF," % (index_id, index_id) + "%sidxPHRASE%02dF TO idxPHRASE%02dF;" % (reindex_prefix, index_id, index_id) ) write_message("Dropping old index tables for id %s" % index_id) run_sql("DROP TABLE old_idxWORD%02dR, old_idxWORD%02dF, old_idxPAIR%02dR, old_idxPAIR%02dF, old_idxPHRASE%02dR, old_idxPHRASE%02dF" % (index_id, index_id, index_id, index_id, index_id, index_id)) # kwalitee: disable=sql def init_temporary_reindex_tables(index_id, reindex_prefix="tmp_"): """Create reindexing temporary tables.""" write_message("Creating new tmp index tables for id %s" % index_id) run_sql("""DROP TABLE IF EXISTS %sidxWORD%02dF""" % (wash_table_column_name(reindex_prefix), index_id)) # kwalitee: disable=sql run_sql("""CREATE TABLE %sidxWORD%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(50) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("""DROP TABLE IF EXISTS %sidxWORD%02dR""" % (wash_table_column_name(reindex_prefix), index_id)) # kwalitee: disable=sql run_sql("""CREATE TABLE %sidxWORD%02dR ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("""DROP TABLE IF EXISTS %sidxPAIR%02dF""" % (wash_table_column_name(reindex_prefix), index_id)) # kwalitee: disable=sql run_sql("""CREATE TABLE %sidxPAIR%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term varchar(100) default NULL, hitlist longblob, PRIMARY KEY (id), UNIQUE KEY term (term) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("""DROP TABLE IF EXISTS %sidxPAIR%02dR""" % (wash_table_column_name(reindex_prefix), index_id)) # kwalitee: disable=sql run_sql("""CREATE TABLE %sidxPAIR%02dR ( id_bibrec mediumint(9) unsigned NOT NULL, termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("""DROP TABLE IF EXISTS %sidxPHRASE%02dF""" % (wash_table_column_name(reindex_prefix), index_id)) # kwalitee: disable=sql run_sql("""CREATE TABLE %sidxPHRASE%02dF ( id mediumint(9) unsigned NOT NULL auto_increment, term text default NULL, hitlist longblob, PRIMARY KEY (id), KEY term (term(50)) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("""DROP TABLE IF EXISTS %sidxPHRASE%02dR""" % (wash_table_column_name(reindex_prefix), index_id)) # kwalitee: disable=sql run_sql("""CREATE TABLE %sidxPHRASE%02dR ( id_bibrec mediumint(9) unsigned NOT NULL default '0', termlist longblob, type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL 
default 'CURRENT', PRIMARY KEY (id_bibrec,type) ) ENGINE=MyISAM""" % (reindex_prefix, index_id)) run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,)) def get_fuzzy_authors_from_phrase(phrase, stemming_language=None): """ Return list of fuzzy phrase-tokens suitable for storing into author phrase index. """ author_tokenizer = BibIndexFuzzyNameTokenizer() return author_tokenizer.tokenize(phrase) def get_exact_authors_from_phrase(phrase, stemming_language=None): """ Return list of exact phrase-tokens suitable for storing into exact author phrase index. """ author_tokenizer = BibIndexExactNameTokenizer() return author_tokenizer.tokenize(phrase) def get_author_family_name_words_from_phrase(phrase, stemming_language=None): """ Return list of words from author family names, not his/her first names. The phrase is assumed to be the full author name. This is useful for CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES. """ d_family_names = {} # first, treat everything before first comma as surname: if ',' in phrase: d_family_names[phrase.split(',', 1)[0]] = 1 # second, try fuzzy author tokenizer to find surname variants: for name in get_fuzzy_authors_from_phrase(phrase, stemming_language): if ',' in name: d_family_names[name.split(',', 1)[0]] = 1 # now extract words from these surnames: d_family_names_words = {} for family_name in d_family_names.keys(): for word in get_words_from_phrase(family_name, stemming_language): d_family_names_words[word] = 1 return d_family_names_words.keys() def get_words_from_phrase(phrase, stemming_language=None): """ Return a list of words extracted from phrase. """ words_tokenizer = BibIndexWordTokenizer(stemming_language) return words_tokenizer.tokenize(phrase) def get_phrases_from_phrase(phrase, stemming_language=None): """Return list of phrases found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file. """ phrase_tokenizer = BibIndexPhraseTokenizer(stemming_language) return phrase_tokenizer.tokenize(phrase) def get_pairs_from_phrase(phrase, stemming_language=None): """ Return list of oairs extracted from phrase. """ pairs_tokenizer = BibIndexPairTokenizer(stemming_language) return pairs_tokenizer.tokenize(phrase) def remove_subfields(s): "Removes subfields from string, e.g. 'foo $$c bar' becomes 'foo bar'." return re_subfields.sub(' ', s) def get_index_id_from_index_name(index_name): """Returns the words/phrase index id for INDEXNAME. Returns empty string in case there is no words table for this index. Example: field='author', output=4.""" out = 0 query = """SELECT w.id FROM idxINDEX AS w WHERE w.name=%s LIMIT 1""" res = run_sql(query, (index_name,), 1) if res: out = res[0][0] return out def get_index_name_from_index_id(index_id): """Returns the words/phrase index name for INDEXID. Returns '' in case there is no words table for this indexid. Example: field=9, output='fulltext'.""" res = run_sql("SELECT name FROM idxINDEX WHERE id=%s", (index_id,)) if res: return res[0][0] return '' def get_index_tags(indexname): """Returns the list of tags that are indexed inside INDEXNAME. Returns empty list in case there are no tags indexed in this index. Note: uses get_field_tags() defined before. 
Example: field='author', output=['100__%', '700__%'].""" out = [] query = """SELECT f.code FROM idxINDEX AS w, idxINDEX_field AS wf, field AS f WHERE w.name=%s AND w.id=wf.id_idxINDEX AND f.id=wf.id_field""" res = run_sql(query, (indexname,)) for row in res: out.extend(get_field_tags(row[0])) return out def get_all_indexes(): """Returns the list of the names of all defined words indexes. Returns empty list in case no indexes are defined. Example: output=['global', 'author'].""" out = [] query = """SELECT name FROM idxINDEX""" res = run_sql(query) for row in res: out.append(row[0]) return out def split_ranges(parse_string): """Parse a string and return the list of ranges.""" recIDs = [] ranges = parse_string.split(",") for arange in ranges: tmp_recIDs = arange.split("-") if len(tmp_recIDs) == 1: recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])]) else: if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check tmp = tmp_recIDs[0] tmp_recIDs[0] = tmp_recIDs[1] tmp_recIDs[1] = tmp recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])]) return recIDs def get_word_tables(tables): """ Given a list of table names it returns a list of tuples (index_id, index_name, index_tags). If tables is empty it returns the whole list.""" wordTables = [] if tables: indexes = tables.split(",") for index in indexes: index_id = get_index_id_from_index_name(index) if index_id: wordTables.append((index_id, index, get_index_tags(index))) else: write_message("Error: There is no %s words table." % index, sys.stderr) else: for index in get_all_indexes(): index_id = get_index_id_from_index_name(index) wordTables.append((index_id, index, get_index_tags(index))) return wordTables def get_date_range(var): "Returns the two dates contained as a low,high tuple" limits = var.split(",") if len(limits) == 1: low = get_datetime(limits[0]) return low, None if len(limits) == 2: low = get_datetime(limits[0]) high = get_datetime(limits[1]) return low, high return None, None def create_range_list(res): """Creates a range list from a recID select query result contained in res. The result is expected to have ascending numerical order.""" if not res: return [] row = res[0] if not row: return [] else: range_list = [[row, row]] for row in res[1:]: row_id = row if row_id == range_list[-1][1] + 1: range_list[-1][1] = row_id else: range_list.append([row_id, row_id]) return range_list def beautify_range_list(range_list): """Returns a non overlapping, maximal range list""" ret_list = [] for new in range_list: found = 0 for old in ret_list: if new[0] <= old[0] <= new[1] + 1 or new[0] - 1 <= old[1] <= new[1]: old[0] = min(old[0], new[0]) old[1] = max(old[1], new[1]) found = 1 break if not found: ret_list.append(new) return ret_list def truncate_index_table(index_name): """Properly truncate the given index.""" index_id = get_index_id_from_index_name(index_name) if index_id: write_message('Truncating %s index table in order to reindex.' % index_name, verbose=2) run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,)) run_sql("TRUNCATE idxWORD%02dF" % index_id) # kwalitee: disable=sql run_sql("TRUNCATE idxWORD%02dR" % index_id) # kwalitee: disable=sql run_sql("TRUNCATE idxPHRASE%02dF" % index_id) # kwalitee: disable=sql run_sql("TRUNCATE idxPHRASE%02dR" % index_id) # kwalitee: disable=sql def update_index_last_updated(index_id, starting_time=None): """Update last_updated column of the index table in the database.
Puts starting time there so that if the task was interrupted for record download, the records will be reindexed next time.""" if starting_time is None: return None write_message("updating last_updated to %s..." % starting_time, verbose=9) return run_sql("UPDATE idxINDEX SET last_updated=%s WHERE id=%s", (starting_time, index_id,)) +def get_percentage_completed(num_done, num_total): + """ Return a string containing the approx. percentage completed """ + percentage_completed = 100.0 * float(num_done) / float(num_total) + if percentage_completed: + percentage_display = "(%.1f%%)" % (percentage_completed,) + else: + percentage_display = "" + return percentage_display + #def update_text_extraction_date(first_recid, last_recid): #"""for all the bibdoc connected to the specified recid, set #the text_extraction_date to the task_starting_time.""" #run_sql("UPDATE bibdoc JOIN bibrec_bibdoc ON id=id_bibdoc SET text_extraction_date=%s WHERE id_bibrec BETWEEN %s AND %s", (task_get_task_param('task_starting_time'), first_recid, last_recid)) class WordTable: "A class to hold the words table." def __init__(self, index_name, index_id, fields_to_index, table_name_pattern, default_get_words_fnc, tag_to_words_fnc_map, wash_index_terms=50, is_fulltext_index=False): """Creates words table instance. @param index_name: the index name @param index_id: the index integer identifier @param fields_to_index: a list of fields to index @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF @param default_get_words_fnc: the default function called to extract words from a metadata @param tag_to_words_fnc_map: a mapping to specify particular function to extract words from particular metadata (such as 8564_u) @param wash_index_terms: do we wash index terms, and if yes (when >0), how many characters do we keep in the index terms; see max_char_length parameter of wash_index_term() """ self.index_name = index_name self.index_id = index_id self.tablename = table_name_pattern % index_id + self.humanname = get_def_name('%s' % (str(index_id),), "idxINDEX")[0][1] self.recIDs_in_mem = [] self.fields_to_index = fields_to_index self.value = {} self.stemming_language = get_index_stemming_language(index_id) self.is_fulltext_index = is_fulltext_index self.wash_index_terms = wash_index_terms # tagToFunctions mapping. It offers an indirection level necessary for # indexing fulltext. The default is get_words_from_phrase self.tag_to_words_fnc_map = tag_to_words_fnc_map self.default_get_words_fnc = default_get_words_fnc if self.stemming_language and self.tablename.startswith('idxWORD'): write_message('%s has stemming enabled, language %s' % (self.tablename, self.stemming_language)) def get_field(self, recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" out = [] bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%%s AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID, tag)) for row in res: out.append(row[0]) return out def clean(self): "Cleans the words table." self.value = {} def put_into_db(self, mode="normal"): """Updates the current words table in the corresponding DB idxFOO table. Mode 'normal' means normal execution, mode 'emergency' means words index reverting to old state.
""" write_message("%s %s wordtable flush started" % (self.tablename, mode)) write_message('...updating %d words into %s started' % \ (len(self.value), self.tablename)) - task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value))) + task_update_progress("(%s:%s) flushed %d/%d words" % (self.tablename, self.humanname, 0, len(self.value))) self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec BETWEEN %%s AND %%s AND type='CURRENT'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) nb_words_total = len(self.value) nb_words_report = int(nb_words_total / 10.0) nb_words_done = 0 for word in self.value.keys(): self.put_word_into_db(word) nb_words_done += 1 if nb_words_report != 0 and ((nb_words_done % nb_words_report) == 0): write_message('......processed %d/%d words' % (nb_words_done, nb_words_total)) - task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total)) + percentage_display = get_percentage_completed(nb_words_done, nb_words_total) + task_update_progress("(%s:%s) flushed %d/%d words %s" % (self.tablename, self.humanname, nb_words_done, nb_words_total, percentage_display)) write_message('...updating %d words into %s ended' % \ (nb_words_total, self.tablename)) write_message('...updating reverse table %sR started' % self.tablename[:-1]) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN %%s AND %%s AND type='FUTURE'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s AND type='TEMPORARY'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) #if self.is_fulltext_index: #update_text_extraction_date(group[0], group[1]) write_message('End of updating wordTable into %s' % self.tablename, verbose=9) elif mode == "emergency": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN %%s AND %%s AND type='TEMPORARY'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s AND type='FUTURE'""" % self.tablename[:-1] write_message(query % (group[0], group[1]), verbose=9) run_sql(query, (group[0], group[1])) write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9) write_message('...updating reverse table %sR ended' % self.tablename[:-1]) self.clean() self.recIDs_in_mem = [] write_message("%s %s wordtable flush ended" % (self.tablename, mode)) - task_update_progress("%s flush ended" % (self.tablename)) + task_update_progress("(%s:%s) flush ended" % (self.tablename, self.humanname)) def load_old_recIDs(self, word): """Load existing hitlist for the word from the database index files.""" query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename res = run_sql(query, (word,)) if res: return intbitset(res[0][0]) else: return None def merge_with_old_recIDs(self, word, set): """Merge the system numbers stored in memory (hash of recIDs with value +1 or -1 according to whether to add/delete them) with those stored in the database index and received in set universe of recIDs for the given word. 
Return False in case no change was done to SET, return True in case SET was changed. """ oldset = intbitset(set) set.update_with_signs(self.value[word]) return set != oldset def put_word_into_db(self, word): """Flush a single word to the database and delete it from memory""" set = self.load_old_recIDs(word) if set is not None: # merge the word recIDs found in memory: if not self.merge_with_old_recIDs(word, set): # nothing to update: write_message("......... unchanged hitlist for ``%s''" % word, verbose=9) pass else: # yes there were some new words: write_message("......... updating hitlist for ``%s''" % word, verbose=9) run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % wash_table_column_name(self.tablename), (set.fastdump(), word)) # kwalitee: disable=sql else: # the word is new, will create new set: write_message("......... inserting hitlist for ``%s''" % word, verbose=9) set = intbitset(self.value[word].keys()) try: run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % wash_table_column_name(self.tablename), (word, set.fastdump())) # kwalitee: disable=sql except Exception, e: ## We send this exception to the admin only when it is not ## already repairing the problem. register_exception(prefix="Error when putting the term '%s' into db (hitlist=%s): %s\n" % (repr(word), set, e), alert_admin=(task_get_option('cmd') != 'repair')) if not set: # never store empty words run_sql("DELETE FROM %s WHERE term=%%s" % wash_table_column_name(self.tablename), (word,)) # kwalitee: disable=sql del self.value[word] def display(self): "Displays the word table." keys = self.value.keys() keys.sort() for k in keys: write_message("%s: %s" % (k, self.value[k])) def count(self): "Returns the number of words in the table." return len(self.value) def info(self): "Prints some information on the words table." write_message("The words table contains %d words." % self.count()) def lookup_words(self, word=""): "Lookup word from the words table." if not word: done = 0 while not done: try: word = raw_input("Enter word: ") done = 1 except (EOFError, KeyboardInterrupt): return if self.value.has_key(word): write_message("The word '%s' is found %d times." \ % (word, len(self.value[word]))) else: write_message("The word '%s' does not exist in the word file."\ % word) def add_recIDs(self, recIDs, opt_flush): """Fetches records whose id is in the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
""" global chunksize, _last_word_table flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: task_sleep_now_if_required() # calculate chunk group of recIDs and treat it: i_high = min(i_low + opt_flush - flush_count - 1, arange[1]) i_high = min(i_low + chunksize - chunksize_count - 1, i_high) try: self.chk_recID_range(i_low, i_high) except StandardError: if self.index_name == 'fulltext' and CFG_SOLR_URL: solr_commit() raise write_message("%s adding records #%d-#%d started" % \ (self.tablename, i_low, i_high)) if CFG_CHECK_MYSQL_THREADS: kill_sleepy_mysql_threads() - task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high)) + percentage_display = get_percentage_completed(records_done, records_to_go) + task_update_progress("(%s:%s) adding recs %d-%d %s" % (self.tablename, self.humanname, i_low, i_high, percentage_display)) self.del_recID_range(i_low, i_high) just_processed = self.add_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + just_processed write_message("%s adding records #%d-#%d ended " % \ (self.tablename, i_low, i_high)) if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= opt_flush: self.put_into_db() self.clean() if self.index_name == 'fulltext' and CFG_SOLR_URL: solr_commit() write_message("%s backing up" % (self.tablename)) flush_count = 0 self.log_progress(time_started, records_done, records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db() if self.index_name == 'fulltext' and CFG_SOLR_URL: solr_commit() self.log_progress(time_started, records_done, records_to_go) def add_recIDs_by_date(self, dates, opt_flush): """Add records that were modified between DATES[0] and DATES[1]. If DATES is not set, then add records that were modified since the last update of the index. 
""" if not dates: table_id = self.tablename[-3:-1] query = """SELECT last_updated FROM idxINDEX WHERE id=%s""" res = run_sql(query, (table_id,)) if not res: return if not res[0][0]: dates = ("0000-00-00", None) else: dates = (res[0][0], None) if dates[1] is None: res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s""", (dates[0],))) if self.is_fulltext_index: res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date >= %s AND status<>'DELETED'""", (dates[0],))) elif dates[0] is None: res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date <= %s""", (dates[1],))) if self.is_fulltext_index: res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date <= %s AND status<>'DELETED'""", (dates[1],))) else: res = intbitset(run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s AND b.modification_date <= %s""", (dates[0], dates[1]))) if self.is_fulltext_index: res |= intbitset(run_sql("""SELECT id_bibrec FROM bibrec_bibdoc JOIN bibdoc ON id_bibdoc=id WHERE text_extraction_date <= modification_date AND modification_date >= %s AND modification_date <= %s AND status<>'DELETED'""", (dates[0], dates[1],))) alist = create_range_list(list(res)) if not alist: write_message("No new records added. %s is up to date" % self.tablename) else: self.add_recIDs(alist, opt_flush) # special case of author indexes where we need to re-index # those records that were affected by changed BibAuthorID # attributions: if self.index_name in ('author', 'firstauthor', 'exactauthor', 'exactfirstauthor'): from invenio.bibauthorid_personid_maintenance import get_recids_affected_since # dates[1] is ignored, since BibAuthorID API does not offer upper limit search alist = create_range_list(get_recids_affected_since(dates[0])) if not alist: write_message("No new records added by author canonical IDs. 
%s is up to date" % self.tablename) else: self.add_recIDs(alist, opt_flush) def add_recID_range(self, recID1, recID2): """Add records from RECID1 to RECID2.""" wlist = {} self.recIDs_in_mem.append([recID1, recID2]) # special case of author indexes where we also add author # canonical IDs: if self.index_name in ('author', 'firstauthor', 'exactauthor', 'exactfirstauthor'): for recID in range(recID1, recID2 + 1): if not wlist.has_key(recID): wlist[recID] = [] wlist[recID] = list_union(get_author_canonical_ids_for_recid(recID), wlist[recID]) # special case of journal index: if self.fields_to_index == [CFG_JOURNAL_TAG]: # FIXME: quick hack for the journal index; a special # treatment where we need to associate more than one # subfield into indexed term for recID in range(recID1, recID2 + 1): new_words = get_words_from_journal_tag(recID, self.fields_to_index[0]) if not wlist.has_key(recID): wlist[recID] = [] wlist[recID] = list_union(new_words, wlist[recID]) elif self.index_name in ('authorcount',): # FIXME: quick hack for the authorcount index; we have to # count the number of author fields only for recID in range(recID1, recID2 + 1): new_words = [str(get_field_count(recID, self.fields_to_index)),] if not wlist.has_key(recID): wlist[recID] = [] wlist[recID] = list_union(new_words, wlist[recID]) else: # usual tag-by-tag indexing: for tag in self.fields_to_index: get_words_function = self.tag_to_words_fnc_map.get(tag, self.default_get_words_fnc) bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec BETWEEN %%s AND %%s AND bb.id_bibxxx=b.id AND tag LIKE %%s""" % (bibXXx, bibrec_bibXXx) res = run_sql(query, (recID1, recID2, tag)) if tag == '8564_u': ## FIXME: Quick hack to be sure that hidden files are ## actually indexed. res = set(res) for recid in xrange(int(recID1), int(recID2) + 1): for bibdocfile in BibRecDocs(recid).list_latest_files(): res.add((recid, bibdocfile.get_url())) for row in sorted(res): recID, phrase = row if not wlist.has_key(recID): wlist[recID] = [] new_words = get_words_function(phrase, stemming_language=self.stemming_language) # ,self.separators wlist[recID] = list_union(new_words, wlist[recID]) # lookup index-time synonyms: if CFG_BIBINDEX_SYNONYM_KBRS.has_key(self.index_name): if len(wlist) == 0: return 0 recIDs = wlist.keys() for recID in recIDs: for word in wlist[recID]: word_synonyms = get_synonym_terms(word, CFG_BIBINDEX_SYNONYM_KBRS[self.index_name][0], CFG_BIBINDEX_SYNONYM_KBRS[self.index_name][1]) if word_synonyms: wlist[recID] = list_union(word_synonyms, wlist[recID]) # were there some words for these recIDs found? if len(wlist) == 0: return 0 recIDs = wlist.keys() for recID in recIDs: # was this record marked as deleted? if "DELETED" in self.get_field(recID, "980__c"): wlist[recID] = [] write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9) write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9) # put words into reverse index table with FUTURE status: for recID in recIDs: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % wash_table_column_name(self.tablename[:-1]), (recID, serialize_via_marshal(wlist[recID]))) # kwalitee: disable=sql # ... 
and, for new records, enter the CURRENT status as empty: try: run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % wash_table_column_name(self.tablename[:-1]), (recID, serialize_via_marshal([]))) # kwalitee: disable=sql except DatabaseError: # okay, it's an already existing record, no problem pass # put words into memory word list: put = self.put for recID in recIDs: for w in wlist[recID]: put(recID, w, 1) return len(recIDs) def log_progress(self, start, done, todo): """Calculate progress and store it. start: start time, done: records processed, todo: total number of records""" time_elapsed = time.time() - start # consistency check if time_elapsed == 0 or done > todo: return time_recs_per_min = done / (time_elapsed / 60.0) write_message("%d records took %.1f seconds to complete. (%.1f recs/min)"\ % (done, time_elapsed, time_recs_per_min)) if time_recs_per_min: write_message("Estimated runtime: %.1f minutes" % \ ((todo - done) / time_recs_per_min)) def put(self, recID, word, sign): """Adds or deletes a word in the word list.""" try: if self.wash_index_terms: word = wash_index_term(word, self.wash_index_terms) if self.value.has_key(word): # the word 'word' exists already: update sign self.value[word][recID] = sign else: self.value[word] = {recID: sign} except: write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID)) def del_recIDs(self, recIDs): """Fetches records whose id is in the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ count = 0 for arange in recIDs: task_sleep_now_if_required() self.del_recID_range(arange[0], arange[1]) count = count + arange[1] - arange[0] self.put_into_db() if self.index_name == 'fulltext' and CFG_SOLR_URL: solr_commit() def del_recID_range(self, low, high): """Deletes records with 'recID' system number between low and high from the memory words index table.""" write_message("%s fetching existing words for records #%d-#%d started" % \ (self.tablename, low, high), verbose=3) self.recIDs_in_mem.append([low, high]) query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec BETWEEN %%s AND %%s""" % (self.tablename[:-1]) recID_rows = run_sql(query, (low, high)) for recID_row in recID_rows: recID = recID_row[0] wlist = deserialize_via_marshal(recID_row[1]) for word in wlist: self.put(recID, word, -1) write_message("%s fetching existing words for records #%d-#%d ended" % \ (self.tablename, low, high), verbose=3) def report_on_table_consistency(self): """Check reverse words index tables (e.g. idxWORD01R) for interesting states such as 'TEMPORARY' state. Prints a small report (number of words, number of bad words).
""" # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d index records" % \ (self.tablename, nb_bad_records, nb_records)) else: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self, opt_flush): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 if nb_bad_records == 0: return query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT'""" \ % (self.tablename[:-1]) res = intbitset(run_sql(query)) recIDs = create_range_list(list(res)) flush_count = 0 records_done = 0 records_to_go = 0 for arange in recIDs: records_to_go = records_to_go + arange[1] - arange[0] + 1 time_started = time.time() # will measure profile time for arange in recIDs: i_low = arange[0] chunksize_count = 0 while i_low <= arange[1]: task_sleep_now_if_required() # calculate chunk group of recIDs and treat it: i_high = min(i_low + opt_flush - flush_count - 1, arange[1]) i_high = min(i_low + chunksize - chunksize_count - 1, i_high) self.fix_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= opt_flush: self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started, records_done, records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started, records_done, records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' AND id_bibrec BETWEEN %%s AND %%s""" % self.tablename[:-1] res = run_sql(query, (low, high), 1) if res[0][0] == 0: write_message("%s for %d-%d is in consistent state" % (self.tablename, low, high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) error_message = "Errors found. You should check consistency of the " \ "%s - %sR tables.\nRunning 'bibindex --repair' is " \ "recommended." % (self.tablename, self.tablename[:-1]) write_message("EMERGENCY: " + error_message, stream=sys.stderr) raise StandardError(error_message) def fix_recID_range(self, low, high): """Try to fix reverse index database consistency (e.g. table idxWORD01R) in the low,high doc-id range. Possible states for a recID follow: CUR TMP FUT: very bad things have happened: warn! 
CUR TMP : very bad things have happened: warn! CUR FUT: delete FUT (crash before flushing) CUR : database is ok TMP FUT: add TMP to memory and del FUT from memory flush (revert to old state) TMP : very bad things have happened: warn! FUT: very bad things have happened: warn! """ state = {} query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN %%s AND %%s"\ % self.tablename[:-1] res = run_sql(query, (low, high)) for row in res: if not state.has_key(row[0]): state[row[0]] = [] state[row[0]].append(row[1]) ok = 1 # will hold info on whether we will be able to repair for recID in state.keys(): if not 'TEMPORARY' in state[recID]: if 'FUTURE' in state[recID]: if 'CURRENT' not in state[recID]: write_message("EMERGENCY: Index record %d is in an inconsistent state. Can't repair it." % recID) ok = 0 else: write_message("EMERGENCY: Inconsistency in index record %d detected" % recID) query = """DELETE FROM %sR WHERE id_bibrec=%%s""" % self.tablename[:-1] run_sql(query, (recID,)) write_message("EMERGENCY: Inconsistency in record %d repaired." % recID) else: if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]: self.recIDs_in_mem.append([recID, recID]) # Get the words file query = """SELECT type,termlist FROM %sR WHERE id_bibrec=%%s""" % self.tablename[:-1] write_message(query, verbose=9) res = run_sql(query, (recID,)) for row in res: wlist = deserialize_via_marshal(row[1]) write_message("Words are %s " % wlist, verbose=9) if row[0] == 'TEMPORARY': sign = 1 else: sign = -1 for word in wlist: self.put(recID, word, sign) else: write_message("EMERGENCY: %s for %d is in an inconsistent " "state. Couldn't repair it." % (self.tablename, recID), stream=sys.stderr) ok = 0 if not ok: error_message = "Unrepairable errors found. You should check " \ "consistency of the %s - %sR tables. Deleting affected " \ "TEMPORARY and FUTURE entries from these tables is " \ "recommended; see the BibIndex Admin Guide."
% \ (self.tablename, self.tablename[:-1]) write_message("EMERGENCY: " + error_message, stream=sys.stderr) raise StandardError(error_message) def main(): """Main function that constructs the bibtask.""" task_init(authorization_action='runbibindex', authorization_msg="BibIndex Task Submission", description="""Examples: \t%s -a -i 234-250,293,300-500 -u admin@localhost \t%s -a -w author,fulltext -M 8192 -v3 \t%s -d -m +4d -A on --flush=10000\n""" % ((sys.argv[0],) * 3), help_specific_usage=""" Indexing options: -a, --add\t\tadd or update words for selected records -d, --del\t\tdelete words for selected records -i, --id=low[-high]\t\tselect according to doc recID -m, --modified=from[,to]\tselect according to modification date -c, --collection=c1[,c2]\tselect according to collection -R, --reindex\treindex the selected indexes from scratch Repairing options: -k, --check\t\tcheck consistency for all records in the table(s) -r, --repair\t\ttry to repair all records in the table(s) Specific options: -w, --windex=w1[,w2]\tword/phrase indexes to consider (all) -M, --maxmem=XXX\tmaximum memory usage in kB (no limit) -f, --flush=NNN\t\tfull consistent table flush after NNN records (10000) """, version=__revision__, specific_params=("adi:m:c:w:krRM:f:", [ "add", "del", "id=", "modified=", "collection=", "windex=", "check", "repair", "reindex", "maxmem=", "flush=", ]), task_stop_helper_fnc=task_stop_table_close_fnc, task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter, task_run_fnc=task_run_core, task_submit_check_options_fnc=task_submit_check_options) def task_submit_check_options(): """Check for options compatibility.""" if task_get_option("reindex"): if task_get_option("cmd") != "add" or task_get_option('id') or task_get_option('collection'): print >> sys.stderr, "ERROR: You can use --reindex only when adding modified records." return False return True def task_submit_elaborate_specific_parameter(key, value, opts, args): """ Given the string key, it checks its meaning, possibly using the value. Usually it fills some key in the options dict. It must return True if it has elaborated the key, False if it doesn't know that key. eg: if key in ['-n', '--number']: self.options['number'] = value return True return False """ if key in ("-a", "--add"): task_set_option("cmd", "add") if ("-d", "") in opts or ("--del", "") in opts: raise StandardError("Cannot have --add and --del at the same time!") elif key in ("-k", "--check"): task_set_option("cmd", "check") elif key in ("-r", "--repair"): task_set_option("cmd", "repair") elif key in ("-d", "--del"): task_set_option("cmd", "del") elif key in ("-i", "--id"): task_set_option('id', task_get_option('id') + split_ranges(value)) elif key in ("-m", "--modified"): task_set_option("modified", get_date_range(value)) elif key in ("-c", "--collection"): task_set_option("collection", value) elif key in ("-R", "--reindex"): task_set_option("reindex", True) elif key in ("-w", "--windex"): task_set_option("windex", value) elif key in ("-M", "--maxmem"): task_set_option("maxmem", int(value)) if task_get_option("maxmem") < base_process_size + 1000: raise StandardError("Memory usage should be higher than %d kB" % \ (base_process_size + 1000)) elif key in ("-f", "--flush"): task_set_option("flush", int(value)) else: return False return True def task_stop_table_close_fnc(): """ Close tables when the task is asked to STOP.
""" global _last_word_table if _last_word_table: _last_word_table.put_into_db() def task_run_core(): """Runs the task by fetching arguments from the BibSched task queue. This is what BibSched will be invoking via daemon call. The task prints Fibonacci numbers for up to NUM on the stdout, and some messages on stderr. Return 1 in case of success and 0 in case of failure.""" global _last_word_table if task_get_option("cmd") == "check": wordTables = get_word_tables(task_get_option("windex")) for index_id, index_name, index_tags in wordTables: if index_name == 'year' and CFG_INSPIRE_SITE: fnc_get_words_from_phrase = get_words_from_date_tag elif index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_words_from_phrase = get_author_family_name_words_from_phrase else: fnc_get_words_from_phrase = get_words_from_phrase wordTable = WordTable(index_name=index_name, index_id=index_id, fields_to_index=index_tags, table_name_pattern='idxWORD%02dF', default_get_words_fnc=fnc_get_words_from_phrase, tag_to_words_fnc_map={'8564_u': get_words_from_fulltext}, wash_index_terms=50) _last_word_table = wordTable wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_pairs_from_phrase = get_pairs_from_phrase # FIXME else: fnc_get_pairs_from_phrase = get_pairs_from_phrase wordTable = WordTable(index_name=index_name, index_id=index_id, fields_to_index=index_tags, table_name_pattern='idxPAIR%02dF', default_get_words_fnc=fnc_get_pairs_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=100) _last_word_table = wordTable wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if index_name in ('author', 'firstauthor'): fnc_get_phrases_from_phrase = get_fuzzy_authors_from_phrase elif index_name in ('exactauthor', 'exactfirstauthor'): fnc_get_phrases_from_phrase = get_exact_authors_from_phrase else: fnc_get_phrases_from_phrase = get_phrases_from_phrase wordTable = WordTable(index_name=index_name, index_id=index_id, fields_to_index=index_tags, table_name_pattern='idxPHRASE%02dF', default_get_words_fnc=fnc_get_phrases_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=0) _last_word_table = wordTable wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) _last_word_table = None return True # Let's work on single words! 
wordTables = get_word_tables(task_get_option("windex")) for index_id, index_name, index_tags in wordTables: is_fulltext_index = index_name == 'fulltext' reindex_prefix = "" if task_get_option("reindex"): reindex_prefix = "tmp_" init_temporary_reindex_tables(index_id, reindex_prefix) if index_name == 'year' and CFG_INSPIRE_SITE: fnc_get_words_from_phrase = get_words_from_date_tag elif index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_words_from_phrase = get_author_family_name_words_from_phrase else: fnc_get_words_from_phrase = get_words_from_phrase wordTable = WordTable(index_name=index_name, index_id=index_id, fields_to_index=index_tags, table_name_pattern=reindex_prefix + 'idxWORD%02dF', default_get_words_fnc=fnc_get_words_from_phrase, tag_to_words_fnc_map={'8564_u': get_words_from_fulltext}, is_fulltext_index=is_fulltext_index, wash_index_terms=50) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Missing IDs of records to delete from " \ "index %s." % wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError(error_message) elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) ## here we used to update last_updated info, if run via automatic mode; ## but do not update here anymore, since idxPHRASE will be acted upon later task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Invalid command found processing %s" % \ wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError(error_message) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception(alert_admin=True) if _last_word_table: _last_word_table.put_into_db() raise wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) # Let's work on pairs now if index_name in ('author', 'firstauthor') and \ CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES: fnc_get_pairs_from_phrase = get_pairs_from_phrase # FIXME else: fnc_get_pairs_from_phrase = get_pairs_from_phrase wordTable = WordTable(index_name=index_name, index_id=index_id, fields_to_index=index_tags, table_name_pattern=reindex_prefix + 'idxPAIR%02dF', default_get_words_fnc=fnc_get_pairs_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=100) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if 
task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Missing IDs of records to delete from " \ "index %s." % wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError(error_message) elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) # let us update last_updated timestamp info, if run via automatic mode: task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Invalid command found processing %s" % \ wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError(error_message) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() if _last_word_table: _last_word_table.put_into_db() raise wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) # Let's work on phrases now if index_name in ('author', 'firstauthor'): fnc_get_phrases_from_phrase = get_fuzzy_authors_from_phrase elif index_name in ('exactauthor', 'exactfirstauthor'): fnc_get_phrases_from_phrase = get_exact_authors_from_phrase else: fnc_get_phrases_from_phrase = get_phrases_from_phrase wordTable = WordTable(index_name=index_name, index_id=index_id, fields_to_index=index_tags, table_name_pattern=reindex_prefix + 'idxPHRASE%02dF', default_get_words_fnc=fnc_get_phrases_from_phrase, tag_to_words_fnc_map={'8564_u': get_nothing_from_phrase}, wash_index_terms=0) _last_word_table = wordTable wordTable.report_on_table_consistency() try: if task_get_option("cmd") == "del": if task_get_option("id"): wordTable.del_recIDs(task_get_option("id")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) wordTable.del_recIDs(recIDs_range) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Missing IDs of records to delete from " \ "index %s." 
% wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError(error_message) elif task_get_option("cmd") == "add": if task_get_option("id"): wordTable.add_recIDs(task_get_option("id"), task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("collection"): l_of_colls = task_get_option("collection").split(",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID, recID]) wordTable.add_recIDs(recIDs_range, task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush")) # let us update last_updated timestamp info, if run via automatic mode: update_index_last_updated(index_id, task_get_task_param('task_starting_time')) task_sleep_now_if_required(can_stop_too=True) elif task_get_option("cmd") == "repair": wordTable.repair(task_get_option("flush")) task_sleep_now_if_required(can_stop_too=True) else: error_message = "Invalid command found processing %s" % \ wordTable.tablename write_message(error_message, stream=sys.stderr) raise StandardError(error_message) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) register_exception() if _last_word_table: _last_word_table.put_into_db() raise wordTable.report_on_table_consistency() task_sleep_now_if_required(can_stop_too=True) if task_get_option("reindex"): swap_temporary_reindex_tables(index_id, reindex_prefix) update_index_last_updated(index_id, task_get_task_param('task_starting_time')) task_sleep_now_if_required(can_stop_too=True) _last_word_table = None return True ### okay, here we go: if __name__ == '__main__': main() diff --git a/modules/bibsched/lib/bibsched.py b/modules/bibsched/lib/bibsched.py index 8b7260212..2912bfe59 100644 --- a/modules/bibsched/lib/bibsched.py +++ b/modules/bibsched/lib/bibsched.py @@ -1,1827 +1,1827 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""BibSched - task management, scheduling and executing system for Invenio """ __revision__ = "$Id$" import os import sys import time import re import marshal import getopt from itertools import chain from socket import gethostname from subprocess import Popen import signal from invenio.bibtask_config import \ CFG_BIBTASK_VALID_TASKS, \ CFG_BIBTASK_MONOTASKS, \ CFG_BIBTASK_FIXEDTIMETASKS from invenio.config import \ CFG_PREFIX, \ CFG_BIBSCHED_REFRESHTIME, \ CFG_BIBSCHED_LOG_PAGER, \ CFG_BIBSCHED_EDITOR, \ CFG_BINDIR, \ CFG_LOGDIR, \ CFG_BIBSCHED_GC_TASKS_OLDER_THAN, \ CFG_BIBSCHED_GC_TASKS_TO_REMOVE, \ CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE, \ CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS, \ CFG_SITE_URL, \ CFG_BIBSCHED_NODE_TASKS, \ CFG_BIBSCHED_MAX_ARCHIVED_ROWS_DISPLAY from invenio.dbquery import run_sql, real_escape_string from invenio.textutils import wrap_text_in_a_box from invenio.errorlib import register_exception, register_emergency from invenio.shellutils import run_shell_command CFG_VALID_STATUS = ('WAITING', 'SCHEDULED', 'RUNNING', 'CONTINUING', '% DELETED', 'ABOUT TO STOP', 'ABOUT TO SLEEP', 'STOPPED', 'SLEEPING', 'KILLED', 'NOW STOP', 'ERRORS REPORTED') CFG_MOTD_PATH = os.path.join(CFG_PREFIX, "var", "run", "bibsched.motd") SHIFT_RE = re.compile("([-\+]{0,1})([\d]+)([dhms])") class RecoverableError(StandardError): pass def get_pager(): """ Return the first available pager. """ paths = ( os.environ.get('PAGER', ''), CFG_BIBSCHED_LOG_PAGER, '/usr/bin/less', '/bin/more' ) for pager in paths: if os.path.exists(pager): return pager def get_editor(): """ Return the first available editor. """ paths = ( os.environ.get('EDITOR', ''), CFG_BIBSCHED_EDITOR, '/usr/bin/vim', '/usr/bin/emacs', '/usr/bin/vi', '/usr/bin/nano', ) for editor in paths: if os.path.exists(editor): return editor def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string. 
It can handle normal date strings and shifts with respect to now.""" try: date = time.time() factors = {"d": 24*3600, "h": 3600, "m": 60, "s": 1} m = SHIFT_RE.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date except: return None def get_my_pid(process, args=''): if sys.platform.startswith('freebsd'): command = "ps -o pid,args | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, args) else: command = "ps -C %s o '%%p%%a' | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, process, args) answer = run_shell_command(command)[1].strip() if answer == '': answer = 0 else: answer = answer[:answer.find(' ')] return int(answer) def get_task_pid(task_name, task_id, ignore_error=False): """Return the pid of task_name/task_id""" try: path = os.path.join(CFG_PREFIX, 'var', 'run', 'bibsched_task_%d.pid' % task_id) pid = int(open(path).read()) os.kill(pid, signal.SIGUSR2) return pid except (OSError, IOError): if ignore_error: return 0 register_exception() return get_my_pid(task_name, str(task_id)) def get_last_taskid(): """Return the last taskid used.""" return run_sql("SELECT MAX(id) FROM schTASK")[0][0] def delete_task(task_id): """Delete the corresponding task.""" run_sql("DELETE FROM schTASK WHERE id=%s", (task_id, )) def is_task_scheduled(task_name): """Check if a certain task_name is due for execution (WAITING or RUNNING)""" sql = """SELECT COUNT(proc) FROM schTASK WHERE proc = %s AND (status='WAITING' OR status='RUNNING')""" return run_sql(sql, (task_name,))[0][0] > 0 def get_task_ids_by_descending_date(task_name, statuses=['SCHEDULED']): """Returns list of task ids, ordered by descending runtime.""" sql = """SELECT id FROM schTASK WHERE proc=%s AND (%s) ORDER BY runtime DESC""" \ % " OR ".join(["status = '%s'" % x for x in statuses]) return [x[0] for x in run_sql(sql, (task_name,))] def get_task_options(task_id): """Returns options for task_id read from the BibSched task queue table.""" res = run_sql("SELECT arguments FROM schTASK WHERE id=%s", (task_id,)) try: return marshal.loads(res[0][0]) except IndexError: return list() def gc_tasks(verbose=False, statuses=None, since=None, tasks=None): # pylint: disable=W0613 """Garbage collect the task queue.""" if tasks is None: tasks = CFG_BIBSCHED_GC_TASKS_TO_REMOVE + CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE if since is None: since = '-%id' % CFG_BIBSCHED_GC_TASKS_OLDER_THAN if statuses is None: statuses = ['DONE'] statuses = [status.upper() for status in statuses if status.upper() != 'RUNNING'] date = get_datetime(since) status_query = 'status in (%s)' % ','.join([repr(real_escape_string(status)) for status in statuses]) for task in tasks: if task in CFG_BIBSCHED_GC_TASKS_TO_REMOVE: res = run_sql("""DELETE FROM schTASK WHERE proc=%%s AND %s AND runtime<%%s""" % status_query, (task, date)) write_message('Deleted %s %s tasks (created before %s) with %s' % (res, task, date, status_query)) elif task in CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE: run_sql("""INSERT INTO hstTASK(id,proc,host,user, runtime,sleeptime,arguments,status,progress) SELECT id,proc,host,user, runtime,sleeptime,arguments,status,progress FROM schTASK WHERE proc=%%s AND %s AND runtime<%%s""" % status_query, (task, date)) res = run_sql("""DELETE FROM schTASK WHERE proc=%%s AND %s AND runtime<%%s""" % status_query, (task, date)) 
write_message('Archived %s %s tasks (created before %s) with %s' % (res, task, date, status_query)) def spawn_task(command, wait=False): """ Spawn the provided command in a way that is detached from the current group. In this way a signal received by bibsched is not going to be automatically propagated to the spawned process. """ def preexec(): # Don't forward signals. os.setsid() devnull = open(os.devnull, "w") process = Popen(command, preexec_fn=preexec, shell=True, stderr=devnull, stdout=devnull) if wait: process.wait() def bibsched_get_host(task_id): """Retrieve the hostname of the task.""" res = run_sql("SELECT host FROM schTASK WHERE id=%s LIMIT 1", (task_id, ), 1) if res: return res[0][0] def bibsched_set_host(task_id, host=""): """Update the host of task_id.""" return run_sql("UPDATE schTASK SET host=%s WHERE id=%s", (host, task_id)) def bibsched_get_status(task_id): """Retrieve the task status.""" res = run_sql("SELECT status FROM schTASK WHERE id=%s LIMIT 1", (task_id, ), 1) if res: return res[0][0] def bibsched_set_status(task_id, status, when_status_is=None): """Update the status of task_id.""" if when_status_is is None: return run_sql("UPDATE schTASK SET status=%s WHERE id=%s", (status, task_id)) else: return run_sql("UPDATE schTASK SET status=%s WHERE id=%s AND status=%s", (status, task_id, when_status_is)) def bibsched_set_progress(task_id, progress): """Update the progress of task_id.""" return run_sql("UPDATE schTASK SET progress=%s WHERE id=%s", (progress, task_id)) def bibsched_set_priority(task_id, priority): """Update the priority of task_id.""" return run_sql("UPDATE schTASK SET priority=%s WHERE id=%s", (priority, task_id)) def bibsched_send_signal(proc, task_id, sig): """Send a signal to a given task.""" if bibsched_get_host(task_id) != gethostname(): return False pid = get_task_pid(proc, task_id, True) if pid: try: os.kill(pid, sig) return True except OSError: return False return False def is_monotask(task_id, proc, runtime, status, priority, host, sequenceid): # pylint: disable=W0613 procname = proc.split(':')[0] return procname in CFG_BIBTASK_MONOTASKS def stop_task(other_task_id, other_proc, other_priority, other_status, other_sequenceid): # pylint: disable=W0613 Log("Send STOP signal to #%d (%s) which was in status %s" % (other_task_id, other_proc, other_status)) bibsched_set_status(other_task_id, 'ABOUT TO STOP', other_status) def sleep_task(other_task_id, other_proc, other_priority, other_status, other_sequenceid): # pylint: disable=W0613 Log("Send SLEEP signal to #%d (%s) which was in status %s" % (other_task_id, other_proc, other_status)) bibsched_set_status(other_task_id, 'ABOUT TO SLEEP', other_status) class Manager(object): def __init__(self, old_stdout): import curses import curses.panel from curses.wrapper import wrapper self.old_stdout = old_stdout self.curses = curses self.helper_modules = CFG_BIBTASK_VALID_TASKS self.running = 1 self.footer_auto_mode = "Automatic Mode [A Manual] [1/2/3 Display] [P Purge] [l/L Log] [O Opts] [E Edit motd] [Q Quit]" self.footer_select_mode = "Manual Mode [A Automatic] [1/2/3 Display Type] [P Purge] [l/L Log] [O Opts] [E Edit motd] [Q Quit]" self.footer_waiting_item = "[R Run] [D Delete] [N Priority]" self.footer_running_item = "[S Sleep] [T Stop] [K Kill]" self.footer_stopped_item = "[I Initialise] [D Delete] [K Acknowledge]" self.footer_sleeping_item = "[W Wake Up] [T Stop] [K Kill]" self.item_status = "" self.rows = [] self.panel = None self.display = 2 self.first_visible_line = 0 self.auto_mode = 0
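# self.display selects the queue view loaded by update_rows() and is toggled
# with keys '1'/'2'/'3' in handle_keys():
#   1 - finished tasks (status DONE or ACK%) from schTASK, newest first
#   2 - pending/active tasks (not DONE/ACK) from schTASK, oldest first (default)
#   3 - archived tasks from the hstTASK table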
self.currentrow = None self.current_attr = 0 self.hostname = gethostname() self.allowed_task_types = CFG_BIBSCHED_NODE_TASKS.get(self.hostname, CFG_BIBTASK_VALID_TASKS) self.motd = "" self.header_lines = 2 self.read_motd() self.selected_line = self.header_lines wrapper(self.start) def read_motd(self): """Get a fresh motd from disk, if it exists.""" self.motd = "" self.header_lines = 2 try: if os.path.exists(CFG_MOTD_PATH): motd = open(CFG_MOTD_PATH).read().strip() if motd: self.motd = "MOTD [%s] " % time.strftime("%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(CFG_MOTD_PATH))) + motd self.header_lines = 3 except IOError: pass def handle_keys(self, char): if char == -1: return if self.auto_mode and (char not in (self.curses.KEY_UP, self.curses.KEY_DOWN, self.curses.KEY_PPAGE, self.curses.KEY_NPAGE, ord("g"), ord("G"), ord("n"), ord("q"), ord("Q"), ord("a"), ord("A"), ord("1"), ord("2"), ord("3"), ord("p"), ord("P"), ord("o"), ord("O"), ord("l"), ord("L"), ord("e"), ord("E"))): self.display_in_footer("in automatic mode") else: status = self.currentrow and self.currentrow[5] or None if char == self.curses.KEY_UP: self.selected_line = max(self.selected_line - 1, self.header_lines) self.repaint() if char == self.curses.KEY_PPAGE: self.selected_line = max(self.selected_line - 10, self.header_lines) self.repaint() elif char == self.curses.KEY_DOWN: self.selected_line = min(self.selected_line + 1, len(self.rows) + self.header_lines - 1) self.repaint() elif char == self.curses.KEY_NPAGE: self.selected_line = min(self.selected_line + 10, len(self.rows) + self.header_lines - 1) self.repaint() elif char == self.curses.KEY_HOME: self.first_visible_line = 0 self.selected_line = self.header_lines elif char == ord("g"): self.selected_line = self.header_lines self.repaint() elif char == ord("G"): self.selected_line = len(self.rows) + self.header_lines - 1 self.repaint() elif char in (ord("a"), ord("A")): self.change_auto_mode() elif char == ord("l"): self.openlog() elif char == ord("L"): self.openlog(err=True) elif char in (ord("w"), ord("W")): self.wakeup() elif char in (ord("n"), ord("N")): self.change_priority() elif char in (ord("r"), ord("R")): if status in ('WAITING', 'SCHEDULED'): self.run() elif char in (ord("s"), ord("S")): self.sleep() elif char in (ord("k"), ord("K")): if status in ('ERROR', 'DONE WITH ERRORS', 'ERRORS REPORTED'): self.acknowledge() elif status is not None: self.kill() elif char in (ord("t"), ord("T")): self.stop() elif char in (ord("d"), ord("D")): self.delete() elif char in (ord("i"), ord("I")): self.init() elif char in (ord("p"), ord("P")): self.purge_done() elif char in (ord("o"), ord("O")): self.display_task_options() elif char in (ord("e"), ord("E")): self.edit_motd() self.read_motd() elif char == ord("1"): self.display = 1 self.first_visible_line = 0 self.selected_line = self.header_lines # We need to update the display to display done tasks self.update_rows() self.repaint() self.display_in_footer("only done processes are displayed") elif char == ord("2"): self.display = 2 self.first_visible_line = 0 self.selected_line = self.header_lines # We need to update the display to display not done tasks self.update_rows() self.repaint() self.display_in_footer("only not done processes are displayed") elif char == ord("3"): self.display = 3 self.first_visible_line = 0 self.selected_line = self.header_lines # We need to update the display to display archived tasks self.update_rows() self.repaint() self.display_in_footer("only archived processes are displayed") elif char in 
(ord("q"), ord("Q")): if self.curses.panel.top_panel() == self.panel: self.panel = None self.curses.panel.update_panels() else: self.running = 0 return def openlog(self, err=False): task_id = self.currentrow[0] if err: logname = os.path.join(CFG_LOGDIR, 'bibsched_task_%d.err' % task_id) else: logname = os.path.join(CFG_LOGDIR, 'bibsched_task_%d.log' % task_id) if os.path.exists(logname): pager = get_pager() if os.path.exists(pager): self.curses.endwin() os.system('%s %s' % (pager, logname)) print >> self.old_stdout, "\rPress ENTER to continue", self.old_stdout.flush() raw_input() # We need to redraw the bibsched task list # since we are displaying "Press ENTER to continue" self.repaint() else: self._display_message_box("No pager was found") def edit_motd(self): """Add, delete or change the motd message that will be shown when the bibsched monitor starts.""" editor = get_editor() if editor: previous = self.motd self.curses.endwin() os.system("%s %s" % (editor, CFG_MOTD_PATH)) # We need to redraw the MOTD part self.read_motd() self.repaint() if previous[24:] != self.motd[24:]: if len(previous) == 0: Log('motd set to "%s"' % self.motd.replace("\n", "|")) self.selected_line += 1 self.header_lines += 1 elif len(self.motd) == 0: Log('motd deleted') self.selected_line -= 1 self.header_lines -= 1 else: Log('motd changed to "%s"' % self.motd.replace("\n", "|")) else: self._display_message_box("No editor was found") def display_task_options(self): """Nicely display information about current process.""" msg = ' id: %i\n\n' % self.currentrow[0] pid = get_task_pid(self.currentrow[1], self.currentrow[0], True) if pid is not None: msg += ' pid: %s\n\n' % pid msg += ' priority: %s\n\n' % self.currentrow[8] msg += ' proc: %s\n\n' % self.currentrow[1] msg += ' user: %s\n\n' % self.currentrow[2] msg += ' runtime: %s\n\n' % self.currentrow[3].strftime("%Y-%m-%d %H:%M:%S") msg += ' sleeptime: %s\n\n' % self.currentrow[4] msg += ' status: %s\n\n' % self.currentrow[5] msg += ' progress: %s\n\n' % self.currentrow[6] arguments = marshal.loads(self.currentrow[7]) if type(arguments) is dict: # FIXME: REMOVE AFTER MAJOR RELEASE 1.0 msg += ' options : %s\n\n' % arguments else: msg += 'executable : %s\n\n' % arguments[0] msg += ' arguments : %s\n\n' % ' '.join(arguments[1:]) msg += '\n\nPress q to quit this panel...' msg = wrap_text_in_a_box(msg, style='no_border') rows = msg.split('\n') height = len(rows) + 2 width = max([len(row) for row in rows]) + 4 try: self.win = self.curses.newwin( height, width, (self.height - height) / 2 + 1, (self.width - width) / 2 + 1 ) except self.curses.error: return self.panel = self.curses.panel.new_panel(self.win) self.panel.top() self.win.border() i = 1 for row in rows: self.win.addstr(i, 2, row, self.current_attr) i += 1 self.win.refresh() while self.win.getkey() != 'q': pass self.panel = None def count_processes(self, status): out = 0 res = run_sql("""SELECT COUNT(id) FROM schTASK WHERE status=%s GROUP BY status""", (status,)) try: out = res[0][0] except: pass return out def change_priority(self): task_id = self.currentrow[0] priority = self.currentrow[8] new_priority = self._display_ask_number_box("Insert the desired \ priority for task %s. The smaller the number the less the priority. Note that \ a number less than -10 will mean to always postpone the task while a number \ bigger than 10 will mean some tasks with less priority could be stopped in \ order to let this task run. The current priority is %s. 
New value:" % (task_id, priority)) try: new_priority = int(new_priority) except ValueError: return bibsched_set_priority(task_id, new_priority) # We need to update the tasks list with our new priority # to be able to display it self.update_rows() # We need to update the priority number next to the task self.repaint() def wakeup(self): task_id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] #if self.count_processes('RUNNING') + self.count_processes('CONTINUING') >= 1: #self.display_in_footer("a process is already running!") if status == "SLEEPING": if not bibsched_send_signal(process, task_id, signal.SIGCONT): bibsched_set_status(task_id, "ERROR", "SLEEPING") self.update_rows() self.repaint() self.display_in_footer("process woken up") else: self.display_in_footer("process is not sleeping") self.stdscr.refresh() def _display_YN_box(self, msg): """Utility to display confirmation boxes.""" msg += ' (Y/N)' msg = wrap_text_in_a_box(msg, style='no_border') rows = msg.split('\n') height = len(rows) + 2 width = max([len(row) for row in rows]) + 4 self.win = self.curses.newwin( height, width, (self.height - height) / 2 + 1, (self.width - width) / 2 + 1 ) self.panel = self.curses.panel.new_panel(self.win) self.panel.top() self.win.border() i = 1 for row in rows: self.win.addstr(i, 2, row, self.current_attr) i += 1 self.win.refresh() try: while 1: c = self.win.getch() if c in (ord('y'), ord('Y')): return True elif c in (ord('n'), ord('N')): return False finally: self.panel = None def _display_ask_number_box(self, msg): """Utility to display confirmation boxes.""" msg = wrap_text_in_a_box(msg, style='no_border') rows = msg.split('\n') height = len(rows) + 3 width = max([len(row) for row in rows]) + 4 self.win = self.curses.newwin( height, width, (self.height - height) / 2 + 1, (self.width - width) / 2 + 1 ) self.panel = self.curses.panel.new_panel(self.win) self.panel.top() self.win.border() i = 1 for row in rows: self.win.addstr(i, 2, row, self.current_attr) i += 1 self.win.refresh() self.win.move(height - 2, 2) self.curses.echo() ret = self.win.getstr() self.curses.noecho() self.panel = None return ret def _display_message_box(self, msg): """Utility to display message boxes.""" rows = msg.split('\n') height = len(rows) + 2 width = max([len(row) for row in rows]) + 3 self.win = self.curses.newwin( height, width, (self.height - height) / 2 + 1, (self.width - width) / 2 + 1 ) self.panel = self.curses.panel.new_panel(self.win) self.panel.top() self.win.border() i = 1 for row in rows: self.win.addstr(i, 2, row, self.current_attr) i += 1 self.win.refresh() self.win.move(height - 2, 2) self.win.getkey() self.curses.noecho() self.panel = None def purge_done(self): """Garbage collector.""" if self._display_YN_box( "You are going to purge the list of DONE tasks.\n\n" "%s tasks, submitted since %s days, will be archived.\n\n" "%s tasks, submitted since %s days, will be deleted.\n\n" "Are you sure?" 
% ( ', '.join(CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE), CFG_BIBSCHED_GC_TASKS_OLDER_THAN, ', '.join(CFG_BIBSCHED_GC_TASKS_TO_REMOVE), CFG_BIBSCHED_GC_TASKS_OLDER_THAN)): gc_tasks() # We removed some tasks from our list self.update_rows() self.repaint() self.display_in_footer("DONE processes purged") def run(self): task_id = self.currentrow[0] process = self.currentrow[1].split(':')[0] status = self.currentrow[5] if status == "WAITING": if process in self.helper_modules: if run_sql("""UPDATE schTASK SET status='SCHEDULED', host=%s WHERE id=%s and status='WAITING'""", (self.hostname, task_id)): program = os.path.join(CFG_BINDIR, process) command = "%s %s" % (program, str(task_id)) spawn_task(command) Log("manually running task #%d (%s)" % (task_id, process)) # We changed the status of one of our tasks self.update_rows() self.repaint() else: ## Process already running (typing too quickly on the keyboard?) pass else: self.display_in_footer("Process %s is not in the list of allowed processes." % process) else: self.display_in_footer("Process status should be SCHEDULED or WAITING!") def acknowledge(self): task_id = self.currentrow[0] status = self.currentrow[5] if status in ('ERROR', 'DONE WITH ERRORS', 'ERRORS REPORTED'): bibsched_set_status(task_id, 'ACK ' + status, status) self.update_rows() self.repaint() self.display_in_footer("Acknowledged error") def sleep(self): task_id = self.currentrow[0] status = self.currentrow[5] if status in ('RUNNING', 'CONTINUING'): bibsched_set_status(task_id, 'ABOUT TO SLEEP', status) self.update_rows() self.repaint() self.display_in_footer("SLEEP signal sent to task #%s" % task_id) else: self.display_in_footer("Cannot put to sleep non-running processes") def kill(self): task_id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] if status in ('RUNNING', 'CONTINUING', 'ABOUT TO STOP', 'ABOUT TO SLEEP', 'SLEEPING'): if self._display_YN_box("Are you sure you want to kill the %s process %s?" 
% (process, task_id)): bibsched_send_signal(process, task_id, signal.SIGKILL) bibsched_set_status(task_id, 'KILLED') self.update_rows() self.repaint() self.display_in_footer("KILL signal sent to task #%s" % task_id) else: self.display_in_footer("Cannot kill non-running processes") def stop(self): task_id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] if status in ('RUNNING', 'CONTINUING', 'ABOUT TO SLEEP', 'SLEEPING'): if status == 'SLEEPING': bibsched_set_status(task_id, 'NOW STOP', 'SLEEPING') bibsched_send_signal(process, task_id, signal.SIGCONT) count = 10 while bibsched_get_status(task_id) == 'NOW STOP': if count <= 0: bibsched_set_status(task_id, 'ERROR', 'NOW STOP') self.update_rows() self.repaint() self.display_in_footer("It seems impossible to wake up this task.") return time.sleep(CFG_BIBSCHED_REFRESHTIME) count -= 1 else: bibsched_set_status(task_id, 'ABOUT TO STOP', status) self.update_rows() self.repaint() self.display_in_footer("STOP signal sent to task #%s" % task_id) else: self.display_in_footer("Cannot stop non-running processes") def delete(self): task_id = self.currentrow[0] status = self.currentrow[5] if status not in ('RUNNING', 'CONTINUING', 'SLEEPING', 'SCHEDULED', 'ABOUT TO STOP', 'ABOUT TO SLEEP'): bibsched_set_status(task_id, "%s_DELETED" % status, status) self.display_in_footer("process deleted") self.update_rows() self.repaint() else: self.display_in_footer("Cannot delete running processes") def init(self): task_id = self.currentrow[0] status = self.currentrow[5] if status not in ('RUNNING', 'CONTINUING', 'SLEEPING'): bibsched_set_status(task_id, "WAITING") bibsched_set_progress(task_id, "") bibsched_set_host(task_id, "") self.update_rows() self.repaint() self.display_in_footer("process initialised") else: self.display_in_footer("Cannot initialise running processes") def change_auto_mode(self): program = os.path.join(CFG_BINDIR, "bibsched") if self.auto_mode: COMMAND = "%s -q halt" % program else: COMMAND = "%s -q start" % program os.system(COMMAND) self.auto_mode = not self.auto_mode # We need to refresh the color of the header and footer self.repaint() def put_line(self, row, header=False, motd=False): ## ROW: (id,proc,user,runtime,sleeptime,status,progress,arguments,priority,host) ## 0 1 2 3 4 5 6 7 8 9 - col_w = [7 , 25, 15, 21, 7, 11, 21, 60] + col_w = [8 , 25, 15, 21, 7, 12, 21, 60] maxx = self.width if self.y == self.selected_line - self.first_visible_line and self.y > 1: self.item_status = row[5] self.currentrow = row if motd: attr = self.curses.color_pair(1) + self.curses.A_BOLD elif self.y == self.header_lines - 2: if self.auto_mode: attr = self.curses.color_pair(2) + self.curses.A_STANDOUT + self.curses.A_BOLD else: attr = self.curses.color_pair(8) + self.curses.A_STANDOUT + self.curses.A_BOLD elif row[5] == "DONE": attr = self.curses.color_pair(5) + self.curses.A_BOLD elif row[5] == "STOPPED": attr = self.curses.color_pair(6) + self.curses.A_BOLD elif row[5].find("ERROR") > -1: attr = self.curses.color_pair(4) + self.curses.A_BOLD elif row[5] == "WAITING": attr = self.curses.color_pair(3) + self.curses.A_BOLD elif row[5] in ("RUNNING", "CONTINUING"): attr = self.curses.color_pair(2) + self.curses.A_BOLD elif not header and row[8]: attr = self.curses.A_BOLD else: attr = self.curses.A_NORMAL ## If the task is not relevant for this instance of BibSched because ## the type of the task cannot be run, or it is running on another ## machine: make it a different color if not header and (row[1].split(':')[0] not in
self.allowed_task_types or (row[9] != '' and row[9] != self.hostname)): attr = self.curses.color_pair(6) if not row[6]: nrow = list(row) nrow[6] = 'Not allowed on this instance' row = tuple(nrow) if self.y == self.selected_line - self.first_visible_line and self.y > 1: self.current_attr = attr attr += self.curses.A_REVERSE if header: # Dirty hack. put_line should be better refactored. # row contains one less element: arguments ## !!! FIXME: THIS IS CRAP myline = str(row[0]).ljust(col_w[0]-1) myline += str(row[1]).ljust(col_w[1]-1) myline += str(row[2]).ljust(col_w[2]-1) myline += str(row[3]).ljust(col_w[3]-1) myline += str(row[4]).ljust(col_w[4]-1) myline += str(row[5]).ljust(col_w[5]-1) myline += str(row[6]).ljust(col_w[6]-1) myline += str(row[7]).ljust(col_w[7]-1) elif motd: myline = str(row[0]) else: ## ROW: (id,proc,user,runtime,sleeptime,status,progress,arguments,priority,host) ## 0 1 2 3 4 5 6 7 8 9 priority = str(row[8] and ' [%s]' % row[8] or '') myline = str(row[0]).ljust(col_w[0])[:col_w[0]-1] myline += (str(row[1])[:col_w[1]-len(priority)-2] + priority).ljust(col_w[1]-1) myline += str(row[2]).ljust(col_w[2])[:col_w[2]-1] myline += str(row[3]).ljust(col_w[3])[:col_w[3]-1] myline += str(row[4]).ljust(col_w[4])[:col_w[4]-1] myline += str(row[5]).ljust(col_w[5])[:col_w[5]-1] myline += str(row[9]).ljust(col_w[6])[:col_w[6]-1] myline += str(row[6]).ljust(col_w[7])[:col_w[7]-1] myline = myline.ljust(maxx) try: self.stdscr.addnstr(self.y, 0, myline, maxx, attr) except self.curses.error: pass self.y += 1 def display_in_footer(self, footer, i=0, print_time_p=0): if print_time_p: footer = "%s %s" % (footer, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())) maxx = self.stdscr.getmaxyx()[1] footer = footer.ljust(maxx) if self.auto_mode: colorpair = 2 else: colorpair = 1 try: self.stdscr.addnstr(self.y - i, 0, footer, maxx - 1, self.curses.A_STANDOUT + self.curses.color_pair(colorpair) + self.curses.A_BOLD) except self.curses.error: pass def repaint(self): if server_pid(): self.auto_mode = 1 else: if self.auto_mode == 1: self.curses.beep() self.auto_mode = 0 self.y = 0 self.stdscr.erase() self.height, self.width = self.stdscr.getmaxyx() maxy = self.height - 2 #maxx = self.width if len(self.motd) > 0: self.put_line((self.motd.strip().replace("\n", " - ")[:79], "", "", "", "", "", "", "", ""), header=False, motd=True) self.put_line(("ID", "PROC [PRI]", "USER", "RUNTIME", "SLEEP", "STATUS", "HOST", "PROGRESS"), header=True) self.put_line(("", "", "", "", "", "", "", ""), header=True) if self.selected_line > maxy + self.first_visible_line - 1: self.first_visible_line = self.selected_line - maxy + 1 if self.selected_line < self.first_visible_line + 2: self.first_visible_line = self.selected_line - 2 for row in self.rows[self.first_visible_line:self.first_visible_line+maxy-2]: self.put_line(row) self.y = self.stdscr.getmaxyx()[0] - 1 if self.auto_mode: self.display_in_footer(self.footer_auto_mode, print_time_p=1) else: self.display_in_footer(self.footer_select_mode, print_time_p=1) footer2 = "" if self.item_status.find("DONE") > -1 or self.item_status in ("ERROR", "STOPPED", "KILLED", "ERRORS REPORTED"): footer2 += self.footer_stopped_item elif self.item_status in ("RUNNING", "CONTINUING", "ABOUT TO STOP", "ABOUT TO SLEEP"): footer2 += self.footer_running_item elif self.item_status == "SLEEPING": footer2 += self.footer_sleeping_item elif self.item_status == "WAITING": footer2 += self.footer_waiting_item self.display_in_footer(footer2, 1) self.stdscr.refresh() def update_rows(self): if 
self.display == 1: table = "schTASK" where = "and (status='DONE' or status LIKE 'ACK%')" order = "runtime DESC" limit = "" elif self.display == 2: table = "schTASK" where = "and (status<>'DONE' and status NOT LIKE 'ACK%')" order = "runtime ASC" limit = "limit %s" % CFG_BIBSCHED_MAX_ARCHIVED_ROWS_DISPLAY else: table = "hstTASK" order = "runtime DESC" where = "" limit = "" self.rows = run_sql("""SELECT id, proc, user, runtime, sleeptime, status, progress, arguments, priority, host, sequenceid FROM %s WHERE status NOT LIKE '%%_DELETED' %s ORDER BY %s %s""" % (table, where, order, limit)) # Make sure we are not selecting a line that disappeared self.selected_line = min(self.selected_line, len(self.rows) + self.header_lines - 1) def start(self, stdscr): os.environ['BIBSCHED_MODE'] = 'manual' if self.curses.has_colors(): self.curses.start_color() self.curses.init_pair(8, self.curses.COLOR_WHITE, self.curses.COLOR_BLACK) self.curses.init_pair(1, self.curses.COLOR_WHITE, self.curses.COLOR_RED) self.curses.init_pair(2, self.curses.COLOR_GREEN, self.curses.COLOR_BLACK) self.curses.init_pair(3, self.curses.COLOR_MAGENTA, self.curses.COLOR_BLACK) self.curses.init_pair(4, self.curses.COLOR_RED, self.curses.COLOR_BLACK) self.curses.init_pair(5, self.curses.COLOR_BLUE, self.curses.COLOR_BLACK) self.curses.init_pair(6, self.curses.COLOR_CYAN, self.curses.COLOR_BLACK) self.curses.init_pair(7, self.curses.COLOR_YELLOW, self.curses.COLOR_BLACK) self.stdscr = stdscr self.base_panel = self.curses.panel.new_panel(self.stdscr) self.base_panel.bottom() self.curses.panel.update_panels() self.height, self.width = stdscr.getmaxyx() self.stdscr.erase() if server_pid(): self.auto_mode = 1 ring = 4 if len(self.motd) > 0: self._display_message_box(self.motd + "\nPress any key to close") while self.running: if ring == 4: self.read_motd() self.update_rows() ring = 0 self.repaint() ring += 1 char = -1 try: char = timed_out(self.stdscr.getch, 1) if char == 27: # escaping sequence char = self.stdscr.getch() if char == 79: # arrow char = self.stdscr.getch() if char == 65: # arrow up char = self.curses.KEY_UP elif char == 66: # arrow down char = self.curses.KEY_DOWN elif char == 72: char = self.curses.KEY_PPAGE elif char == 70: char = self.curses.KEY_NPAGE elif char == 91: char = self.stdscr.getch() if char == 53: char = self.stdscr.getch() if char == 126: char = self.curses.KEY_HOME except TimedOutExc: char = -1 self.handle_keys(char) class BibSched(object): def __init__(self, debug=False): self.debug = debug self.hostname = gethostname() self.helper_modules = CFG_BIBTASK_VALID_TASKS ## All the tasks in the queue that the node is allowed to manipulate self.node_relevant_bibupload_tasks = () self.node_relevant_waiting_tasks = () self.node_relevant_active_tasks = () ## All tasks of all nodes self.active_tasks_all_nodes = () self.mono_tasks_all_nodes = () self.allowed_task_types = CFG_BIBSCHED_NODE_TASKS.get(self.hostname, CFG_BIBTASK_VALID_TASKS) os.environ['BIBSCHED_MODE'] = 'automatic' def tie_task_to_host(self, task_id): """Sets the hostname of a task to the machine executing this script @return: True if the scheduling was successful, False otherwise, e.g. if the task was scheduled concurrently on a different host. """ if not run_sql("""SELECT id FROM schTASK WHERE id=%s AND host='' AND status='WAITING'""", (task_id, )): ## The task was already tied? 
return False run_sql("""UPDATE schTASK SET host=%s, status='SCHEDULED' WHERE id=%s AND host='' AND status='WAITING'""", (self.hostname, task_id)) return bool(run_sql("SELECT id FROM schTASK WHERE id=%s AND host=%s", (task_id, self.hostname))) def filter_for_allowed_tasks(self): """ Removes all tasks that are not allowed in this Invenio instance """ def relevant_task(task_id, proc, runtime, status, priority, host, sequenceid): # pylint: disable=W0613 # if host and self.hostname != host: # return False procname = proc.split(':')[0] if procname not in self.allowed_task_types: return False return True def filter_tasks(tasks): return tuple(t for t in tasks if relevant_task(*t)) self.node_relevant_bibupload_tasks = filter_tasks(self.node_relevant_bibupload_tasks) self.node_relevant_active_tasks = filter_tasks(self.node_relevant_active_tasks) self.node_relevant_waiting_tasks = filter_tasks(self.node_relevant_waiting_tasks) self.node_relevant_sleeping_tasks = filter_tasks(self.node_relevant_sleeping_tasks) def is_task_safe_to_execute(self, proc1, proc2): """Return True when the two tasks can run concurrently.""" return proc1 != proc2 # and not proc1.startswith('bibupload') and not proc2.startswith('bibupload') def get_tasks_to_sleep_and_stop(self, proc, task_set): """Among the task_set, return the list of tasks to stop and the list of tasks to sleep. """ if proc in CFG_BIBTASK_MONOTASKS: return [], [t for t in task_set if t[3] not in ('SLEEPING', 'ABOUT TO SLEEP')] min_prio = None min_task_id = None min_proc = None min_status = None min_sequenceid = None to_stop = [] ## For all the lower priority tasks... for (this_task_id, this_proc, this_priority, this_status, this_sequenceid) in task_set: if not self.is_task_safe_to_execute(this_proc, proc): to_stop.append((this_task_id, this_proc, this_priority, this_status, this_sequenceid)) elif (min_prio is None or this_priority < min_prio) and \ this_status not in ('SLEEPING', 'ABOUT TO SLEEP'): ## We don't put to sleep already sleeping task :-) min_prio = this_priority min_task_id = this_task_id min_proc = this_proc min_status = this_status min_sequenceid = this_sequenceid if to_stop: return to_stop, [] elif min_task_id: return [], [(min_task_id, min_proc, min_prio, min_status, min_sequenceid)] else: return [], [] def split_active_tasks_by_priority(self, task_id, priority): """Return two lists: the list of task_ids with lower priority and those with higher or equal priority.""" higher = [] lower = [] ### !!! We already have this in node_relevant_active_tasks for other_task_id, task_proc, dummy, status, task_priority, task_host, sequenceid in self.node_relevant_active_tasks: # for other_task_id, task_proc, runtime, status, task_priority, task_host in self.node_relevant_active_tasks: # for other_task_id, task_proc, task_priority, status in self.get_running_tasks(): if task_id == other_task_id: continue if task_priority < priority and task_host == self.hostname: lower.append((other_task_id, task_proc, task_priority, status, sequenceid)) elif task_host == self.hostname: higher.append((other_task_id, task_proc, task_priority, status, sequenceid)) return lower, higher def handle_task(self, task_id, proc, runtime, status, priority, host, sequenceid): """Perform needed action of the row representing a task. 
Return True when task_status need to be refreshed""" debug = self.debug if debug: Log("task_id: %s, proc: %s, runtime: %s, status: %s, priority: %s, host: %s, sequenceid: %s" % (task_id, proc, runtime, status, priority, host, sequenceid)) if (task_id, proc, runtime, status, priority, host, sequenceid) in self.node_relevant_active_tasks: # For multi-node # check if we need to sleep ourselves for monotasks to be able to run for other_task_id, other_proc, dummy_other_runtime, other_status, other_priority, other_host, other_sequenceid in self.mono_tasks_all_nodes: if priority < other_priority: # Sleep ourselves if status not in ('SLEEPING', 'ABOUT TO SLEEP'): sleep_task(task_id, proc, priority, status, sequenceid) return True return False elif (task_id, proc, runtime, status, priority, host, sequenceid) in self.node_relevant_waiting_tasks: if debug: Log("Trying to run %s" % task_id) if priority < -10: if debug: Log("Cannot run because priority < -10") return False lower, higher = self.split_active_tasks_by_priority(task_id, priority) if debug: Log('lower: %s' % lower) Log('higher: %s' % higher) for other_task_id, other_proc, dummy_other_runtime, other_status, \ other_priority, other_host, other_sequenceid in chain( self.node_relevant_sleeping_tasks, self.active_tasks_all_nodes): if task_id != other_task_id and \ not self.is_task_safe_to_execute(proc, other_proc): ### !!! WE NEED TO CHECK FOR TASKS THAT CAN ONLY BE EXECUTED ON ONE MACHINE AT ONE TIME ### !!! FOR EXAMPLE BIBUPLOADS WHICH NEED TO BE EXECUTED SEQUENTIALLY AND NEVER CONCURRENTLY ## There's at least a higher priority task running that ## cannot run at the same time of the given task. ## We give up if debug: Log("Cannot run because task_id: %s, proc: %s is in the queue and incompatible" % (other_task_id, other_proc)) return False if sequenceid: ## Let's normalize the prority of all tasks in a sequenceid to the ## max priority of the group max_priority = run_sql("""SELECT MAX(priority) FROM schTASK WHERE status='WAITING' AND sequenceid=%s""", (sequenceid, ))[0][0] if run_sql("""UPDATE schTASK SET priority=%s WHERE status='WAITING' AND sequenceid=%s""", (max_priority, sequenceid)): Log("Raised all waiting tasks with sequenceid " "%s to the max priority %s" % (sequenceid, max_priority)) ## Some priorities where raised return True ## Let's normalize the runtime of all tasks in a sequenceid to ## the compatible runtime. 
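                ## Illustrative example: if three WAITING tasks of the same
                ## sequenceid have runtimes 10:00, 09:50 and 10:05 (in id
                ## order), the second is bumped to 10:00 so that execution
                ## follows id order, while the third keeps its later runtime.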
current_runtimes = run_sql("""SELECT id, runtime FROM schTASK WHERE sequenceid=%s AND status='WAITING' ORDER by id""", (sequenceid, )) runtimes_adjusted = False if current_runtimes: last_runtime = current_runtimes[0][1] for the_task_id, runtime in current_runtimes: if runtime < last_runtime: run_sql("""UPDATE schTASK SET runtime=%s WHERE id=%s""", (last_runtime, the_task_id)) if debug: Log("Adjusted runtime of task_id %s to %s in order to be executed in the correct sequenceid order" % (the_task_id, last_runtime)) runtimes_adjusted = True runtime = last_runtime last_runtime = runtime if runtimes_adjusted: ## Some runtime have been adjusted return True if sequenceid is not None: for other_task_id, dummy_other_proc, dummy_other_runtime, dummy_other_status, dummy_other_priority, dummy_other_host, other_sequenceid in self.active_tasks_all_nodes: if sequenceid == other_sequenceid and task_id > other_task_id: Log('Task %s need to run after task %s since they have the same sequence id: %s' % (task_id, other_task_id, sequenceid)) ## If there is a task with same sequence number then do not run the current task return False if proc in CFG_BIBTASK_MONOTASKS and higher: ## This is a monotask if debug: Log("Cannot run because this is a monotask and there are higher priority tasks: %s" % (higher, )) return False ## No higher priority task have issue with the given task. if proc not in CFG_BIBTASK_FIXEDTIMETASKS and len(higher) >= CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS: if debug: Log("Cannot run because all resources (%s) are used (%s), higher: %s" % (CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS, len(higher), higher)) return False ## Check for monotasks wanting to run for other_task_id, other_proc, dummy_other_runtime, other_status, other_priority, other_host, other_sequenceid in self.mono_tasks_all_nodes: if priority < other_priority: if debug: Log("Cannot run because there is a monotask with higher priority: %s %s" % (other_task_id, other_proc)) return False ## We check if it is necessary to stop/put to sleep some lower priority ## task. tasks_to_stop, tasks_to_sleep = self.get_tasks_to_sleep_and_stop(proc, lower) if debug: Log('tasks_to_stop: %s' % tasks_to_stop) Log('tasks_to_sleep: %s' % tasks_to_sleep) if tasks_to_stop and priority < 100: ## Only tasks with priority higher than 100 have the power ## to put task to stop. 
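                ## In other words, a waiting task with priority below 100
                ## simply keeps waiting; only a task with priority >= 100 may
                ## preempt incompatible lower-priority tasks that are running.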
if debug: Log("Cannot run because there are task to stop: %s and priority < 100" % tasks_to_stop) return False procname = proc.split(':')[0] if not tasks_to_stop and (not tasks_to_sleep or (proc not in CFG_BIBTASK_MONOTASKS and len(self.node_relevant_active_tasks) < CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS)): if proc in CFG_BIBTASK_MONOTASKS and self.active_tasks_all_nodes: if debug: Log("Cannot run because this is a monotask and there are other tasks running: %s" % (self.node_relevant_active_tasks, )) return False def task_in_same_host(dummy_task_id, dummy_proc, dummy_runtime, dummy_status, dummy_priority, host, dummy_sequenceid): return host == self.hostname def filter_by_host(tasks): return tuple(t for t in tasks if task_in_same_host(*t)) node_active_tasks = filter_by_host(self.node_relevant_active_tasks) if len(node_active_tasks) >= CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS: if debug: Log("Cannot run because all resources (%s) are used (%s), active: %s" % (CFG_BIBSCHED_MAX_NUMBER_CONCURRENT_TASKS, len(node_active_tasks), node_active_tasks)) return False if status in ("SLEEPING", "ABOUT TO SLEEP"): if host == self.hostname: ## We can only wake up tasks that are running on our own host for other_task_id, other_proc, dummy_other_runtime, other_status, dummy_other_priority, other_host, dummy_other_sequenceid in self.node_relevant_active_tasks: ## But only if there are not other tasks still going to sleep, otherwise ## we might end up stealing the slot for an higher priority task. if other_task_id != task_id and other_status in ('ABOUT TO SLEEP', 'ABOUT TO STOP') and other_host == self.hostname: if debug: Log("Not yet waking up task #%d since there are other tasks (%s #%d) going to sleep (higher priority task incoming?)" % (task_id, other_proc, other_task_id)) return False bibsched_set_status(task_id, "CONTINUING", status) if not bibsched_send_signal(proc, task_id, signal.SIGCONT): bibsched_set_status(task_id, "ERROR", "CONTINUING") Log("Task #%d (%s) woken up but didn't existed anymore" % (task_id, proc)) return True Log("Task #%d (%s) woken up" % (task_id, proc)) return True else: return False elif procname in self.helper_modules: program = os.path.join(CFG_BINDIR, procname) ## Trick to log in bibsched.log the task exiting exit_str = '&& echo "`date "+%%Y-%%m-%%d %%H:%%M:%%S"` --> Task #%d (%s) exited" >> %s' % (task_id, proc, os.path.join(CFG_LOGDIR, 'bibsched.log')) command = "%s %s %s" % (program, str(task_id), exit_str) ### Set the task to scheduled and tie it to this host if self.tie_task_to_host(task_id): Log("Task #%d (%s) started" % (task_id, proc)) ### Relief the lock for the BibTask, it is safe now to do so spawn_task(command, wait=proc in CFG_BIBTASK_MONOTASKS) count = 10 while run_sql("""SELECT status FROM schTASK WHERE id=%s AND status='SCHEDULED'""", (task_id, )): ## Polling to wait for the task to really start, ## in order to avoid race conditions. if count <= 0: raise StandardError("Process %s (task_id: %s) was launched but seems not to be able to reach RUNNING status." % (proc, task_id)) time.sleep(CFG_BIBSCHED_REFRESHTIME) count -= 1 return True else: raise StandardError("%s is not in the allowed modules" % procname) else: ## It's not still safe to run the task. 
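                ## After the stop/sleep signals below are sent we return True,
                ## so the watch loop recomputes the queue on its next cycle and
                ## reconsiders this task once the slots have been freed.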
## We first need to stop tasks that should be stopped ## and to put to sleep tasks that should be put to sleep for t in tasks_to_stop: stop_task(*t) for t in tasks_to_sleep: sleep_task(*t) time.sleep(CFG_BIBSCHED_REFRESHTIME) return True def check_errors(self): errors = run_sql("""SELECT id,proc,status FROM schTASK WHERE status = 'ERROR' OR status = 'DONE WITH ERRORS' OR status = 'CERROR'""") if errors: error_msgs = [] error_recoverable = True for e_id, e_proc, e_status in errors: if run_sql("""UPDATE schTASK SET status='ERRORS REPORTED' WHERE id = %s AND (status='CERROR' OR status='ERROR' OR status='DONE WITH ERRORS')""", [e_id]): msg = " #%s %s -> %s" % (e_id, e_proc, e_status) error_msgs.append(msg) if e_status in ('ERROR', 'DONE WITH ERRORS'): error_recoverable = False if error_msgs: msg = "BibTask with ERRORS:\n%s" % '\n'.join(error_msgs) if error_recoverable: raise RecoverableError(msg) else: raise StandardError(msg) def calculate_rows(self): """Return all the node_relevant_active_tasks to work on.""" try: self.check_errors() except RecoverableError, msg: register_emergency('Light emergency from %s: BibTask failed: %s' % (CFG_SITE_URL, msg)) max_bibupload_priority, min_bibupload_priority = run_sql( """SELECT MAX(priority), MIN(priority) FROM schTASK WHERE status IN ('WAITING', 'RUNNING', 'SLEEPING', 'ABOUT TO STOP', 'ABOUT TO SLEEP', 'SCHEDULED', 'CONTINUING') AND proc = 'bibupload' AND runtime <= NOW()""")[0] if max_bibupload_priority > min_bibupload_priority: run_sql( """UPDATE schTASK SET priority = %s WHERE status IN ('WAITING', 'RUNNING', 'SLEEPING', 'ABOUT TO STOP', 'ABOUT TO SLEEP', 'SCHEDULED', 'CONTINUING') AND proc = 'bibupload' AND runtime <= NOW() AND priority < %s""", (max_bibupload_priority, max_bibupload_priority)) ## The bibupload tasks are sorted by id, which means by the order they were scheduled self.node_relevant_bibupload_tasks = run_sql( """SELECT id, proc, runtime, status, priority, host, sequenceid FROM schTASK WHERE status IN ('WAITING', 'SLEEPING') AND proc = 'bibupload' AND runtime <= NOW() ORDER BY id ASC LIMIT 1""", n=1) ## The other tasks are sorted by priority self.node_relevant_waiting_tasks = run_sql( """SELECT id, proc, runtime, status, priority, host, sequenceid FROM schTASK WHERE (status='WAITING' AND runtime <= NOW()) OR status = 'SLEEPING' ORDER BY priority DESC, runtime ASC, id ASC""") self.node_relevant_sleeping_tasks = run_sql( """SELECT id, proc, runtime, status, priority, host, sequenceid FROM schTASK WHERE status = 'SLEEPING' ORDER BY priority DESC, runtime ASC, id ASC""") self.node_relevant_active_tasks = run_sql( """SELECT id, proc, runtime, status, priority, host, sequenceid FROM schTASK WHERE status IN ('RUNNING', 'CONTINUING', 'SCHEDULED', 'ABOUT TO STOP', 'ABOUT TO SLEEP')""") self.active_tasks_all_nodes = tuple(self.node_relevant_active_tasks) self.mono_tasks_all_nodes = tuple(t for t in self.node_relevant_waiting_tasks if is_monotask(*t)) ## Remove tasks that can not be executed on this host self.filter_for_allowed_tasks() def watch_loop(self): ## Cleaning up scheduled task not run because of bibsched being ## interrupted in the middle. run_sql("""UPDATE schTASK SET status = 'WAITING' WHERE status = 'SCHEDULED' AND host = %s""", (self.hostname, )) try: while True: if self.debug: Log("New bibsched cycle") self.calculate_rows() ## Let's first handle running node_relevant_active_tasks. for task in self.node_relevant_active_tasks: if self.handle_task(*task): break else: # If nothing has changed we can go on to run tasks. 
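                    ## This is the `else' clause of the for-loop above: it is
                    ## reached only when no active task changed state, so it is
                    ## now safe to consider the waiting tasks.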
for task in self.node_relevant_waiting_tasks: if task[1] == 'bibupload' and self.node_relevant_bibupload_tasks: ## We switch in bibupload serial mode! ## which means we execute the first next bibupload. if self.handle_task(*self.node_relevant_bibupload_tasks[0]): ## Something has changed break elif self.handle_task(*task): ## Something has changed break else: time.sleep(CFG_BIBSCHED_REFRESHTIME) except Exception, err: register_exception(alert_admin=True) try: register_emergency('Emergency from %s: BibSched halted: %s' % (CFG_SITE_URL, err)) except NotImplementedError: pass raise class TimedOutExc(Exception): def __init__(self, value="Timed Out"): Exception.__init__(self) self.value = value def __str__(self): return repr(self.value) def timed_out(f, timeout, *args, **kwargs): def handler(signum, frame): # pylint: disable=W0613 raise TimedOutExc() old = signal.signal(signal.SIGALRM, handler) signal.alarm(timeout) try: result = f(*args, **kwargs) finally: signal.signal(signal.SIGALRM, old) signal.alarm(0) return result def Log(message): log = open(CFG_LOGDIR + "/bibsched.log", "a") log.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) log.write(message) log.write("\n") log.close() def redirect_stdout_and_stderr(): "This function redirects stdout and stderr to bibsched.log and bibsched.err file." old_stdout = sys.stdout old_stderr = sys.stderr sys.stdout = open(CFG_LOGDIR + "/bibsched.log", "a") sys.stderr = open(CFG_LOGDIR + "/bibsched.err", "a") return old_stdout, old_stderr def restore_stdout_and_stderr(stdout, stderr): sys.stdout = stdout sys.stderr = stderr def usage(exitcode=1, msg=""): """Prints usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("""\ Usage: %s [options] [start|stop|restart|monitor|status] The following commands are available for bibsched: start start bibsched in background stop stop running bibtasks and the bibsched daemon safely halt halt running bibsched while keeping bibtasks running restart restart running bibsched monitor enter the interactive monitor status get report about current status of the queue purge purge the scheduler queue from old tasks General options: -h, --help \t Print this help. -V, --version \t Print version information. 
-q, --quiet \t Quiet mode -d, --debug \t Write debugging information in bibsched.log Status options: -s, --status=LIST\t Which BibTask status should be considered (default is Running,waiting) -S, --since=TIME\t Since how long time to consider tasks e.g.: 30m, 2h, 1d (default is all) -t, --tasks=LIST\t Comma separated list of BibTask to consider (default \t is all) Purge options: -s, --status=LIST\t Which BibTask status should be considered (default is DONE) -S, --since=TIME\t Since how long time to consider tasks e.g.: 30m, 2h, 1d (default is %s days) -t, --tasks=LIST\t Comma separated list of BibTask to consider (default \t is %s) """ % (sys.argv[0], CFG_BIBSCHED_GC_TASKS_OLDER_THAN, ','.join(CFG_BIBSCHED_GC_TASKS_TO_REMOVE + CFG_BIBSCHED_GC_TASKS_TO_ARCHIVE))) sys.exit(exitcode) pidfile = os.path.join(CFG_PREFIX, 'var', 'run', 'bibsched.pid') def error(msg): print >> sys.stderr, "error: %s" % msg sys.exit(1) def warning(msg): print >> sys.stderr, "warning: %s" % msg def server_pid(ping_the_process=True, check_is_really_bibsched=True): # The pid must be stored on the filesystem try: pid = int(open(pidfile).read()) except IOError: return None if ping_the_process: # Even if the pid is available, we check if it corresponds to an # actual process, as it might have been killed externally try: os.kill(pid, signal.SIGCONT) except OSError: warning("pidfile %s found referring to pid %s which is not running" % (pidfile, pid)) return None if check_is_really_bibsched: output = run_shell_command("ps p %s -o args=", (str(pid), ))[1] if not 'bibsched' in output: warning("pidfile %s found referring to pid %s which does not correspond to bibsched: cmdline is %s" % (pidfile, pid, output)) return None return pid def start(verbose=True, debug=False): """ Fork this process in the background and start processing requests. The process PID is stored in a pid file, so that it can be stopped later on.""" if verbose: sys.stdout.write("starting bibsched: ") sys.stdout.flush() pid = server_pid(ping_the_process=False) if pid: pid2 = server_pid() if pid2: error("another instance of bibsched (pid %d) is running" % pid2) else: warning("%s exist but the corresponding bibsched (pid %s) seems not be running" % (pidfile, pid)) warning("erasing %s and continuing..." % (pidfile, )) os.remove(pidfile) # start the child process using the "double fork" technique pid = os.fork() if pid > 0: sys.exit(0) os.setsid() os.chdir('/') pid = os.fork() if pid > 0: if verbose: sys.stdout.write('pid %d\n' % pid) Log("daemon started (pid %d)" % pid) open(pidfile, 'w').write('%d' % pid) return sys.stdin.close() redirect_stdout_and_stderr() sched = BibSched(debug=debug) try: sched.watch_loop() finally: try: os.remove(pidfile) except OSError: pass def halt(verbose=True, soft=False, debug=False): # pylint: disable=W0613 pid = server_pid() if not pid: if soft: print >> sys.stderr, 'bibsched seems not to be running.' return else: error('bibsched seems not to be running.') try: os.kill(pid, signal.SIGKILL) except OSError: print >> sys.stderr, 'no bibsched process found' Log("daemon stopped (pid %d)" % pid) if verbose: print "stopping bibsched: pid %d" % pid os.unlink(pidfile) def monitor(verbose=True, debug=False): # pylint: disable=W0613 old_stdout, old_stderr = redirect_stdout_and_stderr() try: Manager(old_stdout) finally: restore_stdout_and_stderr(old_stdout, old_stderr) def write_message(msg, stream=None, verbose=1): # pylint: disable=W0613 """Write message and flush output stream (may be sys.stdout or sys.stderr). 
Useful for debugging stuff.""" if stream is None: stream = sys.stdout if msg: if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) try: stream.write("%s\n" % msg) except UnicodeEncodeError: stream.write("%s\n" % msg.encode('ascii', 'backslashreplace')) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) def report_queue_status(verbose=True, status=None, since=None, tasks=None): # pylint: disable=W0613 """ Report about the current status of BibSched queue on standard output. """ def report_about_processes(status='RUNNING', since=None, tasks=None): """ Helper function to report about processes with the given status. """ if tasks is None: task_query = '' else: task_query = 'AND proc IN (%s)' % ( ','.join([repr(real_escape_string(task)) for task in tasks])) if since is None: since_query = '' else: # We're not interested in future task if since.startswith('+') or since.startswith('-'): since = since[1:] since = '-' + since since_query = "AND runtime >= '%s'" % get_datetime(since) res = run_sql("""SELECT id, proc, user, runtime, sleeptime, status, progress, priority FROM schTASK WHERE status=%%s %(task_query)s %(since_query)s ORDER BY id ASC""" % { 'task_query': task_query, 'since_query' : since_query}, (status,)) write_message("%s processes: %d" % (status, len(res))) for (proc_id, proc_proc, proc_user, proc_runtime, proc_sleeptime, proc_status, proc_progress, proc_priority) in res: write_message(' * ID="%s" PRIORITY="%s" PROC="%s" USER="%s" ' 'RUNTIME="%s" SLEEPTIME="%s" STATUS="%s" ' 'PROGRESS="%s"' % (proc_id, proc_priority, proc_proc, proc_user, proc_runtime, proc_sleeptime, proc_status, proc_progress)) return write_message("BibSched queue status report for %s:" % gethostname()) mode = server_pid() and "AUTOMATIC" or "MANUAL" write_message("BibSched queue running mode: %s" % mode) if status is None: report_about_processes('Running', since, tasks) report_about_processes('Waiting', since, tasks) else: for state in status: report_about_processes(state, since, tasks) write_message("Done.") def restart(verbose=True, debug=False): halt(verbose, soft=True, debug=debug) start(verbose, debug=debug) def stop(verbose=True, debug=False): """ * Stop bibsched * Send stop signal to all the running tasks * wait for all the tasks to stop * return """ if verbose: print "Stopping BibSched if running" halt(verbose, soft=True, debug=debug) run_sql("UPDATE schTASK SET status='WAITING' WHERE status='SCHEDULED'") res = run_sql("""SELECT id, proc, status FROM schTASK WHERE status NOT LIKE 'DONE' AND status NOT LIKE '%_DELETED' AND (status='RUNNING' OR status='ABOUT TO STOP' OR status='ABOUT TO SLEEP' OR status='SLEEPING' OR status='CONTINUING')""") if verbose: print "Stopping all running BibTasks" for task_id, proc, status in res: if status == 'SLEEPING': bibsched_send_signal(proc, task_id, signal.SIGCONT) time.sleep(CFG_BIBSCHED_REFRESHTIME) bibsched_set_status(task_id, 'ABOUT TO STOP') while run_sql("""SELECT id FROM schTASK WHERE status NOT LIKE 'DONE' AND status NOT LIKE '%_DELETED' AND (status='RUNNING' OR status='ABOUT TO STOP' OR status='ABOUT TO SLEEP' OR status='SLEEPING' OR status='CONTINUING')"""): if verbose: sys.stdout.write('.') sys.stdout.flush() time.sleep(CFG_BIBSCHED_REFRESHTIME) if verbose: print "\nStopped" Log("BibSched and all BibTasks stopped") def main(): from invenio.bibtask import check_running_process_user check_running_process_user() verbose = True status = None since 
= None tasks = None debug = False try: opts, args = getopt.gnu_getopt(sys.argv[1:], "hVdqS:s:t:", [ "help", "version", "debug", "quiet", "since=", "status=", "task="]) except getopt.GetoptError, err: Log("Error: %s" % err) usage(1, err) for opt, arg in opts: if opt in ["-h", "--help"]: usage(0) elif opt in ["-V", "--version"]: print __revision__ sys.exit(0) elif opt in ['-q', '--quiet']: verbose = False elif opt in ['-s', '--status']: status = arg.split(',') elif opt in ['-S', '--since']: since = arg elif opt in ['-t', '--task']: tasks = arg.split(',') elif opt in ['-d', '--debug']: debug = True else: usage(1) try: cmd = args[0] except IndexError: cmd = 'monitor' try: if cmd in ('status', 'purge'): {'status' : report_queue_status, 'purge' : gc_tasks}[cmd](verbose, status, since, tasks) else: {'start': start, 'halt': halt, 'stop': stop, 'restart': restart, 'monitor': monitor}[cmd](verbose=verbose, debug=debug) except KeyError: usage(1, 'unkown command: %s' % cmd) if __name__ == '__main__': main() diff --git a/modules/miscutil/lib/web_api_key_unit_tests.py b/modules/miscutil/lib/web_api_key_regression_tests.py similarity index 97% copy from modules/miscutil/lib/web_api_key_unit_tests.py copy to modules/miscutil/lib/web_api_key_regression_tests.py index 5b1944ddb..72fa2b097 100644 --- a/modules/miscutil/lib/web_api_key_unit_tests.py +++ b/modules/miscutil/lib/web_api_key_regression_tests.py @@ -1,120 +1,120 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2006, 2007, 2008, 2010, 2011 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
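# Scheme exercised by the helper below: the client appends an 'apikey'
# parameter (plus a 'timestamp' when a secret key is used), sorts the
# parameters case-insensitively, and signs "path?query" with HMAC-SHA1,
# sending the hexadecimal digest as the 'signature' parameter.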
from invenio import web_api_key """Unit tests for REST like authentication API.""" try: import hashlib except: pass import unittest import re import hmac import urllib import time import string from invenio.testutils import make_test_suite, run_test_suite from invenio.dbquery import run_sql web_api_key.CFG_WEB_API_KEY_ALLOWED_URL = [('/search\?*', 0, True), ('/bad\?*', -1, True)] #Just for testing web_api_key._CFG_WEB_API_KEY_ALLOWED_URL = [(re.compile(_url), _authorized_time, _need_timestamp) for _url, _authorized_time, _need_timestamp in web_api_key.CFG_WEB_API_KEY_ALLOWED_URL] def build_web_request(path, params, api_key=None, secret_key=None): items = (hasattr(params, 'items') and [params.items()] or [list(params)])[0] if api_key: items.append(('apikey', api_key)) if secret_key: items.append(('timestamp', str(int(time.time())))) items = sorted(items, key=lambda x: x[0].lower()) url = '%s?%s' % (path, urllib.urlencode(items)) signature = hmac.new(secret_key, url, hashlib.sha1).hexdigest() items.append(('signature', signature)) if not items: return path return '%s?%s' % (path, urllib.urlencode(items)) class APIKeyTest(unittest.TestCase): """ Test functions related to the REST authentication API """ def setUp(self): self.id_admin = run_sql('SELECT id FROM user WHERE nickname="admin"')[0][0] def test_create_remove_show_key(self): """apikey - create/list/delete REST key""" self.assertEqual(0, len(web_api_key.show_web_api_keys(uid=self.id_admin))) web_api_key.create_new_web_api_key(self.id_admin, "Test key I") web_api_key.create_new_web_api_key(self.id_admin, "Test key II") web_api_key.create_new_web_api_key(self.id_admin, "Test key III") web_api_key.create_new_web_api_key(self.id_admin, "Test key IV") web_api_key.create_new_web_api_key(self.id_admin, "Test key V") self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin))) self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin, diff_status=''))) keys_info = web_api_key.show_web_api_keys(uid=self.id_admin) web_api_key.mark_web_api_key_as_removed(keys_info[0][0]) self.assertEqual(4, len(web_api_key.show_web_api_keys(uid=self.id_admin))) - self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin,diff_status=''))) + self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin, diff_status=''))) run_sql("UPDATE webapikey SET status='WARNING' WHERE id=%s", (keys_info[1][0],)) run_sql("UPDATE webapikey SET status='REVOKED' WHERE id=%s", (keys_info[2][0],)) self.assertEqual(4, len(web_api_key.show_web_api_keys(uid=self.id_admin))) self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin, diff_status=''))) run_sql("DELETE FROM webapikey") def test_acc_get_uid_from_request(self): """webapikey - Login user from request using REST key""" path = '/search' params = 'ln=es&sc=1&c=Articles & Preprints&action_search=Buscar&p=ellis' self.assertEqual(0, len(web_api_key.show_web_api_keys(uid=self.id_admin))) web_api_key.create_new_web_api_key(self.id_admin, "Test key I") key_info = run_sql("SELECT id FROM webapikey WHERE id_user=%s", (self.id_admin,)) url = web_api_key.build_web_request(path, params, api_key=key_info[0][0]) url = string.split(url, '?') uid = web_api_key.acc_get_uid_from_request(url[0], url[1]) self.assertEqual(uid, self.id_admin) url = web_api_key.build_web_request(path, params, api_key=key_info[0][0]) url += "123" # corrupt the key url = string.split(url, '?') uid = web_api_key.acc_get_uid_from_request(url[0], url[1]) self.assertEqual(uid, -1) path = '/bad' uid = 
web_api_key.acc_get_uid_from_request(path, "") self.assertEqual(uid, -1) - params = { 'nocache': 'yes', 'limit': 123 } + params = {'nocache': 'yes', 'limit': 123} url = web_api_key.build_web_request(path, params, api_key=key_info[0][0]) url = string.split(url, '?') uid = web_api_key.acc_get_uid_from_request(url[0], url[1]) self.assertEqual(uid, -1) run_sql("DELETE FROM webapikey") TEST_SUITE = make_test_suite(APIKeyTest) if __name__ == "__main__": run_test_suite(TEST_SUITE) - run_sql("DELETE FROM webapikey") \ No newline at end of file + run_sql("DELETE FROM webapikey") diff --git a/modules/miscutil/lib/web_api_key_unit_tests.py b/modules/miscutil/lib/web_api_key_unit_tests.py index 5b1944ddb..beaf633fb 100644 --- a/modules/miscutil/lib/web_api_key_unit_tests.py +++ b/modules/miscutil/lib/web_api_key_unit_tests.py @@ -1,120 +1,32 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. -## Copyright (C) 2006, 2007, 2008, 2010, 2011 CERN. +## Copyright (C) 2006, 2007, 2008, 2010, 2011, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -from invenio import web_api_key """Unit tests for REST like authentication API.""" -try: - import hashlib -except: - pass -import unittest -import re -import hmac -import urllib -import time -import string +# Note: tests moved to regression tests. Keeping this file here with +# empty test case set in order to overwrite any previously installed +# file. Also, keeping TEST_SUITE empty so that `inveniocfg +# --run-unit-tests' would not complain. 
from invenio.testutils import make_test_suite, run_test_suite -from invenio.dbquery import run_sql -web_api_key.CFG_WEB_API_KEY_ALLOWED_URL = [('/search\?*', 0, True), - ('/bad\?*', -1, True)] #Just for testing - -web_api_key._CFG_WEB_API_KEY_ALLOWED_URL = [(re.compile(_url), _authorized_time, _need_timestamp) - for _url, _authorized_time, _need_timestamp in web_api_key.CFG_WEB_API_KEY_ALLOWED_URL] - -def build_web_request(path, params, api_key=None, secret_key=None): - items = (hasattr(params, 'items') and [params.items()] or [list(params)])[0] - if api_key: - items.append(('apikey', api_key)) - if secret_key: - items.append(('timestamp', str(int(time.time())))) - items = sorted(items, key=lambda x: x[0].lower()) - url = '%s?%s' % (path, urllib.urlencode(items)) - signature = hmac.new(secret_key, url, hashlib.sha1).hexdigest() - items.append(('signature', signature)) - if not items: - return path - return '%s?%s' % (path, urllib.urlencode(items)) - -class APIKeyTest(unittest.TestCase): - """ Test functions related to the REST authentication API """ - def setUp(self): - self.id_admin = run_sql('SELECT id FROM user WHERE nickname="admin"')[0][0] - - def test_create_remove_show_key(self): - """apikey - create/list/delete REST key""" - self.assertEqual(0, len(web_api_key.show_web_api_keys(uid=self.id_admin))) - web_api_key.create_new_web_api_key(self.id_admin, "Test key I") - web_api_key.create_new_web_api_key(self.id_admin, "Test key II") - web_api_key.create_new_web_api_key(self.id_admin, "Test key III") - web_api_key.create_new_web_api_key(self.id_admin, "Test key IV") - web_api_key.create_new_web_api_key(self.id_admin, "Test key V") - self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin))) - self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin, diff_status=''))) - keys_info = web_api_key.show_web_api_keys(uid=self.id_admin) - web_api_key.mark_web_api_key_as_removed(keys_info[0][0]) - self.assertEqual(4, len(web_api_key.show_web_api_keys(uid=self.id_admin))) - self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin,diff_status=''))) - - run_sql("UPDATE webapikey SET status='WARNING' WHERE id=%s", (keys_info[1][0],)) - run_sql("UPDATE webapikey SET status='REVOKED' WHERE id=%s", (keys_info[2][0],)) - - self.assertEqual(4, len(web_api_key.show_web_api_keys(uid=self.id_admin))) - self.assertEqual(5, len(web_api_key.show_web_api_keys(uid=self.id_admin, diff_status=''))) - - run_sql("DELETE FROM webapikey") - - def test_acc_get_uid_from_request(self): - """webapikey - Login user from request using REST key""" - path = '/search' - params = 'ln=es&sc=1&c=Articles & Preprints&action_search=Buscar&p=ellis' - - self.assertEqual(0, len(web_api_key.show_web_api_keys(uid=self.id_admin))) - web_api_key.create_new_web_api_key(self.id_admin, "Test key I") - - key_info = run_sql("SELECT id FROM webapikey WHERE id_user=%s", (self.id_admin,)) - url = web_api_key.build_web_request(path, params, api_key=key_info[0][0]) - url = string.split(url, '?') - uid = web_api_key.acc_get_uid_from_request(url[0], url[1]) - self.assertEqual(uid, self.id_admin) - - url = web_api_key.build_web_request(path, params, api_key=key_info[0][0]) - url += "123" # corrupt the key - url = string.split(url, '?') - uid = web_api_key.acc_get_uid_from_request(url[0], url[1]) - self.assertEqual(uid, -1) - - path = '/bad' - uid = web_api_key.acc_get_uid_from_request(path, "") - self.assertEqual(uid, -1) - params = { 'nocache': 'yes', 'limit': 123 } - url = 
web_api_key.build_web_request(path, params, api_key=key_info[0][0]) - url = string.split(url, '?') - uid = web_api_key.acc_get_uid_from_request(url[0], url[1]) - self.assertEqual(uid, -1) - - run_sql("DELETE FROM webapikey") - -TEST_SUITE = make_test_suite(APIKeyTest) +TEST_SUITE = make_test_suite() if __name__ == "__main__": run_test_suite(TEST_SUITE) - run_sql("DELETE FROM webapikey") \ No newline at end of file diff --git a/modules/webaccess/lib/external_authentication_robot.py b/modules/webaccess/lib/external_authentication_robot.py index 2374c3d9f..dbdd2fd4c 100644 --- a/modules/webaccess/lib/external_authentication_robot.py +++ b/modules/webaccess/lib/external_authentication_robot.py @@ -1,412 +1,412 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2010, 2011, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """External user authentication for simple robots This implement an external authentication system suitable for robots usage. User attributes are retrieved directly from the form dictionary of the request object. """ import os import sys import hmac import time import base64 if sys.hexversion < 0x2050000: import sha as sha1 else: from hashlib import sha1 from cPickle import dumps from zlib import decompress, compress from invenio.jsonutils import json, json_unicode_to_utf8 from invenio.shellutils import mymkdir from invenio.external_authentication import ExternalAuth, InvenioWebAccessExternalAuthError from invenio.config import CFG_ETCDIR, CFG_SITE_URL, CFG_SITE_SECURE_URL CFG_ROBOT_EMAIL_ATTRIBUTE_NAME = 'email' CFG_ROBOT_NICKNAME_ATTRIBUTE_NAME = 'nickname' CFG_ROBOT_GROUPS_ATTRIBUTE_NAME = 'groups' CFG_ROBOT_TIMEOUT_ATTRIBUTE_NAME = '__timeout__' CFG_ROBOT_USERIP_ATTRIBUTE_NAME = '__userip__' CFG_ROBOT_GROUPS_SEPARATOR = ';' CFG_ROBOT_URL_TIMEOUT = 3600 CFG_ROBOT_KEYS_PATH = os.path.join(CFG_ETCDIR, 'webaccess', 'robot_keys.dat') def normalize_ip(ip, up_to_bytes=4): """ @param up_to_bytes: set this to the number of bytes that should be considered in the normalization. E.g. is this is set two 2, only the first two bytes will be considered, while the remaining two will be set to 0. @return: a normalized IP, e.g. 123.02.12.12 -> 123.2.12.12 """ try: ret = [] for i, number in enumerate(ip.split(".")): if i < up_to_bytes: ret.append(str(int(number))) else: ret.append("0") return '.'.join(ret) except ValueError: ## e.g. if it's IPV6 ::1 return ip def load_robot_keys(): """ @return: the robot key dictionary. 
""" from cPickle import loads from zlib import decompress try: robot_keys = loads(decompress(open(CFG_ROBOT_KEYS_PATH).read())) if not isinstance(robot_keys, dict): return {} else: return robot_keys except: return {} class ExternalAuthRobot(ExternalAuth): """ This class implement an external authentication method suitable to be used by an external service that, after having authenticated a user, will provide a URL to the user that, once followed, will successfully login the user into Invenio, with any detail the external service decided to provide to the Invenio installation. Such URL should be built as follows: BASE?QUERY where BASE is CFG_SITE_SECURE_URL/youraccount/robotlogin and QUERY is a urlencoded mapping of the following key->values: - assertion: an assertion, i.e. a piece of information describing the user, see below for more details. - robot: the identifier of the external service providing the assertion - login_method: the name of the login method as defined in CFG_EXTERNAL_AUTHENTICATION. - digest: the digest of the signature as detailed below. - referer: the URL where the user should be redirected after successful login (it is called referer as, for historical reasons, this is the original URL of the page on which, a human-user has clicked "login". the "assertion" should be a JSON serialized mapping with the following keys: - email: the email of the user (i.e. its identifier). - nickname: optional nickname of the user. - groups: an optional ';'-separated list of groups to which the user belongs to. - __timeout__: the number of seconds (floating point) from the Epoch, after which the URL will no longer be valid. (expressed in UTC) - __userip__: the IP address of the user for whom this URL has been created. (if the user will follow this URL using a different URL the request will not be valid) - any other key can be added and will be merged in the external user settings. If L{use_zlib} is True the assertion is a base64-url-flavour encoding of the zlib compression of the original assertion (useful for shortening the URL while make it easy to type). The "digest" is the hexadecimal representation of the digest using the HMAC-SHA1 method to sign the assertion with the secret key associated with the robot for the given login_method. @param enforce_external_nicknames: whether to trust nicknames provided by the external service and use them (if possible) as unique identifier in the system. @type enforce_external_nicknames: boolean @param email_attribute_name: the actual key in the assertion that will contain the email. @type email_attribute_name: string @param nickname_attribute_name: the actual key in the assertion that will contain the nickname. @type nickname_attribute_name: string @param groups_attribute_name: the actual key in the assertion that will contain the groups. @type groups_attribute_name: string @param groups_separator: the string used to separate groups. @type groups_separator: string @param timeout_attribute_name: the actual key in the assertion that will contain the timeout. @type timeout_attribute_name: string @param userip_attribute_name: the actual key in the assertion that will contain the user IP. @type userip_attribute_name: string @param external_id_attribute_name: the actual string that identifies the user in the external authentication system. By default this is set to be the same as the nickname, but this can be configured. 
@param check_user_ip: whether to check for the IP address of the user using the given URL, against the IP address stored in the assertion to be identical. If 0, no IP check will be performed, if 1, only the 1st byte will be compared, if 2, only the first two bytes will be compared, if 3, only the first three bytes, and if 4, the whole IP address will be checked. @type check_user_ip: int @param use_zlib: whether to use base64-url-flavour encoding of the zlib compression of the json serialization of the assertion or simply the json serialization of the assertion. @type use_zlib: boolean """ def __init__(self, enforce_external_nicknames=False, email_attribute_name=CFG_ROBOT_EMAIL_ATTRIBUTE_NAME, nickname_attribute_name=CFG_ROBOT_NICKNAME_ATTRIBUTE_NAME, groups_attribute_name=CFG_ROBOT_GROUPS_ATTRIBUTE_NAME, groups_separator=CFG_ROBOT_GROUPS_SEPARATOR, timeout_attribute_name=CFG_ROBOT_TIMEOUT_ATTRIBUTE_NAME, userip_attribute_name=CFG_ROBOT_USERIP_ATTRIBUTE_NAME, check_user_ip=4, external_id_attribute_name=CFG_ROBOT_NICKNAME_ATTRIBUTE_NAME, use_zlib=True, ): ExternalAuth.__init__(self, enforce_external_nicknames=enforce_external_nicknames) self.email_attribute_name = email_attribute_name self.nickname_attribute_name = nickname_attribute_name self.groups_attribute_name = groups_attribute_name self.groups_separator = groups_separator self.timeout_attribute_name = timeout_attribute_name self.userip_attribute_name = userip_attribute_name self.external_id_attribute_name = external_id_attribute_name self.check_user_ip = check_user_ip self.use_zlib = use_zlib def __extract_attribute(self, req): """ Load from the request the given assertion, extract all the attribute to properly login the user, and verify that the data are actually both well formed and signed correctly. """ from invenio.webinterface_handler import wash_urlargd args = wash_urlargd(req.form, { 'assertion': (str, ''), 'robot': (str, ''), 'digest': (str, ''), 'login_method': (str, '')}) assertion = args['assertion'] digest = args['digest'] robot = args['robot'] login_method = args['login_method'] shared_key = load_robot_keys().get(login_method, {}).get(robot) if shared_key is None: raise InvenioWebAccessExternalAuthError("A key does not exist for robot: %s, login_method: %s" % (robot, login_method)) if not self.verify(shared_key, assertion, digest): raise InvenioWebAccessExternalAuthError("The provided assertion does not validate against the digest %s for robot %s" % (repr(digest), repr(robot))) if self.use_zlib: try: ## Workaround to Perl implementation that does not add ## any padding to the base64 encoding. 
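                ## For example, an assertion of length 10 needs two '='
                ## characters of padding ((4 - 10 % 4) % 4 == 2) before
                ## urlsafe_b64decode will accept it.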
needed_pad = (4 - len(assertion) % 4) % 4 assertion += needed_pad * '=' assertion = decompress(base64.urlsafe_b64decode(assertion)) except: raise InvenioWebAccessExternalAuthError("The provided assertion is corrupted") data = json_unicode_to_utf8(json.loads(assertion)) if not isinstance(data, dict): raise InvenioWebAccessExternalAuthError("The provided assertion is invalid") timeout = data[self.timeout_attribute_name] if timeout < time.time(): raise InvenioWebAccessExternalAuthError("The provided assertion is expired") userip = data.get(self.userip_attribute_name) if not self.check_user_ip or (normalize_ip(userip, self.check_user_ip) == normalize_ip(req.remote_ip, self.check_user_ip)): return data else: raise InvenioWebAccessExternalAuthError("The provided assertion has been issued for a different IP address (%s instead of %s)" % (userip, req.remote_ip)) def auth_user(self, username, password, req=None): """Authenticate user-supplied USERNAME and PASSWORD. Return None if authentication failed, or the email address of the person if the authentication was successful. In order to do this you may perhaps have to keep a translation table between usernames and email addresses. Raise InvenioWebAccessExternalAuthError in case of external troubles. """ data = self.__extract_attribute(req) email = data.get(self.email_attribute_name) ext_id = data.get(self.external_id_attribute_name, email) if email: if isinstance(email, str): - return email.strip().lower(), ext_id.strip() + return email.strip().lower(), str(ext_id).strip() else: raise InvenioWebAccessExternalAuthError("The email provided in the assertion is invalid: %s" % (repr(email))) else: return None, None def fetch_user_groups_membership(self, username, password=None, req=None): """Given a username and a password, returns a dictionary of groups and their description to which the user is subscribed. Raise InvenioWebAccessExternalAuthError in case of troubles. """ if self.groups_attribute_name: data = self.__extract_attribute(req) groups = data.get(self.groups_attribute_name) if groups: if isinstance(groups, str): groups = [group.strip() for group in groups.split(self.groups_separator)] return dict(zip(groups, groups)) else: raise InvenioWebAccessExternalAuthError("The groups provided in the assertion are invalid: %s" % (repr(groups))) return {} def fetch_user_nickname(self, username, password=None, req=None): """Given a username and a password, returns the right nickname belonging to that user (username could be an email). """ if self.nickname_attribute_name: data = self.__extract_attribute(req) nickname = data.get(self.nickname_attribute_name) if nickname: if isinstance(nickname, str): return nickname.strip().lower() else: raise InvenioWebAccessExternalAuthError("The nickname provided in the assertion is invalid: %s" % (repr(nickname))) return None def fetch_user_preferences(self, username, password=None, req=None): """Given a username and a password, returns a dictionary of keys and values, corresponding to external infos and settings. userprefs = {"telephone": "2392489", "address": "10th Downing Street"} (WEBUSER WILL erase all prefs that starts by EXTERNAL_ and will store: "EXTERNAL_telephone"; all internal preferences can use whatever name but starting with EXTERNAL). If a pref begins with HIDDEN_ it will be ignored. 
""" data = self.__extract_attribute(req) for key in (self.email_attribute_name, self.groups_attribute_name, self.nickname_attribute_name, self.timeout_attribute_name, self.userip_attribute_name): if key and key in data: del data[key] return data def robot_login_method_p(): """Return True if this method is dedicated to robots and should not therefore be available as a choice to regular users upon login. """ return True robot_login_method_p = staticmethod(robot_login_method_p) def sign(secret, assertion): """ @return: a signature of the given assertion. @rtype: string @note: override this method if you want to change the signature algorithm (e.g. to use GPG). @see: L{verify} """ return hmac.new(secret, assertion, sha1).hexdigest() sign = staticmethod(sign) def verify(secret, assertion, signature): """ @return: True if the signature is valid @rtype: boolean @note: override this method if you want to change the signature algorithm (e.g. to use GPG) @see: L{sign} """ return hmac.new(secret, assertion, sha1).hexdigest() == signature verify = staticmethod(verify) def test_create_example_url(self, email, login_method, robot, ip, assertion=None, timeout=None, referer=None, groups=None, nickname=None): """ Create a test URL to test the robot login. @param email: email of the user we want to login as. @type email: string @param login_method: the login_method name as specified in CFG_EXTERNAL_AUTHENTICATION. @type login_method: string @param robot: the identifier of this robot. @type robot: string @param assertion: any further data we want to send to. @type: json serializable mapping @param ip: the IP of the user. @type: string @param timeout: timeout when the URL will expire (in seconds from the Epoch) @type timeout: float @param referer: the URL where to land after successful login. @type referer: string @param groups: the list of optional group of the user. @type groups: list of string @param nickname: the optional nickname of the user. @type nickname: string @return: the URL to login as the user. @rtype: string """ from invenio.access_control_config import CFG_EXTERNAL_AUTHENTICATION from invenio.urlutils import create_url if assertion is None: assertion = {} assertion[self.email_attribute_name] = email if nickname: assertion[self.nickname_attribute_name] = nickname if groups: assertion[self.groups_attribute_name] = self.groups_separator.join(groups) if timeout is None: timeout = time.time() + CFG_ROBOT_URL_TIMEOUT assertion[self.timeout_attribute_name] = timeout if referer is None: referer = CFG_SITE_URL if login_method is None: for a_login_method, details in CFG_EXTERNAL_AUTHENTICATION.iteritems(): if details[2]: login_method = a_login_method break robot_keys = load_robot_keys() assertion[self.userip_attribute_name] = ip assertion = json.dumps(assertion) if self.use_zlib: assertion = base64.urlsafe_b64encode(compress(assertion)) shared_key = robot_keys[login_method][robot] digest = self.sign(shared_key, assertion) return create_url("%s%s" % (CFG_SITE_SECURE_URL, "/youraccount/robotlogin"), { 'assertion': assertion, 'robot': robot, 'login_method': login_method, 'digest': digest, 'referer': referer}) def update_robot_key(login_method, robot, key=None): """ Utility to update the robot key store. @param login_method: the login_method name as per L{CFG_EXTERNAL_AUTHENTICATION}. It should correspond to a robot-enable login method. 
@type: string @param robot: the robot identifier @type robot: string @param key: the secret @type key: string @note: if the secret is empty the corresponding key will be removed. """ robot_keys = load_robot_keys() if key is None and login_method in robot_keys and robot in robot_keys[login_method]: del robot_keys[login_method][robot] if not robot_keys[login_method]: del robot_keys[login_method] else: if login_method not in robot_keys: robot_keys[login_method] = {} robot_keys[login_method][robot] = key mymkdir(os.path.join(CFG_ETCDIR, 'webaccess')) open(CFG_ROBOT_KEYS_PATH, 'w').write(compress(dumps(robot_keys, -1))) diff --git a/modules/websearch/lib/search_engine_query_parser.py b/modules/websearch/lib/search_engine_query_parser.py index f983f045f..4cc7bd88c 100644 --- a/modules/websearch/lib/search_engine_query_parser.py +++ b/modules/websearch/lib/search_engine_query_parser.py @@ -1,1251 +1,1338 @@ # -*- coding: utf-8 -*- ## This file is part of Invenio. -## Copyright (C) 2008, 2010, 2011, 2012 CERN. +## Copyright (C) 2008, 2010, 2011, 2012, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. # pylint: disable=C0301 """Invenio Search Engine query parsers.""" import re import string from datetime import datetime try: import dateutil if not hasattr(dateutil, '__version__') or dateutil.__version__ != '2.0': from dateutil import parser as du_parser from dateutil.relativedelta import relativedelta as du_delta + from dateutil import relativedelta GOT_DATEUTIL = True else: from warnings import warn warn("Not using dateutil module because the version %s is not compatible with Python-2.x" % dateutil.__version__) GOT_DATEUTIL = False except ImportError: # Ok, no date parsing is possible, but continue anyway, # since this package is only recommended, not mandatory. GOT_DATEUTIL = False from invenio.bibindex_engine_tokenizer import BibIndexFuzzyNameTokenizer as FNT from invenio.logicutils import to_cnf from invenio.config import CFG_WEBSEARCH_SPIRES_SYNTAX NameScanner = FNT() class InvenioWebSearchMismatchedParensError(Exception): """Exception for parse errors caused by mismatched parentheses.""" def __init__(self, message): """Initialization.""" self.message = message def __str__(self): """String representation.""" return repr(self.message) class SearchQueryParenthesisedParser(object): """Search query parser that handles arbitrarily-nested parentheses Parameters: * substitution_dict: a dictionary mapping strings to other strings. By default, maps 'and', 'or' and 'not' to '+', '|', and '-'. Dictionary values will be treated as valid operators for output. A note (valkyrie 25.03.2011): Based on looking through the prod search logs, it is evident that users, when they are using parentheses to do searches, only run word characters up against parens when they intend the parens to be part of the word (e.g. 
U(1)), and when they are using parentheses to combine operators, they put a space before and after them. As of writing, this is the behavior that SQPP now expects, in order that it be able to handle such queries as e(+)e(-) that contain operators in parentheses that should be interpreted as words. """ def __init__(self, substitution_dict = {'and': '+', 'or': '|', 'not': '-'}): self.substitution_dict = substitution_dict self.specials = set(['(', ')', '+', '|', '-', '+ -']) self.__tl_idx = 0 self.__tl_len = 0 # I think my names are both concise and clear # pylint: disable=C0103 def _invenio_to_python_logical(self, q): """Translate the + and - in invenio query strings into & and ~.""" p = q p = re.sub('\+ -', '&~', p) p = re.sub('\+', '&', p) p = re.sub('-', '~', p) p = re.sub(' ~', ' & ~', p) return p def _python_logical_to_invenio(self, q): """Translate the & and ~ in logical expression strings into + and -.""" p = q p = re.sub('\& ~', '-', p) p = re.sub('~', '-', p) p = re.sub('\&', '+', p) return p # pylint: enable=C0103 def parse_query(self, query): """Make query into something suitable for search_engine. This is the main entry point of the class. Given an expression of the form: "expr1 or expr2 (expr3 not (expr4 or expr5))" produces annoted list output suitable for consumption by search_engine, of the form: ['+', 'expr1', '|', 'expr2', '+', 'expr3 - expr4 | expr5'] parse_query() is a wrapper for self.tokenize() and self.parse(). """ toklist = self.tokenize(query) depth, balanced, dummy_d0_p = self.nesting_depth_and_balance(toklist) if not balanced: raise SyntaxError("Mismatched parentheses in "+str(toklist)) toklist, var_subs = self.substitute_variables(toklist) if depth > 1: toklist = self.tokenize(self.logically_reduce(toklist)) return self.parse(toklist, var_subs) def substitute_variables(self, toklist): """Given a token list, return a copy of token list in which all free variables are bound with boolean variable names of the form 'pN'. Additionally, all the substitutable logical operators are exchanged for their symbolic form and implicit ands are made explicit e.g., ((author:'ellis, j' and title:quark) or author:stevens jones) becomes: ((p0 + p1) | p2 + p3) with the substitution table: {'p0': "author:'ellis, j'", 'p1': "title:quark", 'p2': "author:stevens", 'p3': "jones" } Return value is the substituted token list and a copy of the substitution table. 
""" def labels(): i = 0 while True: yield 'p'+str(i) i += 1 def filter_front_ands(toklist): """Filter out extra logical connectives and whitespace from the front.""" while toklist[0] == '+' or toklist[0] == '|' or toklist[0] == '': toklist = toklist[1:] return toklist var_subs = {} labeler = labels() new_toklist = [''] cannot_be_anded = self.specials.difference((')',)) for token in toklist: token = token.lower() if token in self.substitution_dict: if token == 'not' and new_toklist[-1] == '+': new_toklist[-1] = '-' else: new_toklist.append(self.substitution_dict[token]) elif token == '(': if new_toklist[-1] not in self.specials: new_toklist.append('+') new_toklist.append(token) elif token not in self.specials: # apparently generators are hard for pylint to figure out # Turns off msg about labeler not having a 'next' method # pylint: disable=E1101 label = labeler.next() # pylint: enable=E1101 var_subs[label] = token if new_toklist[-1] not in cannot_be_anded: new_toklist.append('+') new_toklist.append(label) else: if token == '-' and new_toklist[-1] == '+': new_toklist[-1] = '-' else: new_toklist.append(token) return filter_front_ands(new_toklist), var_subs def nesting_depth_and_balance(self, token_list): """Checks that parentheses are balanced and counts how deep they nest""" depth = 0 maxdepth = 0 depth0_pairs = 0 good_depth = True for i in range(len(token_list)): token = token_list[i] if token == '(': if depth == 0: depth0_pairs += 1 depth += 1 if depth > maxdepth: maxdepth += 1 elif token == ')': depth -= 1 if depth == -1: # can only happen with unmatched ) good_depth = False # so force depth check to fail depth = 0 # but keep maxdepth in good range return maxdepth, depth == 0 and good_depth, depth0_pairs def logically_reduce(self, token_list): """Return token_list in conjunctive normal form as a string. CNF has the property that there will only ever be one level of parenthetical nesting, and all distributable operators (such as the not in -(p | q) will be fully distributed (as -p + -q). """ maxdepth, dummy_balanced, d0_p = self.nesting_depth_and_balance(token_list) s = ' '.join(token_list) s = self._invenio_to_python_logical(s) last_maxdepth = 0 while maxdepth != last_maxdepth: # XXX: sometimes NaryExpr doesn't try: # fully flatten Expr; but it usually s = str(to_cnf(s)) # does in 2 passes FIXME: diagnose except SyntaxError: raise SyntaxError(str(s)+" couldn't be converted to a logic expression.") last_maxdepth = maxdepth maxdepth, dummy_balanced, d0_p = self.nesting_depth_and_balance(self.tokenize(s)) if d0_p == 1 and s[0] == '(' and s[-1] == ')': # s can come back with extra parens s = s[1:-1] s = self._python_logical_to_invenio(s) return s def tokenize(self, query): """Given a query string, return a list of tokens from that string. * Isolates meaningful punctuation: ( ) + | - * Keeps single- and double-quoted strings together without interpretation. * Splits everything else on whitespace. i.e.: "expr1|expr2 (expr3-(expr4 or expr5))" becomes: ['expr1', '|', 'expr2', '(', 'expr3', '-', '(', 'expr4', 'or', 'expr5', ')', ')'] special case: "e(+)e(-)" interprets '+' and '-' as word characters since they are in parens with word characters run up against them. it becomes: ['e(+)e(-)'] """ ### # Invariants: # * Query is never modified # * In every loop iteration, querytokens grows to the right # * The only return point is at the bottom of the function, and the only # return value is querytokens ### def get_tokens(s): """ Given string s, return a list of s's tokens. 
Adds space around special punctuation, then splits on whitespace. """ s = ' '+s s = s.replace('->', '####DATE###RANGE##OP#') # XXX: Save '->' s = re.sub('(?P[a-zA-Z0-9_,=:]+)\((?P[a-zA-Z0-9_,+-/]*)\)', '#####\g####PAREN###\g##PAREN#', s) # XXX: Save U(1) and SL(2,Z) s = re.sub('####PAREN###(?P[.0-9/-]*)(?P[+])(?P[.0-9/-]*)##PAREN#', '####PAREN###\g##PLUS##\g##PAREN#', s) s = re.sub('####PAREN###(?P([.0-9/]|##PLUS##)*)(?P[-])' +\ '(?P([.0-9/]|##PLUS##)*)##PAREN#', '####PAREN###\g##MINUS##\g##PAREN#', s) # XXX: Save e(+)e(-) for char in self.specials: if char == '-': s = s.replace(' -', ' - ') s = s.replace(')-', ') - ') s = s.replace('-(', ' - (') else: s = s.replace(char, ' '+char+' ') s = re.sub('##PLUS##', '+', s) s = re.sub('##MINUS##', '-', s) # XXX: Restore e(+)e(-) s = re.sub('#####(?P[a-zA-Z0-9_,=:]+)####PAREN###(?P[a-zA-Z0-9_,+-/]*)##PAREN#', '\g(\g)', s) # XXX: Restore U(1) and SL(2,Z) s = s.replace('####DATE###RANGE##OP#', '->') # XXX: Restore '->' return s.split() querytokens = [] current_position = 0 re_quotes_match = re.compile(r'(?![\\])(".*?[^\\]")' + r"|(?![\\])('.*?[^\\]')") for match in re_quotes_match.finditer(query): match_start = match.start() quoted_region = match.group(0).strip() # clean the content after the previous quotes and before current quotes unquoted = query[current_position : match_start] querytokens.extend(get_tokens(unquoted)) # XXX: In case we end up with e.g. title:, "compton scattering", make it # title:"compton scattering" if querytokens and querytokens[0] and querytokens[-1][-1] == ':': querytokens[-1] += quoted_region # XXX: In case we end up with e.g. "expr1",->,"expr2", make it # "expr1"->"expr2" elif len(querytokens) >= 2 and querytokens[-1] == '->': arrow = querytokens.pop() querytokens[-1] += arrow + quoted_region else: # add our newly tokenized content to the token list querytokens.extend([quoted_region]) # move current position to the end of the tokenized content current_position = match.end() # get tokens from the last appearance of quotes until the query end unquoted = query[current_position : len(query)] querytokens.extend(get_tokens(unquoted)) return querytokens def parse(self, token_list, variable_substitution_dict=None): """Make token_list consumable by search_engine. Turns a list of tokens and a variable mapping into a grouped list of subexpressions in the format suitable for use by search_engine, e.g.: ['+', 'searchterm', '-', 'searchterm to exclude', '|', 'another term'] Incidentally, this works recursively so parens can cause arbitrarily deep nestings. But since the search_engine doesn't know about nested structures, we need to flatten the input structure first. """ ### # Invariants: # * Token list is never modified # * Balanced parens remain balanced; unbalanced parens are an error # * Individual tokens may only be exchanged for items in the variable # substitution dict; otherwise they pass through unmolested # * Return value is built up mostly as a stack ### op_symbols = self.substitution_dict.values() self.__tl_idx = 0 self.__tl_len = len(token_list) def inner_parse(token_list, open_parens=False): ''' although it's not in the API, it seems sensible to comment this function a bit. dist_token here is a token (e.g. 
a second-order operator) which needs to be distributed across other tokens inside the inner parens ''' if open_parens: parsed_values = [] else: parsed_values = ['+'] i = 0 while i < len(token_list): token = token_list[i] if i > 0 and parsed_values[-1] not in op_symbols: parsed_values.append('+') if token == '(': # if we need to distribute something over the tokens inside the parens # we will know it because... it will end in a : # that part of the list will be 'px', '+', '(' distributing = (len(parsed_values) > 2 and parsed_values[-2].endswith(':') and parsed_values[-1] == '+') if distributing: # we don't need the + if we are distributing parsed_values = parsed_values[:-1] offset = self.__tl_len - len(token_list) inner_value = inner_parse(token_list[i+1:], True) inner_value = ' '.join(inner_value) if distributing: if len(self.tokenize(inner_value)) == 1: parsed_values[-1] = parsed_values[-1] + inner_value elif "'" in inner_value: parsed_values[-1] = parsed_values[-1] + '"' + inner_value + '"' elif '"' in inner_value: parsed_values[-1] = parsed_values[-1] + "'" + inner_value + "'" else: parsed_values[-1] = parsed_values[-1] + '"' + inner_value + '"' else: parsed_values.append(inner_value) self.__tl_idx += 1 i = self.__tl_idx - offset elif token == ')': if parsed_values[-1] in op_symbols: parsed_values = parsed_values[:-1] if len(parsed_values) > 1 and parsed_values[0] == '+' and parsed_values[1] in op_symbols: parsed_values = parsed_values[1:] return parsed_values elif token in op_symbols: if len(parsed_values) > 0: parsed_values[-1] = token else: parsed_values = [token] else: if variable_substitution_dict != None and token in variable_substitution_dict: token = variable_substitution_dict[token] parsed_values.append(token) i += 1 self.__tl_idx += 1 # If we have an extra start symbol, remove the default one if parsed_values[1] in op_symbols: parsed_values = parsed_values[1:] return parsed_values return inner_parse(token_list, False) class SpiresToInvenioSyntaxConverter: """Converts queries defined with SPIRES search syntax into queries that use Invenio search syntax. 
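
    A rough usage sketch (illustrative; assumes CFG_WEBSEARCH_SPIRES_SYNTAX is
    enabled, and the exact expansion of authors and keywords is what the unit
    test module pins down):

        converter = SpiresToInvenioSyntaxConverter()
        if converter.is_applicable('find a ellis and t shapes'):
            query = converter.convert_query('find a ellis and t shapes')
            # query is now roughly equivalent to: author:ellis and title:shapes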
""" # Constants defining fields _DATE_ADDED_FIELD = 'datecreated:' _DATE_UPDATED_FIELD = 'datemodified:' _DATE_FIELD = 'year:' _A_TAG = 'author:' _EA_TAG = 'exactauthor:' - # Dictionary containing the matches between SPIRES keywords # and their corresponding Invenio keywords or fields # SPIRES keyword : Invenio keyword or field _SPIRES_TO_INVENIO_KEYWORDS_MATCHINGS = { # address 'address' : 'address:', # affiliation 'affiliation' : 'affiliation:', 'affil' : 'affiliation:', 'aff' : 'affiliation:', 'af' : 'affiliation:', 'institution' : 'affiliation:', 'inst' : 'affiliation:', # any field 'any' : 'anyfield:', # author count 'ac' : 'authorcount:', # bulletin 'bb' : 'reportnumber:', 'bbn' : 'reportnumber:', 'bull' : 'reportnumber:', 'bulletin-bd' : 'reportnumber:', 'bulletin-bd-no' : 'reportnumber:', 'eprint' : 'reportnumber:', # citation / reference 'c' : 'reference:', 'citation' : 'reference:', 'cited' : 'reference:', 'jour-vol-page' : 'reference:', 'jvp' : 'reference:', # collaboration 'collaboration' : 'collaboration:', 'collab-name' : 'collaboration:', 'cn' : 'collaboration:', # conference number 'conf-number' : '111__g:', 'cnum' : '773__w:', # country 'cc' : '044__a:', 'country' : '044__a:', # date 'date': _DATE_FIELD, 'd': _DATE_FIELD, # date added 'date-added': _DATE_ADDED_FIELD, 'dadd': _DATE_ADDED_FIELD, 'da': _DATE_ADDED_FIELD, # date updated 'date-updated': _DATE_UPDATED_FIELD, 'dupd': _DATE_UPDATED_FIELD, 'du': _DATE_UPDATED_FIELD, # first author 'fa' : 'firstauthor:', 'first-author' : 'firstauthor:', # author 'a' : 'author:', 'au' : 'author:', 'author' : 'author:', 'name' : 'author:', # exact author # this is not a real keyword match. It is pseudo keyword that # will be replaced later with author search 'ea' : 'exactauthor:', 'exact-author' : 'exactauthor:', # experiment 'exp' : 'experiment:', 'experiment' : 'experiment:', 'expno' : 'experiment:', 'sd' : 'experiment:', 'se' : 'experiment:', # journal 'journal' : 'journal:', 'j' : 'journal:', 'published_in' : 'journal:', 'spicite' : 'journal:', 'vol' : 'journal:', # journal page 'journal-page' : '773__c:', 'jp' : '773__c:', # journal year 'journal-year' : '773__y:', 'jy' : '773__y:', # key 'key' : '970__a:', 'irn' : '970__a:', 'record' : '970__a:', 'document' : '970__a:', 'documents' : '970__a:', # keywords 'k' : 'keyword:', 'keywords' : 'keyword:', 'kw' : 'keyword:', # note 'note' : '500__a:', # old title 'old-title' : '246__a:', 'old-t' : '246__a:', 'ex-ti' : '246__a:', 'et' : '246__a:', #postal code 'postalcode' : 'postalcode:', 'zip' : 'postalcode:', 'cc' : 'postalcode:', # ppf subject 'ppf-subject' : '650__a:', 'status' : '650__a:', # recid 'recid' : 'recid:', # report number 'r' : 'reportnumber:', 'rn' : 'reportnumber:', 'rept' : 'reportnumber:', 'report' : 'reportnumber:', 'report-num' : 'reportnumber:', # title 't' : 'title:', 'ti' : 'title:', 'title' : 'title:', 'with-language' : 'title:', # fulltext 'fulltext' : 'fulltext:', 'ft' : 'fulltext:', # topic 'topic' : '695__a:', 'tp' : '695__a:', 'hep-topic' : '695__a:', 'desy-keyword' : '695__a:', 'dk' : '695__a:', # topcite 'topcit' : 'cited:', 'topcite' : 'cited:', # captions 'caption' : 'caption:', # category 'arx' : '037__c:', 'category' : '037__c:', # primarch 'parx' : '037__c:', 'primarch' : '037__c:', # texkey 'texkey' : '035__z:', # type code 'tc' : 'collection:', 'ty' : 'collection:', 'type' : 'collection:', 'type-code' : 'collection:', 'scl': 'collection:', 'ps': 'collection:', # field code 'f' : 'subject:', 'fc' : 'subject:', 'field' : 'subject:', 'field-code' : 
'subject:',
        'subject' : 'subject:',
        # coden
        'bc' : 'journal:',
        'browse-only-indx' : 'journal:',
        'coden' : 'journal:',
        'journal-coden' : 'journal:',
        # jobs specific codes
        'job' : 'title:',
        'position' : 'title:',
        'region' : 'region:',
        'continent' : 'region:',
        'deadline' : '046__a:',
        'rank' : 'rank:',
        # replace all the keywords without match with empty string
        # this will remove the noise from the unknown keywords in the search
        # and will search in all fields for the words following the keywords
        # energy
        'e' : '',
        'energy' : '',
        'energyrange-code' : '',
        # exact experiment number
        'ee' : '',
        'exact-exp' : '',
        'exact-expno' : '',
        # hidden note
        'hidden-note' : '',
        'hn' : '',
        # ppf
        'ppf' : '',
        'ppflist' : '',
        # slac topics
        'ppfa' : '',
        'slac-topics' : '',
        'special-topics' : '',
        'stp' : '',
        # test index
        'test' : '',
        'testindex' : '',
        }

    _SECOND_ORDER_KEYWORD_MATCHINGS = {
        'rawref' : 'rawref:',
        'refersto' : 'refersto:',
        'refs': 'refersto:',
        'citedby' : 'citedby:'
        }

    _INVENIO_KEYWORDS_FOR_SPIRES_PHRASE_SEARCHES = [
        'affiliation:',
        #'cited:',  # topcite is technically a phrase index - this isn't necessary
        '773__y:',  # journal-year
        '773__c:',  # journal-page
        '773__w:',  # cnum
        '044__a:',  # country code
        'subject:',  # field code
        'collection:',  # type code
        '035__z:',  # texkey
        # also exact expno, corp-auth, url, abstract, doi, mycite, citing
        # but we have no invenio equivalents for these ATM
        ]

    def __init__(self):
        """Initialize the state of the converter"""
        self._months = {}
        self._month_name_to_month_number = {}
        self._init_months()
        self._compile_regular_expressions()

    def _compile_regular_expressions(self):
        """Compiles some of the regular expressions that are used in the class
        for higher performance."""

        # regular expression that matches the contents in single and double quotes
        # taking in mind if they are escaped.
        self._re_quotes_match = re.compile(r'(?![\\])(".*?[^\\]")' + r"|(?![\\])('.*?[^\\]')")

        # match cases where a keyword distributes across a conjunction
        self._re_distribute_keywords = re.compile(r'''(?ix)  # verbose, ignorecase on
                \b(?P<keyword>\S*:)     # a keyword is anything that's not whitespace with a colon
                (?P<content>[^:]+?)\s*  # content is the part that comes after the keyword; it should NOT
                                        # have colons in it!  that implies that we might be distributing
                                        # a keyword OVER another keyword. see ticket #701
                (?P<combination>\ and\ not\ |\ and\ |\ or\ |\ not\ )\s*
                (?P<last_content>[^:]*?)  # oh look, content without a keyword!
                (?=\ and\ |\ or\ |\ not\ |$)''')

        # massaging SPIRES quirks
        self._re_pattern_IRN_search = re.compile(r'970__a:(?P<irn>\d+)')
        self._re_topcite_match = re.compile(r'(?P<x>cited:\d+)\+')

        # regular expression that matches author patterns
        # and author patterns with second-order-ops on top
        # does not match names with " or ' around them, since
        # those should not be touched
        self._re_author_match = re.compile(r'''(?ix)  # verbose, ignorecase
                \b((?P<secondorderop>[^\s]+:)?)  # do we have a second-order-op on top?
                ((?P<first>first)?)author:(?P<name>
                            [^\'\"]   # first character not a quotemark
                            [^()]*?   # some stuff that isn't parentheses (that is dealt with in pp)
                            [^\'\"])  # last character not a quotemark
                (?=\ and\ not\ |\ and\ |\ or\ |\ not\ |$)''')

        # regular expression that matches exact author patterns
        # the group defined in this regular expression is used in method
        # _convert_spires_exact_author_search_to_invenio_author_search(...)
        # in case of changes correct also the code in this method
        self._re_exact_author_match = re.compile(r'\b((?P<secondorderop>[^\s]+:)?)exactauthor:(?P<author_name>[^\'\"].*?[^\'\"]\b)(?= and not | and | or | not |$)', re.IGNORECASE)

        # match a second-order operator with no operator following it
        self._re_second_order_op_no_index_match = re.compile(r'''(?ix)  # ignorecase, verbose
                (^|\b|:)(?P<second_order_op>(refersto|citedby):)
                (?P<search_terms>[^\"\'][^:]+?)  # anything without an index should be absorbed here
                \s*
                (?P<conjunction_or_next_keyword>(\ and\ |\ not\ |\ or\ |\ \w+:\w+|$))
                ''')

        # match search term, its content (words that are searched) and
        # the operator preceding the term.
        self._re_search_term_pattern_match = re.compile(r'\b(?P<combine_operator>find|and|or|not)\s+(?P<search_term>\S+:)(?P<search_content>.+?)(?= and not | and | or | not |$)', re.IGNORECASE)

        # match journal searches
        self._re_search_term_is_journal = re.compile(r'''(?ix)  # verbose, ignorecase
                \b(?P<leading>(find|and|or|not)\s+journal:)  # first combining operator and index
                (?P<search_content>.+?)                      # what we are searching
                (?=\ and\ not\ |\ and\ |\ or\ |\ not\ |$)''')

        # regular expression matching date after pattern
        self._re_date_after_match = re.compile(r'\b(?P<searchop>d|date|dupd|dadd|da|date-added|du|date-updated)\b\s*(after|>)\s*(?P<search_content>.+?)(?= and not | and | or | not |$)', re.IGNORECASE)

        # regular expression matching date before pattern
        self._re_date_before_match = re.compile(r'\b(?P<searchop>d|date|dupd|dadd|da|date-added|du|date-updated)\b\s*(before|<)\s*(?P<search_content>.+?)(?= and not | and | or | not |$)', re.IGNORECASE)

        # match date searches which have been keyword-substituted
        self._re_keysubbed_date_expr = re.compile(r'\b(?P<term>(' + self._DATE_ADDED_FIELD + ')|(' + self._DATE_UPDATED_FIELD + ')|(' + self._DATE_FIELD + '))(?P<content>.+?)(?= and not | and | or | not |$)', re.IGNORECASE)

        # for finding (and changing) a variety of different SPIRES search keywords
        self._re_spires_find_keyword = re.compile('^(f|fin|find)\s+', re.IGNORECASE)

        # for finding boolean expressions
        self._re_boolean_expression = re.compile(r' and | or | not | and not ')

        # patterns for subbing out spaces within quotes temporarily
        self._re_pattern_single_quotes = re.compile("'(.*?)'")
        self._re_pattern_double_quotes = re.compile("\"(.*?)\"")
        self._re_pattern_regexp_quotes = re.compile("\/(.*?)\/")
        self._re_pattern_space = re.compile("__SPACE__")
        self._re_pattern_equals = re.compile("__EQUALS__")

+        # for date math:
+        self._re_datemath = re.compile(r'(?P<datestamp>.+)\s+(?P<operator>[-+])\s+(?P<units>\d+)')
+
+
    def is_applicable(self, query):
        """Is this converter applicable to this query?

        Return true if query begins with find, fin, or f, or if it contains
        a SPIRES-specific keyword (a, t, etc.), or if it contains the invenio
        author: field search.
        """
        if not CFG_WEBSEARCH_SPIRES_SYNTAX:
            #SPIRES syntax is switched off
            return False
        query = query.lower()
        if self._re_spires_find_keyword.match(query):
            #leading 'find' is present and SPIRES syntax is switched on
            return True
        if CFG_WEBSEARCH_SPIRES_SYNTAX > 1:
            for word in query.split(' '):
                if self._SPIRES_TO_INVENIO_KEYWORDS_MATCHINGS.has_key(word):
                    return True
        return False

    def convert_query(self, query):
        """Convert SPIRES syntax queries to Invenio syntax.

        Do nothing to queries not in SPIRES syntax."""

        # SPIRES syntax allows searches with 'find' or 'fin'.
        if self.is_applicable(query):
            query = re.sub(self._re_spires_find_keyword, 'find ', query)
            if not query.startswith('find'):
                query = 'find ' + query

            # a holdover from SPIRES syntax is e.g.
date = 2000 rather than just date 2000 query = self._remove_extraneous_equals_signs(query) # these calls are before keywords replacement because when keywords # are replaced, date keyword is replaced by specific field search # and the DATE keyword is not match in DATE BEFORE or DATE AFTER query = self._convert_spires_date_before_to_invenio_span_query(query) query = self._convert_spires_date_after_to_invenio_span_query(query) # call to _replace_spires_keywords_with_invenio_keywords should be at the # beginning because the next methods use the result of the replacement query = self._standardize_already_invenio_keywords(query) query = self._replace_spires_keywords_with_invenio_keywords(query) query = self._normalise_journal_page_format(query) query = self._distribute_keywords_across_combinations(query) query = self._distribute_and_quote_second_order_ops(query) query = self._convert_dates(query) query = self._convert_irns_to_spires_irns(query) query = self._convert_topcite_to_cited(query) query = self._convert_spires_author_search_to_invenio_author_search(query) query = self._convert_spires_exact_author_search_to_invenio_author_search(query) query = self._convert_spires_truncation_to_invenio_truncation(query) query = self._expand_search_patterns(query) # remove FIND in the beginning of the query as it is not necessary in Invenio query = query[4:] query = query.strip() return query def _init_months(self): """Defines a dictionary matching the name of the month with its corresponding number""" # this dictionary is used when generating match patterns for months self._months = {'jan':'01', 'january':'01', 'feb':'02', 'february':'02', 'mar':'03', 'march':'03', 'apr':'04', 'april':'04', 'may':'05', 'may':'05', 'jun':'06', 'june':'06', 'jul':'07', 'july':'07', 'aug':'08', 'august':'08', 'sep':'09', 'september':'09', 'oct':'10', 'october':'10', 'nov':'11', 'november':'11', 'dec':'12', 'december':'12'} # this dictionary is used to transform name of the month # to a number used in the date format. By this reason it # contains also the numbers itself to simplify the conversion self._month_name_to_month_number = {'1':'01', '01':'01', '2':'02', '02':'02', '3':'03', '03':'03', '4':'04', '04':'04', '5':'05', '05':'05', '6':'06', '06':'06', '7':'07', '07':'07', '8':'08', '08':'08', '9':'09', '09':'09', '10':'10', '11':'11', '12':'12',} # combine it with months in order to cover all the cases self._month_name_to_month_number.update(self._months) def _get_month_names_match(self): """Retruns part of a patter that matches month in a date""" months_match = '' for month_name in self._months.keys(): months_match = months_match + month_name + '|' months_match = r'\b(' + months_match[0:-1] + r')\b' return months_match def _convert_dates(self, query): """Tries to find dates in query and make them look like ISO-8601.""" + def parse_relative_unit(date_str): + units = 0 + datemath = self._re_datemath.match(date_str) + if datemath: + date_str = datemath.group('datestamp') + units = int(datemath.group('operator') + datemath.group('units')) + return date_str, units + + def guess_best_year(d): + if d.year > datetime.today().year + 10: + return d - du_delta(years=100) + else: + return d + + def parse_date_unit(date_str): + begin = date_str + end = None + + # First split, relative time directive + # e.g. 
"2012-01-01 - 3" to ("2012-01-01", -3) + date_str, relative_units = parse_relative_unit(date_str) + + try: + d = datetime.strptime(date_str, '%Y-%m-%d') + d += du_delta(days=relative_units) + return datetime.strftime(d, '%Y-%m-%d'), end + except ValueError: + pass + + try: + d = datetime.strptime(date_str, '%y-%m-%d') + d += du_delta(days=relative_units) + d = guess_best_year(d) + return datetime.strftime(d, '%Y-%m-%d'), end + except ValueError: + pass + + try: + d = datetime.strptime(date_str, '%Y-%m') + d += du_delta(months=relative_units) + return datetime.strftime(d, '%Y-%m'), end + except ValueError: + pass + + try: + d = datetime.strptime(date_str, '%Y') + d += du_delta(years=relative_units) + return datetime.strftime(d, '%Y'), end + except ValueError: + pass + + try: + d = datetime.strptime(date_str, '%y') + d += du_delta(days=relative_units) + d = guess_best_year(d) + return datetime.strftime(d, '%Y'), end + except ValueError: + pass + + try: + d = datetime.strptime(date_str, '%b %y') + d = guess_best_year(d) + return datetime.strftime(d, '%Y-%m'), end + except ValueError: + pass + + if 'this week' in date_str: + # Past monday to today + # This week is iffy, not sure if we should + # start with sunday or monday + begin = datetime.today() + begin += du_delta(weekday=relativedelta.SU(-1)) + end = datetime.today() + begin = datetime.strftime(begin, '%Y-%m-%d') + end = datetime.strftime(end, '%Y-%m-%d') + elif 'last week' in date_str: + # Past monday to today + # Same problem as last week + begin = datetime.today() + begin += du_delta(weekday=relativedelta.SU(-2)) + end = datetime.today() + end += du_delta(weekday=relativedelta.SA(-1)) + begin = datetime.strftime(begin, '%Y-%m-%d') + end = datetime.strftime(end, '%Y-%m-%d') + elif 'this month' in date_str: + d = datetime.today() + begin = datetime.strftime(d, '%Y-%m') + elif 'last month' in date_str: + d = datetime.today() - du_delta(months=1) + begin = datetime.strftime(d, '%Y-%m') + elif 'yesterday' in date_str: + d = datetime.today() - du_delta(days=1) + begin = datetime.strftime(d, '%Y-%m-%d') + elif 'today' in date_str: + start = datetime.today() + start += du_delta(days=relative_units) + begin = datetime.strftime(start, '%Y-%m-%d') + elif date_str.strip() == '0': + begin = '0' + else: + default = datetime(datetime.today().year, 1, 1) + try: + d = du_parser.parse(date_str, default=default) + except ValueError: + begin = date_str + else: + begin = datetime.strftime(d, '%Y-%m-%d') + + return begin, end + def mangle_with_dateutils(query): - DEFAULT = datetime(datetime.today().year, 1, 1) result = '' position = 0 for match in self._re_keysubbed_date_expr.finditer(query): result += query[position : match.start()] + datestamp = match.group('content') + if '->' in datestamp: + begin_unit, end_unit = datestamp.split('->', 1) + begin, dummy = parse_date_unit(begin_unit) + end, dummy = parse_date_unit(end_unit) + else: + begin, end = parse_date_unit(datestamp) + + if end: + daterange = '%s->%s' % (begin, end) + else: + daterange = begin - isodates = [] - dates = match.group('content').split('->') # Warning: generalizing but should only ever be 2 items - for datestamp in dates: - if datestamp != None: - if re.match('[0-9]{1,4}$', datestamp): - isodates.append(datestamp) - else: - units = 0 - datestamp = re.sub('yesterday', datetime.strftime(datetime.today() - +du_delta(days=-1), '%Y-%m-%d'), - datestamp) - datestamp = re.sub('today', datetime.strftime(datetime.today(), '%Y-%m-%d'), datestamp) - datestamp = re.sub('this week', 
datetime.strftime(datetime.today() - +du_delta(days=-(datetime.today().isoweekday()%7)), '%Y-%m-%d'), - datestamp) - datestamp = re.sub('last week', datetime.strftime(datetime.today() - +du_delta(days=-((datetime.today().isoweekday()%7)+7)), '%Y-%m-%d'), - datestamp) - datestamp = re.sub('this month', datetime.strftime(datetime.today(), '%Y-%m'), - datestamp) - datestamp = re.sub('last month', datetime.strftime(datetime.today() - +du_delta(months=-1), '%Y-%m'), - datestamp) - datemath = re.match(r'(?P.+)\s+(?P[-+])\s+(?P\d+)', datestamp) - if datemath: - datestamp = datemath.group('datestamp') - units += int(datemath.group('operator') + datemath.group('units')) - try: - dtobj = du_parser.parse(datestamp, default=DEFAULT) - dtobj = dtobj + du_delta(days=units) - if dtobj.day == 1: - isodates.append("%d-%02d" % (dtobj.year, dtobj.month)) - else: - isodates.append("%d-%02d-%02d" % (dtobj.year, dtobj.month, dtobj.day)) - except ValueError: - isodates.append(datestamp) - - daterange = '->'.join(isodates) result += match.group('term') + daterange position = match.end() result += query[position : ] return result if GOT_DATEUTIL: query = mangle_with_dateutils(query) # else do nothing with the dates return query def _convert_irns_to_spires_irns(self, query): """Prefix IRN numbers with SPIRES- so they match the INSPIRE format.""" def create_replacement_pattern(match): """method used for replacement with regular expression""" return '970__a:SPIRES-' + match.group('irn') query = self._re_pattern_IRN_search.sub(create_replacement_pattern, query) return query def _convert_topcite_to_cited(self, query): """Replace SPIRES topcite x+ with cited:x->999999999""" def create_replacement_pattern(match): """method used for replacement with regular expression""" return match.group('x') + '->999999999' query = self._re_topcite_match.sub(create_replacement_pattern, query) return query def _convert_spires_date_after_to_invenio_span_query(self, query): """Converts date after SPIRES search term into invenio span query""" def create_replacement_pattern(match): """method used for replacement with regular expression""" return match.group('searchop') + ' ' + match.group('search_content') + '->9999' query = self._re_date_after_match.sub(create_replacement_pattern, query) return query def _convert_spires_date_before_to_invenio_span_query(self, query): """Converts date before SPIRES search term into invenio span query""" # method used for replacement with regular expression def create_replacement_pattern(match): return match.group('searchop') + ' ' + '0->' + match.group('search_content') query = self._re_date_before_match.sub(create_replacement_pattern, query) return query def _expand_search_patterns(self, query): """Expands search queries. If a search term is followed by several words e.g. author:ellis or title:THESE THREE WORDS it is expanded to author:ellis or (title:THESE and title:THREE...) All keywords are thus expanded. XXX: this may lead to surprising results for any later parsing stages if we're not careful. 
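
        A small illustrative sketch (approximate, not verbatim output):

            title:quark mass          ->  (title:quark and title:mass)
            affiliation:DESY Hamburg  ->  affiliation:"DESY Hamburg"

        the second case stays a single phrase because affiliation: is listed
        in _INVENIO_KEYWORDS_FOR_SPIRES_PHRASE_SEARCHES.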
""" def create_replacements(term, content): result = '' content = content.strip() # replace spaces within quotes by __SPACE__ temporarily: content = self._re_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", content) content = self._re_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", content) content = self._re_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", content) if term in self._INVENIO_KEYWORDS_FOR_SPIRES_PHRASE_SEARCHES \ and not self._re_boolean_expression.search(content) and ' ' in content: # the case of things which should be searched as phrases result = term + '"' + content + '"' else: words = content.split() if len(words) == 0: # this should almost never happen, req user to say 'find a junk:' result = term elif len(words) == 1: # this is more common but still occasional result = term + words[0] else: # general case result = '(' + term + words[0] for word in words[1:]: result += ' and ' + term + word result += ')' # replace back __SPACE__ by spaces: result = self._re_pattern_space.sub(" ", result) return result.strip() result = '' current_position = 0 for match in self._re_search_term_pattern_match.finditer(query): result += query[current_position : match.start()] result += ' ' + match.group('combine_operator') + ' ' result += create_replacements(match.group('search_term'), match.group('search_content')) current_position = match.end() result += query[current_position : len(query)] return result.strip() def _remove_extraneous_equals_signs(self, query): """In SPIRES, both date = 2000 and date 2000 are acceptable. Get rid of the =""" query = self._re_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), '=', '__EQUALS__')+"'", query) query = self._re_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), '=', '__EQUALS__')+'\"', query) query = self._re_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), '=', '__EQUALS__')+"/", query) query = query.replace('=', '') query = self._re_pattern_equals.sub("=", query) return query def _convert_spires_truncation_to_invenio_truncation(self, query): """Replace SPIRES truncation symbol # with invenio trancation symbol *""" return query.replace('#', '*') def _convert_spires_exact_author_search_to_invenio_author_search(self, query): """Converts SPIRES search patterns for exact author into search pattern for invenio""" # method used for replacement with regular expression def create_replacement_pattern(match): # the regular expression where this group name is defined is in # the method _compile_regular_expressions() return self._EA_TAG + '"' + match.group('author_name') + '"' query = self._re_exact_author_match.sub(create_replacement_pattern, query) return query def _convert_spires_author_search_to_invenio_author_search(self, query): """Converts SPIRES search patterns for authors to search patterns in invenio that give similar results to the spires search. 
""" # result of the replacement result = '' current_position = 0 for match in self._re_author_match.finditer(query): result += query[current_position : match.start() ] if match.group('secondorderop'): result += match.group('secondorderop') scanned_name = NameScanner.scan(match.group('name')) author_atoms = self._create_author_search_pattern_from_fuzzy_name_dict(scanned_name) if match.group('first'): author_atoms = author_atoms.replace('author:', 'firstauthor:') if author_atoms.find(' ') == -1: result += author_atoms + ' ' else: result += '(' + author_atoms + ') ' current_position = match.end() result += query[current_position : len(query)] return result def _create_author_search_pattern_from_fuzzy_name_dict(self, fuzzy_name): """Creates an invenio search pattern for an author from a fuzzy name dict""" author_name = '' author_middle_name = '' author_surname = '' full_search = '' if len(fuzzy_name['nonlastnames']) > 0: author_name = fuzzy_name['nonlastnames'][0] if len(fuzzy_name['nonlastnames']) == 2: author_middle_name = fuzzy_name['nonlastnames'][1] if len(fuzzy_name['nonlastnames']) > 2: author_middle_name = ' '.join(fuzzy_name['nonlastnames'][1:]) if fuzzy_name['raw']: full_search = fuzzy_name['raw'] author_surname = ' '.join(fuzzy_name['lastnames']) NAME_IS_INITIAL = (len(author_name) == 1) NAME_IS_NOT_INITIAL = not NAME_IS_INITIAL # we expect to have at least surname if author_surname == '' or author_surname == None: return '' # ellis ---> "author:ellis" #if author_name == '' or author_name == None: if not author_name: return self._A_TAG + author_surname # ellis, j ---> "ellis, j*" if NAME_IS_INITIAL and not author_middle_name: return self._A_TAG + '"' + author_surname + ', ' + author_name + '*"' # if there is middle name we expect to have also name and surname # ellis, j. r. ---> ellis, j* r* # j r ellis ---> ellis, j* r* # ellis, john r. ---> ellis, j* r* or ellis, j. r. or ellis, jo. r. # ellis, john r. ---> author:ellis, j* r* or exactauthor:ellis, j r or exactauthor:ellis jo r if author_middle_name: search_pattern = self._A_TAG + '"' + author_surname + ', ' + author_name + '*' + ' ' + author_middle_name.replace(" ","* ") + '*"' if NAME_IS_NOT_INITIAL: for i in range(1, len(author_name)): search_pattern += ' or ' + self._EA_TAG + "\"%s, %s %s\"" % (author_surname, author_name[0:i], author_middle_name) return search_pattern # ellis, jacqueline ---> "ellis, jacqueline" or "ellis, j.*" or "ellis, j" or "ellis, ja.*" or "ellis, ja" or "ellis, jacqueline *, ellis, j *" # in case we don't use SPIRES data, the ending dot is ommited. 
search_pattern = self._A_TAG + '"' + author_surname + ', ' + author_name + '*"' search_pattern += " or " + self._EA_TAG + "\"%s, %s *\"" % (author_surname, author_name[0]) if NAME_IS_NOT_INITIAL: for i in range(1,len(author_name)): search_pattern += ' or ' + self._EA_TAG + "\"%s, %s\"" % (author_surname, author_name[0:i]) search_pattern += ' or %s"%s, *"' % (self._A_TAG, full_search) return search_pattern def _normalise_journal_page_format(self, query): """Phys.Lett, 0903, 024 -> Phys.Lett,0903,024""" def _is_triple(search): return (len(re.findall('\s+', search)) + len(re.findall(':', search))) == 2 def _normalise_spaces_and_colons_to_commas_in_triple(search): if not _is_triple(search): return search search = re.sub(',\s+', ',', search) search = re.sub('\s+', ',', search) search = re.sub(':', ',', search) return search result = "" current_position = 0 for match in self._re_search_term_is_journal.finditer(query): result += query[current_position : match.start()] result += match.group('leading') search = match.group('search_content') search = _normalise_spaces_and_colons_to_commas_in_triple(search) result += search current_position = match.end() result += query[current_position : ] return result def _standardize_already_invenio_keywords(self, query): """Replaces invenio keywords kw with "and kw" in order to parse them correctly further down the line.""" unique_invenio_keywords = set(self._SPIRES_TO_INVENIO_KEYWORDS_MATCHINGS.values()) |\ set(self._SECOND_ORDER_KEYWORD_MATCHINGS.values()) unique_invenio_keywords.remove('') # for the ones that don't have invenio equivalents for invenio_keyword in unique_invenio_keywords: query = re.sub("(?(^find|\band|\bor|\bnot|\brefersto|\bcitedby|^)\b[:\s\(]*)' + \ old_keyword + r'(?P[\s\(]+|$)' regular_expression = re.compile(regex_string, re.IGNORECASE) result = regular_expression.sub(r'\g' + new_keyword + r'\g', query) result = re.sub(':\s+', ':', result) return result def _replace_second_order_keyword(self, query, old_keyword, new_keyword): """Replaces old second-order keyword in the query with a new keyword""" regular_expression =\ re.compile(r'''(?ix) # verbose, ignorecase (?P (^find|\band|\bor|\bnot|\brefersto|\bcitedby|^)\b # operator preceding our operator [:\s\(]* # trailing colon, spaces, parens, etc. 
for that operator ) %s # the keyword we're searching for (?P \s*[a-z]+:| # either an operator (like author:) [\s\(]+| # or a paren opening $ # or the end of the string )''' % old_keyword) result = regular_expression.sub(r'\g' + new_keyword + r'\g', query) result = re.sub(':\s+', ':', result) return result def _distribute_keywords_across_combinations(self, query): """author:ellis and james -> author:ellis and author:james""" # method used for replacement with regular expression def create_replacement_pattern(match): return match.group('keyword') + match.group('content') + \ match.group('combination') + match.group('keyword') + \ match.group('last_content') still_matches = True while still_matches: query = self._re_distribute_keywords.sub(create_replacement_pattern, query) still_matches = self._re_distribute_keywords.search(query) query = re.sub(r'\s+', ' ', query) return query def _distribute_and_quote_second_order_ops(self, query): """refersto:s parke -> refersto:\"s parke\"""" def create_replacement_pattern(match): return match.group('second_order_op') + '"' +\ match.group('search_terms') + '"' +\ match.group('conjunction_or_next_keyword') for match in self._re_second_order_op_no_index_match.finditer(query): query = self._re_second_order_op_no_index_match.sub(create_replacement_pattern, query) query = re.sub(r'\s+', ' ', query) return query diff --git a/modules/websearch/lib/search_engine_query_parser_unit_tests.py b/modules/websearch/lib/search_engine_query_parser_unit_tests.py index 83198bfc7..b4dda4404 100644 --- a/modules/websearch/lib/search_engine_query_parser_unit_tests.py +++ b/modules/websearch/lib/search_engine_query_parser_unit_tests.py @@ -1,1019 +1,1070 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. -## Copyright (C) 2008, 2010, 2011, 2012 CERN. +## Copyright (C) 2008, 2010, 2011, 2012, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
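
## The suites below exercise the two-stage query pipeline, roughly:
##
##     converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter()
##     parser = search_engine_query_parser.SearchQueryParenthesisedParser()
##     parser.parse_query(converter.convert_query('find a ellis and t shapes'))
##
## i.e. SPIRES-syntax queries are first translated to Invenio syntax and then
## parsed into the flat, annotated list that search_engine consumes (see e.g.
## TestParserUtilityFunctions.test_stisc below). Illustrative only; the exact
## expected outputs are what the individual test cases assert.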
"""Unit tests for the search engine query parsers.""" import unittest import datetime from invenio import search_engine_query_parser from invenio.testutils import make_test_suite, run_test_suite from invenio.search_engine import create_basic_search_units, perform_request_search from invenio.config import CFG_WEBSEARCH_SPIRES_SYNTAX if search_engine_query_parser.GOT_DATEUTIL: import dateutil + from dateutil.relativedelta import relativedelta as du_delta DATEUTIL_AVAILABLE = True else: DATEUTIL_AVAILABLE = False class TestParserUtilityFunctions(unittest.TestCase): """Test utility functions for the parsing components""" def setUp(self): self.parser = search_engine_query_parser.SearchQueryParenthesisedParser() self.converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() def test_ndb_simple(self): """SQPP.test_nesting_depth_and_balance: ['p0']""" self.assertEqual((0, True, 0), self.parser.nesting_depth_and_balance(['p0'])) def test_ndb_simple_useful(self): """SQPP.test_nesting_depth_and_balance: ['(', 'p0', ')']""" self.assertEqual((1, True, 1), self.parser.nesting_depth_and_balance(['(', 'p0', ')'])) def test_ndb_slightly_complicated(self): """SQPP.test_nesting_depth_and_balance: ['(', 'p0', ')', '|', '(', 'p2', '+', 'p3', ')']""" self.assertEqual((1, True, 2), self.parser.nesting_depth_and_balance(['(', 'p0', ')', '|', '(', 'p2', '+', 'p3', ')'])) def test_ndb_sorta_hairy(self): """SQPP.test_nesting_depth_and_balance: ['(', '(', ')', ')', '(', '(', '(', ')', ')', ')']""" self.assertEqual((3, True, 2), self.parser.nesting_depth_and_balance(['(', '(', ')', ')', '(', '(', '(', ')', ')', ')'])) def test_ndb_broken_rhs(self): """SQPP.test_nesting_depth_and_balance: ['(', '(', ')', ')', '(', '(', '(', ')', ')', ]""" self.assertEqual((3, False, 2), self.parser.nesting_depth_and_balance(['(', '(', ')', ')', '(', '(', '(', ')', ')', ])) def test_ndb_broken_lhs(self): """SQPP.test_nesting_depth_and_balance: ['(', ')', ')', '(', '(', '(', ')', ')', ')']""" self.assertEqual((3, False, 2), self.parser.nesting_depth_and_balance(['(', ')', ')', '(', '(', '(', ')', ')', ])) def test_stisc(self): """Test whole convert/parse stack: SQPP.parse_query(STISC.convert_query('find a richter, burton and t quark'))""" self.assertEqual(self.parser.parse_query(self.converter.convert_query('find a richter, burton and t quark')), ['+', 'author:"richter, burton*" | exactauthor:"richter, b *" | exactauthor:"richter, b" | exactauthor:"richter, bu" | exactauthor:"richter, bur" | exactauthor:"richter, burt" | exactauthor:"richter, burto" | author:"richter, burton, *"', '+', 'title:quark']) def test_stisc_not_vs_and_not1(self): """Parse stack parses "find a ellis, j and not a enqvist" == "find a ellis, j not a enqvist" """ self.assertEqual(self.parser.parse_query(self.converter.convert_query('find a ellis, j and not a enqvist')), self.parser.parse_query(self.converter.convert_query('find a ellis, j not a enqvist'))) def test_stisc_not_vs_and_not2(self): """Parse stack parses "find a mangano, m and not a ellis, j" == "find a mangano, m not a ellis, j" """ self.assertEqual(self.parser.parse_query(self.converter.convert_query('find a mangano, m and not a ellis, j')), self.parser.parse_query(self.converter.convert_query('find a mangano, m not a ellis, j'))) class TestSearchQueryParenthesisedParser(unittest.TestCase): """Test parenthesis parsing.""" def setUp(self): self.parser = search_engine_query_parser.SearchQueryParenthesisedParser() def test_sqpp_atom(self): """SearchQueryParenthesisedParser - expr1""" 
self.assertEqual(self.parser.parse_query('expr1'), ['+', 'expr1']) def test_sqpp_parened_atom(self): """SearchQueryParenthesisedParser - (expr1)""" self.assertEqual(self.parser.parse_query('(expr1)'), ['+', 'expr1']) def test_sqpp_expr1_minus_expr2(self): """SearchQueryParenthesisedParser - expr1 - (expr2)""" self.assertEqual(self.parser.parse_query("expr1 - (expr2)"), ['+', 'expr1', '-', 'expr2']) def test_sqpp_plus_expr1_minus_paren_expr2(self): """SearchQueryParenthesisedParser - + expr1 - (expr2)""" self.assertEqual(self.parser.parse_query("+ expr1 - (expr2)"), ['+', 'expr1', '-', 'expr2']) def test_sqpp_expr1_paren_expr2(self): """SearchQueryParenthesisedParser - expr1 (expr2)""" self.assertEqual(self.parser.parse_query("expr1 (expr2)"), ['+', 'expr1', '+', 'expr2']) def test_sqpp_paren_expr1_minus_expr2(self): """SearchQueryParenthesisedParser - (expr1) - expr2""" self.assertEqual(self.parser.parse_query("(expr1) - expr2"), ['+', 'expr1', '-', 'expr2']) def test_sqpp_paren_expr1_minus_paren_expr2(self): """SearchQueryParenthesisedParser - (expr1)-(expr2)""" self.assertEqual(self.parser.parse_query("(expr1)-(expr2)"), ['+', 'expr1', '-', 'expr2']) def test_sqpp_minus_paren_expr1_minus_paren_expr2(self): """SearchQueryParenthesisedParser - -(expr1)-(expr2)""" self.assertEqual(self.parser.parse_query("-(expr1)-(expr2)"), ['-', 'expr1', '-', 'expr2']) def test_sqpp_paren_expr1_minus_expr2_and_paren_expr3(self): """SearchQueryParenthesisedParser - (expr1) - expr2 + (expr3)""" self.assertEqual(self.parser.parse_query('(expr1) - expr2 + (expr3)'), ['+', 'expr1', '-', 'expr2', '+', 'expr3']) def test_sqpp_paren_expr1_minus_expr2_and_paren_expr3_or_expr4(self): """SearchQueryParenthesisedParser - (expr1) - expr2 + (expr3) | expr4""" self.assertEqual(self.parser.parse_query('(expr1) - expr2 + (expr3) | expr4'), ['+', 'expr1', '-', 'expr2', '+', 'expr3', '|', 'expr4']) #['+', '+ expr1 | expr4', '+', '- expr2 | expr4', '+', '+ expr3 | expr4']) def test_sqpp_paren_expr1_minus_expr2_and_paren_expr3_or_expr4_or_quoted_expr5_and_expr6(self): """SearchQueryParenthesisedParser - (expr1) - expr2 + (expr3) | expr4 | \"expr5 + expr6\"""" self.assertEqual(self.parser.parse_query('(expr1) - expr2 + (expr3 | expr4) | "expr5 + expr6"'), ['+', 'expr1', '-', 'expr2', '+', 'expr3 | expr4', '|', '"expr5 + expr6"']), #['+', '+ expr1 | "expr5 + expr6"', '+', '- expr2 | "expr5 + expr6"', # '+', '+ expr3 | expr4 | "expr5 + expr6"']) def test_sqpp_quoted_expr1_and_paren_expr2_and_expr3(self): """SearchQueryParenthesisedParser - \"expr1\" (expr2) expr3""" self.assertEqual(self.parser.parse_query('"expr1" (expr2) expr3'), ['+', '"expr1"', '+', 'expr2', '+', 'expr3']) def test_sqpp_quoted_expr1_arrow_quoted_expr2(self): """SearchQueryParenthesisedParser = \"expr1\"->\"expr2\"""" self.assertEqual(self.parser.parse_query('"expr1"->"expr2"'), ['+', '"expr1"->"expr2"']) def test_sqpp_paren_expr1_expr2_paren_expr3_or_expr4(self): """SearchQueryParenthesisedParser - (expr1) expr2 (expr3) | expr4""" # test parsing of queries with missing operators. 
# in this case default operator + should be included on place of the missing one self.assertEqual(self.parser.parse_query('(expr1) expr2 (expr3) | expr4'), ['+', 'expr1', '+', 'expr2', '+', 'expr3', '|', 'expr4']) #['+', '+ expr1 | expr4', '+', '+ expr2 | expr4', '+', '+ expr3 | expr4']) def test_sqpp_nested_paren_success(self): """SearchQueryParenthesizedParser - Arbitrarily nested parentheses: ((expr1)) + (expr2 - expr3)""" self.assertEqual(self.parser.parse_query('((expr1)) + (expr2 - expr3)'), ['+', 'expr1', '+', 'expr2', '-', 'expr3']) #['+', 'expr1', '+', 'expr2', '-', 'expr3']) def test_sqpp_nested_paren_really_nested(self): """SearchQueryParenthesisedParser - Nested parentheses where order matters: expr1 - (expr2 - (expr3 | expr4))""" self.assertEqual(self.parser.parse_query('expr1 - (expr2 - (expr3 | expr4))'), ['+', 'expr1', '+', '- expr2 | expr3 | expr4']) def test_sqpp_paren_open_only_failure(self): """SearchQueryParenthesizedParser - Parentheses that only open should raise an exception""" self.failUnlessRaises(SyntaxError, self.parser.parse_query,"(expr") def test_sqpp_paren_close_only_failure(self): """SearchQueryParenthesizedParser - Parentheses that only close should raise an exception""" self.failUnlessRaises(SyntaxError, self.parser.parse_query,"expr)") def test_sqpp_paren_expr1_not_expr2_and_paren_expr3_or_expr4_WORDS(self): """SearchQueryParenthesisedParser - (expr1) not expr2 and (expr3) or expr4""" self.assertEqual(self.parser.parse_query('(expr1) not expr2 and (expr3) or expr4'), ['+', 'expr1', '-', 'expr2', '+', 'expr3', '|', 'expr4']) #['+', '+ expr1 | expr4', '+', '- expr2 | expr4', '+', '+ expr3 | expr4']) def test_sqpp_paren_expr1_not_expr2_or_quoted_string_not_expr3_or_expr4WORDS(self): """SearchQueryParenthesisedParser - (expr1) not expr2 | "expressions not in and quotes | (are) not - parsed " - (expr3) or expr4""" self.assertEqual(self.parser.parse_query('(expr1) not expr2 | "expressions not in and quotes | (are) not - parsed " - (expr3) or expr4'), ['+', 'expr1', '-', 'expr2', '|', '"expressions not in and quotes | (are) not - parsed "', '-', 'expr3', '|', 'expr4']) #['+', '+ "expressions not in and quotes | (are) not - parsed " | expr1 | expr4', # '+', '- expr3 | expr1 | expr4', # '+', '+ "expressions not in and quotes | (are) not - parsed " - expr2 | expr4', # '+', '- expr3 - expr2 | expr4']) def test_sqpp_expr1_escaped_quoted_expr2_and_paren_expr3_not_expr4_WORDS(self): """SearchQueryParenthesisedParser - expr1 \\" expr2 foo(expr3) not expr4 \\" and (expr5)""" self.assertEqual(self.parser.parse_query('expr1 \\" expr2 foo(expr3) not expr4 \\" and (expr5)'), ['+', 'expr1', '+', '\\"', '+', 'expr2', '+', 'foo(expr3)', '-', 'expr4', '+', '\\"', '+', 'expr5']) def test_sqpp_paren_expr1_and_expr2_or_expr3_WORDS(self): """SearchQueryParenthesisedParser - (expr1 and expr2) or expr3""" self.assertEqual(self.parser.parse_query('(expr1 and expr2) or expr3'), ['+', 'expr1 + expr2', '|', 'expr3']) #['+', '+ expr1 | expr3', '+', '+ expr2 | expr3']) def test_sqpp_paren_expr1_and_expr2_or_expr3_WORDS_equiv(self): """SearchQueryParenthesisedParser - (expr1 and expr2) or expr3 == (expr1 + expr2) | expr3""" self.assertEqual(self.parser.parse_query('(expr1 and expr2) or expr3'), self.parser.parse_query('(expr1 + expr2) | expr3')) def test_sqpp_paren_expr1_and_expr2_or_expr3_WORDS_equiv_SYMBOLS(self): """SearchQueryParenthesisedParser - (expr1 and expr2) or expr3 == (expr1 + expr2) or expr3""" self.assertEqual(self.parser.parse_query('(expr1 and expr2) or expr3'), 
self.parser.parse_query('(expr1 + expr2) or expr3')) def test_sqpp_double_quotes(self): """SearchQueryParenthesisedParser - Test double quotes""" self.assertEqual(self.parser.parse_query( '(expr1) - expr2 | "expressions - in + quotes | (are) not - parsed " - (expr3) | expr4'), ['+', 'expr1', '-', 'expr2', '|', '"expressions - in + quotes | (are) not - parsed "', '-', 'expr3', '|', 'expr4']) #['+', '+ "expressions - in + quotes | (are) not - parsed " | expr1 | expr4', # '+', '- expr3 | expr1 | expr4', # '+', '+ "expressions - in + quotes | (are) not - parsed " - expr2 | expr4', # '+', '- expr3 - expr2 | expr4']) def test_sqpp_single_quotes(self): """SearchQueryParenthesisedParser - Test single quotes""" self.assertEqual(self.parser.parse_query("(expr1) - expr2 | 'expressions - in + quotes | (are) not - parsed ' - (expr3) | expr4"), ['+', 'expr1', '-', 'expr2', '|', "'expressions - in + quotes | (are) not - parsed '", '-', 'expr3', '|', 'expr4']) #['+', '+ \'expressions - in + quotes | (are) not - parsed \' | expr1 | expr4', # '+', '- expr3 | expr1 | expr4', # '+', '+ \'expressions - in + quotes | (are) not - parsed \' - expr2 | expr4', # '+', '- expr3 - expr2 | expr4']) def test_sqpp_escape_single_quotes(self): """SearchQueryParenthesisedParser - Test escaping single quotes""" self.assertEqual(self.parser.parse_query("expr1 \\' expr2 +(expr3) -expr4 \\' + (expr5)"), ['+', 'expr1', '+', "\\'", '+', 'expr2', '+', 'expr3', '-', 'expr4', '+', "\\'", '+', 'expr5']) def test_sqpp_escape_double_quotes(self): """SearchQueryParenthesisedParser - Test escaping double quotes""" self.assertEqual(self.parser.parse_query('expr1 \\" expr2 +(expr3) -expr4 \\" + (expr5)'), ['+', 'expr1', '+', '\\"', '+', 'expr2', '+', 'expr3', '-', 'expr4', '+', '\\"', '+', 'expr5']) def test_sqpp_beginning_double_quotes(self): """SearchQueryParenthesisedParser - Test parsing double quotes at beginning""" self.assertEqual(self.parser.parse_query('"expr1" - (expr2)'), ['+', '"expr1"', '-', 'expr2']) def test_sqpp_beginning_double_quotes_negated(self): """SearchQueryParenthesisedParser - Test parsing negated double quotes at beginning""" self.assertEqual(self.parser.parse_query('-"expr1" - (expr2)'), ['-', '"expr1"', '-', 'expr2']) def test_sqpp_long_or_chain(self): """SearchQueryParenthesisedParser - Test long or chains being parsed flat""" self.assertEqual(self.parser.parse_query('p0 or p1 or p2 or p3 or p4'), ['+', 'p0', '|', 'p1', '|', 'p2', '|', 'p3', '|', 'p4']) def test_sqpp_not_after_recursion(self): """SearchQueryParenthesisedParser - Test operations after recursive calls""" self.assertEqual(self.parser.parse_query('(p0 or p1) not p2'), ['+', 'p0 | p1', '-', 'p2']) #['+', '+ p0 | p1', '-', 'p2']) def test_sqpp_oddly_capped_operators(self): """SearchQueryParenthesisedParser - Test conjunctions in any case""" self.assertEqual(self.parser.parse_query('foo oR bar'), ['+', 'foo', '|', 'bar']) def test_space_before_last_paren(self): """SearchQueryParenthesisedParser - Test (ellis )""" self.assertEqual(self.parser.parse_query('(ellis )'), ['+', 'ellis']) def test_sqpp_nested_U1_or_SL2(self): """SearchQueryParenthesisedParser - Test (U(1) or SL(2,Z))""" self.assertEqual(self.parser.parse_query('(U(1) or SL(2,Z))'), ['+', 'u(1) | sl(2,z)']) def test_sqpp_alternation_of_quote_marks_double(self): """SearchQueryParenthesisedParser - Test refersto:(author:"s parke" or author:ellis)""" self.assertEqual(self.parser.parse_query('refersto:(author:"s parke" or author:ellis)'), ['+', 'refersto:\'author:"s parke" | author:ellis\'']) def 
test_sqpp_alternation_of_quote_marks_single(self): """SearchQueryParenthesisedParser - Test refersto:(author:'s parke' or author:ellis)""" self.assertEqual(self.parser.parse_query('refersto:(author:\'s parke\' or author:ellis)'), ['+', 'refersto:"author:\'s parke\' | author:ellis"']) def test_sqpp_alternation_of_quote_marks(self): """SearchQueryParenthesisedParser - Test refersto:(author:"s parke")""" self.assertEqual(self.parser.parse_query('refersto:(author:"s parke")'), ['+', 'refersto:author:"s parke"']) def test_sqpp_distributed_ands_equivalent(self): """SearchQueryParenthesisedParser - ellis and (kaluza-klein or r-parity) == ellis and (r-parity or kaluza-klein)""" self.assertEqual(sorted(perform_request_search(p='ellis and (kaluza-klein or r-parity)')), sorted(perform_request_search(p='ellis and (r-parity or kaluza-klein)'))) def test_sqpp_e_plus_e_minus(self): """SearchQueryParenthesisedParser - e(+)e(-)""" self.assertEqual(self.parser.parse_query('e(+)e(-)'), ['+', 'e(+)e(-)']) def test_sqpp_fe_2_plus(self): """SearchQueryParenthesisedParser - Fe(2+)""" self.assertEqual(self.parser.parse_query('Fe(2+)'), ['+', 'fe(2+)']) def test_sqpp_giant_evil_title_string(self): """SearchQueryParenthesisedParser - Measurements of CP-conserving trilinear gauge boson couplings WWV (V gamma, Z) in e(+)e(-) collisions at LEP2""" self.assertEqual(self.parser.parse_query('Measurements of CP-conserving trilinear gauge boson couplings WWV (V gamma, Z) in e(+)e(-) collisions at LEP2'), ['+', 'measurements', '+', 'of', '+', 'cp-conserving', '+', 'trilinear', '+', 'gauge', \ '+', 'boson', '+', 'couplings', '+', 'wwv', '+', 'v + gamma, + z', \ '+', 'in', '+', 'e(+)e(-)', '+', 'collisions', '+', 'at', '+', 'lep2']) def test_sqpp_second_order_operator_operates_on_parentheses(self): """SearchQueryParenthesisedParser - refersto:(author:ellis or author:hawking)""" self.assertEqual(self.parser.parse_query('refersto:(author:ellis or author:hawking)'), ['+', 'refersto:"author:ellis | author:hawking"']) class TestSpiresToInvenioSyntaxConverter(unittest.TestCase): """Test SPIRES query parsing and translation to Invenio syntax.""" def _compare_searches(self, invenio_syntax, spires_syntax): """Determine if two queries parse to the same search command. For comparison of actual search results (regression testing), see the tests in the Inspire module. 
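
        Typical call (illustrative):

            self._compare_searches('author:ellis and title:shapes',
                                   'find a ellis and t shapes')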
""" parser = search_engine_query_parser.SearchQueryParenthesisedParser() converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() parsed_query = parser.parse_query(converter.convert_query(spires_syntax)) #parse_query removes any parens that convert_query added, but then #we have to rejoin the list it returns and create basic searches result_obtained = create_basic_search_units( None, ' '.join(parsed_query).replace('+ ',''), '', None ) # incase the desired result has parens parsed_wanted = parser.parse_query(invenio_syntax) result_wanted = create_basic_search_units( None, ' '.join(parsed_wanted).replace('+ ',''), '', None) assert result_obtained == result_wanted, \ """SPIRES parsed as %s instead of %s""" % \ (repr(result_obtained), repr(result_wanted)) return if CFG_WEBSEARCH_SPIRES_SYNTAX > 0: def test_operators(self): """SPIRES search syntax - find a ellis and t shapes""" invenio_search = "author:ellis and title:shapes" spires_search = "find a ellis and t shapes" self._compare_searches(invenio_search, spires_search) def test_nots(self): """SPIRES search syntax - find a ellis and not t hadronic and not t collisions""" invenio_search = "author:ellis and not title:hadronic and not title:collisions" spires_search = "find a ellis and not t hadronic and not t collisions" self._compare_searches(invenio_search, spires_search) def test_author_simplest(self): """SPIRES search syntax - find a ellis""" invenio_search = 'author:ellis' spires_search = 'find a ellis' self._compare_searches(invenio_search, spires_search) def test_author_simple(self): """SPIRES search syntax - find a ellis, j""" invenio_search = 'author:"ellis, j*"' spires_search = 'find a ellis, j' self._compare_searches(invenio_search, spires_search) def test_exactauthor_simple(self): """SPIRES search syntax - find ea ellis, j""" invenio_search = 'exactauthor:"ellis, j"' spires_search = 'find ea ellis, j' self._compare_searches(invenio_search, spires_search) def test_author_reverse(self): """SPIRES search syntax - find a j ellis""" invenio_search = 'author:"ellis, j*"' spires_search = 'find a j ellis' self._compare_searches(invenio_search, spires_search) def test_author_initials(self): """SPIRES search syntax - find a a m polyakov""" inv_search = 'author:"polyakov, a* m*"' spi_search = 'find a a m polyakov' self._compare_searches(inv_search, spi_search) def test_author_many_initials(self): """SPIRES search syntax - find a p d q bach""" inv_search = 'author:"bach, p* d* q*"' spi_search = 'find a p d q bach' self._compare_searches(inv_search, spi_search) def test_author_many_lastnames(self): """SPIRES search syntax - find a alvarez gaume, j r r""" inv_search = 'author:"alvarez gaume, j* r* r*"' spi_search = 'find a alvarez gaume, j r r' self._compare_searches(inv_search, spi_search) def test_author_full_initial(self): """SPIRES search syntax - find a klebanov, ig.r.""" inv_search = 'author:"klebanov, ig* r*" or exactauthor:"klebanov, i r"' spi_search = "find a klebanov, ig.r." 
self._compare_searches(inv_search, spi_search) def test_author_full_first(self): """SPIRES search syntax - find a ellis, john""" invenio_search = 'author:"ellis, john*" or exactauthor:"ellis, j *" or exactauthor:"ellis, j" or exactauthor:"ellis, jo" or exactauthor:"ellis, joh" or author:"ellis, john, *"' spires_search = 'find a ellis, john' self._compare_searches(invenio_search, spires_search) def test_combine_multiple(self): """SPIRES search syntax - find a gattringer, c and k symmetry chiral and not title chiral""" inv_search = 'author:"gattringer, c*" keyword:chiral keyword:symmetry -title:chiral' spi_search = "find a c gattringer and k chiral symmetry and not title chiral" self._compare_searches(inv_search, spi_search) def test_combine_multiple_or(self): """SPIRES search syntax - find a j ellis and (t report or k \"cross section\")""" inv_search = 'author:"ellis, j*" and (title:report or keyword:"cross section")' spi_search = 'find a j ellis and (t report or k "cross section")' self._compare_searches(inv_search, spi_search) def test_find_first_author(self): """SPIRES search syntax - find fa ellis""" inv_search = 'firstauthor:ellis' spi_search = 'find fa ellis' self._compare_searches(inv_search, spi_search) def test_find_first_author_initial(self): """SPIRES search syntax - find fa j ellis""" inv_search = 'firstauthor:"ellis, j*"' spi_search = 'find fa j ellis' self._compare_searches(inv_search, spi_search) def test_first_author_full_initial(self): """SPIRES search syntax - find fa klebanov, ig.r.""" inv_search = 'firstauthor:"klebanov, ig* r*" or exactfirstauthor:"klebanov, i r"' spi_search = "find fa klebanov, ig.r." self._compare_searches(inv_search, spi_search) def test_citedby_author(self): """SPIRES search syntax - find citedby author doggy""" inv_search = 'citedby:author:doggy' spi_search = 'find citedby author doggy' self._compare_searches(inv_search, spi_search) def test_refersto_author(self): """SPIRES search syntax - find refersto author kitty""" inv_search = 'refersto:author:kitty' spi_search = 'find refersto author kitty' self._compare_searches(inv_search, spi_search) def test_refersto_author_multi_name(self): """SPIRES search syntax - find a ellis and refersto author \"parke, sj\"""" inv_search = 'author:ellis refersto:author:"parke, s. j."' spi_search = 'find a ellis and refersto author "parke, s. 
j."' self._compare_searches(inv_search, spi_search) def test_refersto_author_multi_name_no_quotes(self): """SPIRES search syntax - find a ellis and refersto author parke, sj""" inv_search = 'author:ellis refersto:(author:"parke, sj*" or exactauthor:"parke, s *" or exactauthor:"parke, s" or author:"parke, sj, *")' spi_search = "find a ellis and refersto author parke, sj" self._compare_searches(inv_search, spi_search) def test_refersto_multi_word_no_quotes_no_index(self): """SPIRES search syntax - find refersto s parke""" inv_search = 'refersto:"s parke"' spi_search = 'find refersto s parke' self._compare_searches(inv_search, spi_search) def test_citedby_refersto_author(self): """SPIRES search syntax - find citedby refersto author penguin""" inv_search = 'refersto:citedby:author:penguin' spi_search = 'find refersto citedby author penguin' self._compare_searches(inv_search, spi_search) def test_irn_processing(self): """SPIRES search syntax - find irn 1360337 == find irn SPIRES-1360337""" # Added for trac-130 with_spires = "fin irn SPIRES-1360337" with_result = perform_request_search(p=with_spires) without_spires = "fin irn 1360337" without_result = perform_request_search(p=without_spires) # We don't care if results are [], as long as they're the same # Uncovered corner case: parsing could be broken and also happen to # return [] twice. Unlikely though. self.assertEqual(with_result, without_result) def test_topcite(self): """SPIRES search syntax - find topcite 50+""" inv_search = "cited:50->999999999" spi_search = "find topcite 50+" self._compare_searches(inv_search, spi_search) def test_topcit(self): """SPIRES search syntax - find topcit 50+""" inv_search = "cited:50->999999999" spi_search = "find topcit 50+" self._compare_searches(inv_search, spi_search) def test_caption(self): """SPIRES search syntax - find caption muon""" inv_search = "caption:muon" spi_search = "find caption muon" self._compare_searches(inv_search, spi_search) def test_caption_multi_word(self): """SPIRES search syntax - find caption quark mass""" inv_search = "caption:quark and caption:mass" spi_search = "find caption quark mass" self._compare_searches(inv_search, spi_search) def test_quotes(self): """SPIRES search syntax - find t 'compton scattering' and a mele""" inv_search = "title:'compton scattering' and author:mele" spi_search = "find t 'compton scattering' and a mele" self._compare_searches(inv_search, spi_search) def test_equals_sign(self): """SPIRES search syntax - find a beacom and date = 2000""" inv_search = "author:beacom year:2000" spi_search = "find a beacom and date = 2000" self._compare_searches(inv_search, spi_search) def test_type_code(self): """SPIRES search syntax - find tc/ps/scl review""" inv_search = "collection:review" spi_search = "find tc review" self._compare_searches(inv_search, spi_search) inv_search = "collection:review" spi_search = "find ps review" self._compare_searches(inv_search, spi_search) inv_search = "collection:review" spi_search = "find scl review" self._compare_searches(inv_search, spi_search) def test_field_code(self): """SPIRES search syntax - f f p""" inv_search = "subject:p" spi_search = "f f p" self._compare_searches(inv_search, spi_search) def test_coden(self): """SPIRES search syntax - find coden aphys""" inv_search = "journal:aphys" spi_search = "find coden aphys" self._compare_searches(inv_search, spi_search) def test_job_title(self): """SPIRES search syntax - find job engineer not position programmer""" inv_search = 'title:engineer not title:programmer' spi_search = 
'find job engineer not position programmer' self._compare_searches(inv_search, spi_search) def test_job_rank(self): """SPIRES search syntax - find rank Postdoc""" inv_search = 'rank:Postdoc' spi_search = 'find rank Postdoc' self._compare_searches(inv_search, spi_search) def test_job_region(self): """SPIRES search syntax - find region EU not continent Europe""" inv_search = 'region:EU not region:Europe' spi_search = 'find region EU not continent Europe' self._compare_searches(inv_search, spi_search) def test_fin_to_find_trans(self): """SPIRES search syntax - fin a ellis, j == find a ellis, j""" fin_search = "fin a ellis, j" fin_result = perform_request_search(p=fin_search) find_search = "find a ellis, j" find_result = perform_request_search(p=find_search) # We don't care if results are [], as long as they're the same # Uncovered corner case: parsing could be broken and also happen to # return [] twice. Unlikely though. self.assertEqual(fin_result, find_result) def test_distribution_of_notted_search_terms(self): """SPIRES search syntax - find t this and not that ->title:this and not title:that""" spi_search = "find t this and not that" inv_search = "title:this and not title:that" self._compare_searches(inv_search, spi_search) def test_distribution_without_spacing(self): """SPIRES search syntax - find aff SLAC and Stanford ->affiliation:SLAC and affiliation:Stanford""" # motivated by trac-187 spi_search = "find aff SLAC and Stanford" inv_search = "affiliation:SLAC and affiliation:Stanford" self._compare_searches(inv_search, spi_search) def test_distribution_with_phrases(self): """SPIRES search syntax - find aff Penn State U -> affiliation:"Penn State U""" # motivated by trac-517 spi_search = "find aff Penn State U" inv_search = "affiliation:\"Penn State U\"" self._compare_searches(inv_search, spi_search) def test_distribution_with_many_clauses(self): """SPIRES search syntax - find a mele and brooks and holtkamp and o'connell""" spi_search = "find a mele and brooks and holtkamp and o'connell" inv_search = "author:mele author:brooks author:holtkamp author:o'connell" self._compare_searches(inv_search, spi_search) def test_keyword_as_kw(self): """SPIRES search syntax - find kw something ->keyword:something""" spi_search = "find kw meson" inv_search = "keyword:meson" self._compare_searches(inv_search, spi_search) def test_recid(self): """SPIRES search syntax - find recid 11111""" spi_search = 'find recid 111111' inv_search = 'recid:111111' self._compare_searches(inv_search, spi_search) def test_desy_keyword_translation(self): """SPIRES search syntax - find dk "B --> pi pi" """ spi_search = "find dk \"B --> pi pi\"" inv_search = "695__a:\"B --> pi pi\"" self._compare_searches(inv_search, spi_search) def test_journal_section_joining(self): """SPIRES search syntax - journal Phys.Lett, 0903, 024 -> journal:Phys.Lett,0903,024""" spi_search = "find j Phys.Lett, 0903, 024" inv_search = "journal:Phys.Lett,0903,024" self._compare_searches(inv_search, spi_search) def test_journal_search_with_colon(self): """SPIRES search syntax - find j physics 1:195 -> journal:physics,1,195""" spi_search = "find j physics 1:195" inv_search = "journal:physics,1,195" self._compare_searches(inv_search, spi_search) def test_journal_non_triple_syntax(self): """SPIRES search syntax - find j physics jcap""" spi_search = "find j physics jcap" inv_search = "journal:physics and journal:jcap" self._compare_searches(inv_search, spi_search) def test_journal_triple_with_many_spaces(self): """SPIRES search syntax - find j physics 0903 
024""" spi_search = 'find j physics 0903 024' inv_search = 'journal:physics,0903,024' self._compare_searches(inv_search, spi_search) def test_distribution_of_search_terms(self): """SPIRES search syntax - find t this and that ->title:this and title:that""" spi_search = "find t this and that" inv_search = "title:this and title:that" self._compare_searches(inv_search, spi_search) def test_syntax_converter_expand_search_patterns_alone(self): """SPIRES search syntax - simplest expansion""" spi_search = "find t bob sam" inv_search = "title:bob and title:sam" self._compare_searches(inv_search, spi_search) def test_syntax_converter_expand_fulltext(self): """SPIRES search syntax - fulltext support""" spi_search = "find ft The holographic RG is based on" inv_search = "fulltext:The and fulltext:holographic and fulltext:RG and fulltext:is and fulltext:based and fulltext:on" self._compare_searches(inv_search, spi_search) def test_syntax_converter_expand_fulltext_within_larger(self): """SPIRES search syntax - fulltext subsearch support""" spi_search = "find au taylor and ft The holographic RG is based on and t brane" inv_search = "author:taylor fulltext:The and fulltext:holographic and fulltext:RG and fulltext:is and fulltext:based and fulltext:on title:brane" self._compare_searches(inv_search, spi_search) def test_syntax_converter_expand_search_patterns_conjoined(self): """SPIRES search syntax - simplest distribution""" spi_search = "find t bob and sam" inv_search = "title:bob and title:sam" self._compare_searches(inv_search, spi_search) def test_syntax_converter_expand_search_patterns_multiple(self): """SPIRES search syntax - expansion (no distribution)""" spi_search = "find t bob sam and k couch" inv_search = "title:bob and title:sam and keyword:couch" self._compare_searches(inv_search, spi_search) def test_syntax_converter_expand_search_patterns_multiple_conjoined(self): """SPIRES search syntax - distribution and expansion""" spi_search = "find t bob sam and couch" inv_search = "title:bob and title:sam and title:couch" self._compare_searches(inv_search, spi_search) + def test_date_invalid(self): + """SPIRES search syntax - searching an invalid date""" + spi_search = "find date foo" + inv_search = "year:foo" + self._compare_searches(inv_search, spi_search) + def test_date_by_yr(self): """SPIRES search syntax - searching by date year""" spi_search = "find date 2002" inv_search = "year:2002" self._compare_searches(inv_search, spi_search) def test_date_by_lt_yr(self): """SPIRES search syntax - searching by date < year""" spi_search = "find date < 2002" inv_search = 'year:0->2002' self._compare_searches(inv_search, spi_search) def test_date_by_gt_yr(self): """SPIRES search syntax - searching by date > year""" spi_search = "find date > 1980" inv_search = 'year:1980->9999' self._compare_searches(inv_search, spi_search) def test_date_by_yr_mo(self): """SPIRES search syntax - searching by date 1976-04""" spi_search = "find date 1976-04" inv_search = 'year:1976-04' self._compare_searches(inv_search, spi_search) def test_date_by_yr_mo_day_wholemonth_and_suffix(self): """SPIRES search syntax - searching by date 1976-04-01 and t dog""" spi_search = "find date 1976-04-01 and t dog" - inv_search = 'year:1976-04 and title:dog' + inv_search = 'year:1976-04-01 and title:dog' self._compare_searches(inv_search, spi_search) def test_date_by_yr_mo_day_and_suffix(self): """SPIRES search syntax - searching by date 1976-04-05 and t dog""" spi_search = "find date 1976-04-05 and t dog" inv_search = 'year:1976-04-05 and 
title:dog' self._compare_searches(inv_search, spi_search) def test_date_by_eq_yr_mo(self): """SPIRES search syntax - searching by date 1976-04""" spi_search = "find date 1976-04" inv_search = 'year:1976-04' self._compare_searches(inv_search, spi_search) def test_date_by_lt_yr_mo(self): """SPIRES search syntax - searching by date < 1978-10-21""" spi_search = "find date < 1978-10-21" inv_search = 'year:0->1978-10-21' self._compare_searches(inv_search, spi_search) def test_date_by_gt_yr_mo(self): """SPIRES search syntax - searching by date > 1978-10-21""" spi_search = "find date > 1978-10-21" inv_search = 'year:1978-10-21->9999' self._compare_searches(inv_search, spi_search) + if DATEUTIL_AVAILABLE: + def test_date_2_digits_year_month_day(self): + """SPIRES search syntax - searching by date > 78-10-21""" + spi_search = "find date 78-10-21" + inv_search = 'year:1978-10-21' + self._compare_searches(inv_search, spi_search) + + if DATEUTIL_AVAILABLE: + def test_date_2_digits_year(self): + """SPIRES search syntax - searching by date 78""" + spi_search = "find date 78" + inv_search = 'year:1978' + self._compare_searches(inv_search, spi_search) + + if DATEUTIL_AVAILABLE: + def test_date_2_digits_year_future(self): + """SPIRES search syntax - searching by date 2 years in the future""" + d = datetime.datetime.today() + datetime.timedelta(days=730) + spi_search = "find date %s" % d.strftime("%y") + inv_search = 'year:%s' % d.strftime("%Y") + self._compare_searches(inv_search, spi_search) + + if DATEUTIL_AVAILABLE: + def test_date_2_digits_month_year(self): + """SPIRES search syntax - searching by date feb 12""" + # This should give us "feb 12" with us locale + d = datetime.datetime(year=2012, month=2, day=1) + date_str = d.strftime('%b %y') + spi_search = "find date %s" % date_str + inv_search = 'year:2012-02' + self._compare_searches(inv_search, spi_search) + def test_spires_syntax_trailing_colon(self): """SPIRES search syntax - test for blowup with trailing colon""" spi_search = "find a watanabe:" invenio_search = "author:watanabe:" self._compare_searches(invenio_search, spi_search) if DATEUTIL_AVAILABLE: def test_date_by_lt_d_MO_yr(self): """SPIRES search syntax - searching by date < 23 Sep 2010: will only work with dateutil installed""" spi_search = "find date < 23 Sep 2010" inv_search = 'year:0->2010-09-23' self._compare_searches(inv_search, spi_search) def test_date_by_gt_d_MO_yr(self): """SPIRES search syntax - searching by date > 12 Jun 1960: will only work with dateutil installed""" spi_search = "find date > 12 Jun 1960" inv_search = 'year:1960-06-12->9999' self._compare_searches(inv_search, spi_search) def test_date_accept_today(self): """SPIRES search syntax - searching by today""" spi_search = "find date today" inv_search = "year:" + datetime.datetime.strftime(datetime.datetime.today(), '%Y-%m-%d') self._compare_searches(inv_search, spi_search) def test_date_accept_yesterday(self): """SPIRES search syntax - searching by yesterday""" import dateutil.relativedelta spi_search = "find date yesterday" inv_search = "year:" + datetime.datetime.strftime(datetime.datetime.today()+dateutil.relativedelta.relativedelta(days=-1), '%Y-%m-%d') self._compare_searches(inv_search, spi_search) def test_date_accept_this_month(self): """SPIRES search syntax - searching by this month""" spi_search = "find date this month" inv_search = "year:" + datetime.datetime.strftime(datetime.datetime.today(), '%Y-%m') self._compare_searches(inv_search, spi_search) def test_date_accept_last_month(self): """SPIRES search 
syntax - searching by last month""" spi_search = "find date last month" inv_search = "year:" + datetime.datetime.strftime(datetime.datetime.today()\ +dateutil.relativedelta.relativedelta(months=-1), '%Y-%m') self._compare_searches(inv_search, spi_search) def test_date_accept_this_week(self): """SPIRES search syntax - searching by this week""" spi_search = "find date this week" - inv_search = "year:" + datetime.datetime.strftime(datetime.datetime.today()\ - +dateutil.relativedelta.relativedelta(days=-(datetime.datetime.today().isoweekday()%7)), '%Y-%m-%d') + begin = datetime.datetime.today() + days_to_remove = datetime.datetime.today().isoweekday() % 7 + begin += du_delta(days=-days_to_remove) + begin_str = datetime.datetime.strftime(begin, '%Y-%m-%d') + # Only 6 days cause the last day is included in the search + end = datetime.datetime.today() + end_str = datetime.datetime.strftime(end, '%Y-%m-%d') + inv_search = "year:%s->%s" % (begin_str, end_str) self._compare_searches(inv_search, spi_search) def test_date_accept_last_week(self): """SPIRES search syntax - searching by last week""" spi_search = "find date last week" - inv_search = "year:" + datetime.datetime.strftime(datetime.datetime.today()\ - +dateutil.relativedelta.relativedelta(days=-(7+(datetime.datetime.today().isoweekday()%7))), '%Y-%m-%d') + begin = datetime.datetime.today() + days_to_remove = 7 + datetime.datetime.today().isoweekday() % 7 + begin += du_delta(days=-days_to_remove) + begin_str = datetime.datetime.strftime(begin, '%Y-%m-%d') + # Only 6 days cause the last day is included in the search + end = begin + du_delta(days=6) + end_str = datetime.datetime.strftime(end, '%Y-%m-%d') + inv_search = "year:%s->%s" % (begin_str, end_str) self._compare_searches(inv_search, spi_search) def test_date_accept_date_minus_days(self): """SPIRES search syntax - searching by 2011-01-03 - 2""" spi_search = "find date 2011-01-03 - 2" - inv_search = "year:2011-01" + inv_search = "year:2011-01-01" self._compare_searches(inv_search, spi_search) def test_date_accept_date_minus_days_with_month_wrap(self): """SPIRES search syntax - searching by 2011-03-01 - 1""" spi_search = "find date 2011-03-01 - 1" inv_search = "year:2011-02-28" self._compare_searches(inv_search, spi_search) def test_date_accept_date_minus_days_with_year_wrap(self): """SPIRES search syntax - searching by 2011-01-01 - 1""" spi_search = "find date 2011-01-01 - 1" inv_search = "year:2010-12-31" self._compare_searches(inv_search, spi_search) def test_date_accept_date_minus_days_with_leapyear_february(self): """SPIRES search syntax - searching by 2008-03-01 - 1""" spi_search = "find date 2008-03-01 - 1" inv_search = "year:2008-02-29" self._compare_searches(inv_search, spi_search) def test_date_accept_date_minus_many_days(self): """SPIRES search syntax - searching by 2011-02-24 - 946""" spi_search = "find date 2011-02-24 - 946" inv_search = "year:2008-07-23" self._compare_searches(inv_search, spi_search) def test_date_accept_date_plus_days(self): """SPIRES search syntax - searching by 2011-01-03 + 2""" spi_search = "find date 2011-01-01 + 2" inv_search = "year:2011-01-03" self._compare_searches(inv_search, spi_search) def test_date_accept_plus_days_with_month_wrap(self): """SPIRES search syntax - searching by 2011-03-31 + 2""" spi_search = "find date 2011-03-31 + 2" inv_search = "year:2011-04-02" self._compare_searches(inv_search, spi_search) def test_date_accept_date_plus_days_with_year_wrap(self): """SPIRES search syntax - searching by 2011-12-31 + 1""" spi_search = "find date 
2011-12-31 + 1" - inv_search = "year:2012-01" + inv_search = "year:2012-01-01" self._compare_searches(inv_search, spi_search) def test_date_accept_date_plus_days_with_leapyear_february(self): """SPIRES search syntax - searching by 2008-02-29 + 2""" spi_search = "find date 2008-02-28 + 2" - inv_search = "year:2008-03" + inv_search = "year:2008-03-01" self._compare_searches(inv_search, spi_search) def test_date_accept_date_plus_many_days(self): """SPIRES search syntax - searching by 2011-02-24 + 666""" spi_search = "find date 2011-02-24 + 666" inv_search = "year:2012-12-21" self._compare_searches(inv_search, spi_search) def test_spires_syntax_detected_f(self): """SPIRES search syntax - test detection f t p""" # trac #261 converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable("f t p") self.assertEqual(spi_search, True) def test_spires_syntax_detected_fin(self): """SPIRES search syntax - test detection fin t p""" # trac #261 converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable("fin t p") self.assertEqual(spi_search, True) def test_spires_keyword_distribution_before_conjunctions(self): """SPIRES search syntax - test find journal phys.lett. 0903 024""" spi_search = 'find journal phys.lett. 0903 024' inv_search = '(journal:phys.lett.,0903,024)' self._compare_searches(inv_search, spi_search) def test_spires_keyword_distribution_with_parens(self): """SPIRES search syntax - test find cn d0 and (a abachi or abbott or abazov)""" spi_search = "find cn d0 and (a abachi or abbott or abazov)" inv_search = "collaboration:d0 and (author:abachi or author:abbott or author:abazov)" self._compare_searches(inv_search, spi_search) def test_super_short_author_name(self): """SPIRES search syntax - test fin a er and cn cms""" spi_search = "fin a er and cn cms" inv_search = "author:er collaboration:cms" self._compare_searches(inv_search, spi_search) def test_simple_syntax_mixing(self): """SPIRES and invenio search syntax - find a ellis and citedby:hawking""" combo_search = "find a ellis and citedby:hawking" inv_search = "author:ellis citedby:hawking" self._compare_searches(inv_search, combo_search) def test_author_first_syntax_mixing(self): """SPIRES and invenio search syntax - find a dixon, l.j. cited:10->52""" combo_search = 'find a dixon, l.j. 
cited:10->52' inv_search = 'author:"dixon, l* j*" cited:10->52' self._compare_searches(inv_search, combo_search) def test_minus_boolean_syntax_mixing(self): """SPIRES and invenio search syntax - find a ellis -title:muon""" combo_search = 'find a ellis -title:muon' inv_search = 'author:ellis -title:muon' self._compare_searches(inv_search, combo_search) def test_plus_boolean_syntax_mixing(self): """SPIRES and invenio search syntax - find a ellis +title:muon""" combo_search = 'find a ellis +title:muon' inv_search = 'author:ellis title:muon' self._compare_searches(inv_search, combo_search) def test_second_level_syntax_mixing(self): """SPIRES and invenio search syntax - find a ellis refersto:author:hawking""" combo_search = 'find a ellis refersto:author:hawking' inv_search = 'author:ellis refersto:author:hawking' self._compare_searches(inv_search, combo_search) if CFG_WEBSEARCH_SPIRES_SYNTAX > 1: def test_absorbs_naked_a_search(self): """SPIRES search syntax - a ellis""" invenio_search = "author:ellis" naked_search = "a ellis" self._compare_searches(invenio_search, naked_search) def test_absorbs_naked_author_search(self): """SPIRES search syntax - author ellis""" invenio_search = "author:ellis" spi_search = "author ellis" self._compare_searches(invenio_search, spi_search) def test_spires_syntax_detected_naked_a(self): """SPIRES search syntax - test detection a ellis""" converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable("a ellis") self.assertEqual(spi_search, True) def test_spires_syntax_detected_naked_author(self): """SPIRES search syntax - test detection author ellis""" converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable("author ellis") self.assertEqual(spi_search, True) def test_spires_syntax_detected_naked_author_leading_spaces(self): """SPIRES search syntax - test detection author ellis""" converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable(" author ellis") self.assertEqual(spi_search, True) def test_spires_syntax_detected_naked_title(self): """SPIRES search syntax - test detection t muon""" converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable("t muon") self.assertEqual(spi_search, True) def test_spires_syntax_detected_second_keyword(self): """SPIRES search syntax - test detection author:ellis and t muon""" converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() spi_search = converter.is_applicable("author:ellis and t muon") self.assertEqual(spi_search, True) def test_spires_syntax_detected_invenio(self): """SPIRES search syntax - test detection Not SPIRES""" # trac #261 converter = search_engine_query_parser.SpiresToInvenioSyntaxConverter() inv_search = converter.is_applicable("t:p a:c") self.assertEqual(inv_search, False) def test_invenio_syntax_only_second_level(self): """invenio search syntax - citedby:reportnumber:hep-th/0205061""" inv_search = 'citedby:reportnumber:hep-th/0205061' self._compare_searches(inv_search, inv_search) def test_invenio_syntax_only_boolean(self): """invenio search syntax - author:ellis and not title:hadronic and not title:collisions""" inv_search = "author:ellis and not title:hadronic and not title:collisions" self._compare_searches(inv_search, inv_search) TEST_SUITE = make_test_suite(TestSearchQueryParenthesisedParser, TestSpiresToInvenioSyntaxConverter, TestParserUtilityFunctions) if __name__ == "__main__": 
run_test_suite(TEST_SUITE) #run_test_suite(make_test_suite(TestParserUtilityFunctions, TestSearchQueryParenthesisedParser)) # DEBUG diff --git a/modules/websearch/lib/websearch_regression_tests.py b/modules/websearch/lib/websearch_regression_tests.py index 4e603e870..660806c4a 100644 --- a/modules/websearch/lib/websearch_regression_tests.py +++ b/modules/websearch/lib/websearch_regression_tests.py @@ -1,2852 +1,2854 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. # pylint: disable=C0301 # pylint: disable=E1102 """WebSearch module regression tests.""" __revision__ = "$Id$" import unittest import re import urlparse, cgi import sys import cStringIO if sys.hexversion < 0x2040000: # pylint: disable=W0622 from sets import Set as set # pylint: enable=W0622 from mechanize import Browser, LinkNotFoundError from invenio.config import CFG_SITE_URL, CFG_SITE_NAME, CFG_SITE_LANG, \ CFG_SITE_RECORD, CFG_SITE_LANGS, \ CFG_SITE_SECURE_URL, CFG_WEBSEARCH_SPIRES_SYNTAX from invenio.testutils import make_test_suite, \ run_test_suite, \ nottest, \ make_url, make_surl, test_web_page_content, \ merge_error_messages from invenio.urlutils import same_urls_p from invenio.dbquery import run_sql from invenio.search_engine import perform_request_search, \ guess_primary_collection_of_a_record, guess_collection_of_a_record, \ collection_restricted_p, get_permitted_restricted_collections, \ search_pattern, search_unit, search_unit_in_bibrec, \ wash_colls, record_public_p from invenio import search_engine_summarizer from invenio.search_engine_utils import get_fieldvalues from invenio.intbitset import intbitset from invenio.search_engine import intersect_results_with_collrecs from invenio.bibrank_bridge_utils import get_external_word_similarity_ranker +from invenio.search_engine_query_parser_unit_tests import DATEUTIL_AVAILABLE if 'fr' in CFG_SITE_LANGS: lang_french_configured = True else: lang_french_configured = False def parse_url(url): parts = urlparse.urlparse(url) query = cgi.parse_qs(parts[4], True) return parts[2].split('/')[1:], query def string_combinations(str_list): """Returns all the possible combinations of the strings in the list. Example: for the list ['A','B','Cd'], it will return [['Cd', 'B', 'A'], ['B', 'A'], ['Cd', 'A'], ['A'], ['Cd', 'B'], ['B'], ['Cd'], []] It adds "B", "H", "F" and "S" values to the results so different combinations of them are also checked. 
""" out_list = [] for i in range(len(str_list) + 1): out_list += list(combinations(str_list, i)) for i in range(len(out_list)): out_list[i] = (list(out_list[i]) + { 0: lambda: ["B", "H", "S"], 1: lambda: ["B", "H", "F"], 2: lambda: ["B", "F", "S"], 3: lambda: ["B", "F"], 4: lambda: ["B", "S"], 5: lambda: ["B", "H"], 6: lambda: ["B"] }[i % 7]()) return out_list def combinations(iterable, r): """Return r length subsequences of elements from the input iterable.""" # combinations('ABCD', 2) --> AB AC AD BC BD CD # combinations(range(4), 3) --> 012 013 023 123 pool = tuple(iterable) n = len(pool) if r > n: return indices = range(r) yield tuple(pool[i] for i in indices) while True: for i in reversed(range(r)): if indices[i] != i + n - r: break else: return indices[i] += 1 for j in range(i+1, r): indices[j] = indices[j-1] + 1 yield tuple(pool[i] for i in indices) class WebSearchWebPagesAvailabilityTest(unittest.TestCase): """Check WebSearch web pages whether they are up or not.""" def test_search_interface_pages_availability(self): """websearch - availability of search interface pages""" baseurl = CFG_SITE_URL + '/' _exports = ['', 'collection/Poetry', 'collection/Poetry?as=1'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_search_results_pages_availability(self): """websearch - availability of search results pages""" baseurl = CFG_SITE_URL + '/search' _exports = ['', '?c=Poetry', '?p=ellis', '/cache', '/log'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_search_detailed_record_pages_availability(self): """websearch - availability of search detailed record pages""" baseurl = CFG_SITE_URL + '/'+ CFG_SITE_RECORD +'/' _exports = ['', '1', '1/', '1/files', '1/files/'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_browse_results_pages_availability(self): """websearch - availability of browse results pages""" baseurl = CFG_SITE_URL + '/search' _exports = ['?p=ellis&f=author&action_browse=Browse'] error_messages = [] for url in [baseurl + page for page in _exports]: error_messages.extend(test_web_page_content(url)) if error_messages: self.fail(merge_error_messages(error_messages)) return def test_help_page_availability(self): """websearch - availability of Help Central page""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help', expected_text="Help Central")) if lang_french_configured: def test_help_page_availability_fr(self): """websearch - availability of Help Central page in french""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/?ln=fr', expected_text="Centre d'aide")) def test_search_tips_page_availability(self): """websearch - availability of Search Tips""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-tips', expected_text="Search Tips")) if lang_french_configured: def test_search_tips_page_availability_fr(self): """websearch - availability of Search Tips in french""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-tips?ln=fr', expected_text="Conseils de recherche")) def test_search_guide_page_availability(self): """websearch - availability of Search 
Guide""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-guide', expected_text="Search Guide")) if lang_french_configured: def test_search_guide_page_availability_fr(self): """websearch - availability of Search Guide in french""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search-guide?ln=fr', expected_text="Guide de recherche")) class WebSearchTestLegacyURLs(unittest.TestCase): """ Check that the application still responds to legacy URLs for navigating, searching and browsing.""" def test_legacy_collections(self): """ websearch - collections handle legacy urls """ browser = Browser() def check(legacy, new, browser=browser): browser.open(legacy) got = browser.geturl() self.failUnless(same_urls_p(got, new), got) # Use the root URL unless we need more check(make_url('/', c=CFG_SITE_NAME), make_url('/', ln=CFG_SITE_LANG)) # Other collections are redirected in the /collection area check(make_url('/', c='Poetry'), make_url('/collection/Poetry', ln=CFG_SITE_LANG)) # Drop unnecessary arguments, like ln and as (when they are # the default value) args = {'as': 0} check(make_url('/', c='Poetry', **args), make_url('/collection/Poetry', ln=CFG_SITE_LANG)) # Otherwise, keep them args = {'as': 1, 'ln': CFG_SITE_LANG} check(make_url('/', c='Poetry', **args), make_url('/collection/Poetry', **args)) # Support the /index.py addressing too check(make_url('/index.py', c='Poetry'), make_url('/collection/Poetry', ln=CFG_SITE_LANG)) def test_legacy_search(self): """ websearch - search queries handle legacy urls """ browser = Browser() def check(legacy, new, browser=browser): browser.open(legacy) got = browser.geturl() self.failUnless(same_urls_p(got, new), got) # /search.py is redirected on /search # Note that `as' is a reserved word in Python 2.5 check(make_url('/search.py', p='nuclear', ln='en') + 'as=1', make_url('/search', p='nuclear', ln='en') + 'as=1') if lang_french_configured: def test_legacy_search_fr(self): """ websearch - search queries handle legacy urls """ browser = Browser() def check(legacy, new, browser=browser): browser.open(legacy) got = browser.geturl() self.failUnless(same_urls_p(got, new), got) # direct recid searches are redirected to /CFG_SITE_RECORD check(make_url('/search.py', recid=1, ln='fr'), make_url('/%s/1' % CFG_SITE_RECORD, ln='fr')) def test_legacy_search_help_link(self): """websearch - legacy Search Help page link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search/index.en.html', expected_text="Help Central")) if lang_french_configured: def test_legacy_search_tips_link(self): """websearch - legacy Search Tips page link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search/tips.fr.html', expected_text="Conseils de recherche")) def test_legacy_search_guide_link(self): """websearch - legacy Search Guide page link""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/help/search/guide.en.html', expected_text="Search Guide")) class WebSearchTestRecord(unittest.TestCase): """ Check the interface of the /CFG_SITE_RECORD results """ def test_format_links(self): """ websearch - check format links for records """ browser = Browser() # We open the record in all known HTML formats for hformat in ('hd', 'hx', 'hm'): browser.open(make_url('/%s/1' % CFG_SITE_RECORD, of=hformat)) if hformat == 'hd': # hd format should have a link to the following # formats for oformat in ('hx', 'hm', 'xm', 'xd'): target = make_url('/%s/1/export/%s?ln=en' % (CFG_SITE_RECORD, oformat)) try: 
browser.find_link(url=target) except LinkNotFoundError: self.fail('link %r should be in page' % target) else: # non-hd HTML formats should have a link back to # the main detailed record target = make_url('/%s/1' % CFG_SITE_RECORD) try: browser.find_link(url=target) except LinkNotFoundError: self.fail('link %r should be in page' % target) return def test_exported_formats(self): """ websearch - check formats exported through /CFG_SITE_RECORD/1/export/ URLs""" self.assertEqual([], test_web_page_content(make_url('/%s/1/export/hm' % CFG_SITE_RECORD), expected_text='245__ $$aALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/hd' % CFG_SITE_RECORD), expected_text='ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/xm' % CFG_SITE_RECORD), expected_text='ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/xd' % CFG_SITE_RECORD), expected_text='ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/hs' % CFG_SITE_RECORD), expected_text='ALEPH experiment' % \ (CFG_SITE_RECORD, CFG_SITE_LANG))) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/hx' % CFG_SITE_RECORD), expected_text='title = "ALEPH experiment')) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/t?ot=245' % CFG_SITE_RECORD), expected_text='245__ $$aALEPH experiment')) self.assertNotEqual([], test_web_page_content(make_url('/%s/1/export/t?ot=245' % CFG_SITE_RECORD), expected_text='001__')) self.assertEqual([], test_web_page_content(make_url('/%s/1/export/h?ot=245' % CFG_SITE_RECORD), expected_text='245__ $$aALEPH experiment')) self.assertNotEqual([], test_web_page_content(make_url('/%s/1/export/h?ot=245' % CFG_SITE_RECORD), expected_text='001__')) return def test_plots_tab(self): """ websearch - test to ensure the plots tab is working """ self.assertEqual([], test_web_page_content(make_url('/%s/8/plots' % CFG_SITE_RECORD), expected_text='div id="clip"', unexpected_text='Abstract')) def test_meta_header(self): """ websearch - test that metadata embedded in header of hd relies on hdm format and Default_HTML_meta bft, but hook is in websearch to display the format """ self.assertEqual([], test_web_page_content(make_url('/record/1'), expected_text='')) return class WebSearchTestCollections(unittest.TestCase): def test_traversal_links(self): """ websearch - traverse all the publications of a collection """ browser = Browser() try: for aas in (0, 1): args = {'as': aas} browser.open(make_url('/collection/Preprints', **args)) for jrec in (11, 21, 11, 27): args = {'jrec': jrec, 'cc': 'Preprints'} if aas: args['as'] = aas url = make_url('/search', **args) try: browser.follow_link(url=url) except LinkNotFoundError: args['ln'] = CFG_SITE_LANG url = make_url('/search', **args) browser.follow_link(url=url) except LinkNotFoundError: self.fail('no link %r in %r' % (url, browser.geturl())) def test_collections_links(self): """ websearch - enter in collections and subcollections """ browser = Browser() def tryfollow(url): cur = browser.geturl() body = browser.response().read() try: browser.follow_link(url=url) except LinkNotFoundError: print body self.fail("in %r: could not find %r" % ( cur, url)) return for aas in (0, 1): if aas: kargs = {'as': 1} else: kargs = {} kargs['ln'] = CFG_SITE_LANG # We navigate from immediate son to immediate son... 
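# (Home -> Articles & Preprints -> Articles), then go back twice and jump
# straight to a grandson collection (ALEPH); tryfollow() fails the test if
# any of these links is missing.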
browser.open(make_url('/', **kargs)) tryfollow(make_url('/collection/Articles%20%26%20Preprints', **kargs)) tryfollow(make_url('/collection/Articles', **kargs)) # But we can also jump to a grandson immediately browser.back() browser.back() tryfollow(make_url('/collection/ALEPH', **kargs)) return def test_records_links(self): """ websearch - check the links toward records in leaf collections """ browser = Browser() browser.open(make_url('/collection/Preprints')) def harvest(): """ Parse all the links in the page, and check that for each link to a detailed record, we also have the corresponding link to the similar records.""" records = set() similar = set() for link in browser.links(): path, q = parse_url(link.url) if not path: continue if path[0] == CFG_SITE_RECORD: records.add(int(path[1])) continue if path[0] == 'search': if not q.get('rm') == ['wrd']: continue recid = q['p'][0].split(':')[1] similar.add(int(recid)) self.failUnlessEqual(records, similar) return records # We must have 10 links to the corresponding /CFG_SITE_RECORD found = harvest() self.failUnlessEqual(len(found), 10) # When clicking on the "Search" button, we must also have # these 10 links on the records. browser.select_form(name="search") browser.submit() found = harvest() self.failUnlessEqual(len(found), 10) return def test_em_parameter(self): """ websearch - check different values of em return different parts of the collection page""" for combi in string_combinations(["L", "P", "Prt"]): url = '/collection/Articles?em=%s' % ','.join(combi) expected_text = ["Development of photon beam diagnostics for VUV radiation from a SASE FEL"] unexpected_text = [] if "H" in combi: expected_text.append(">Atlantis Institute of Fictive Science") else: unexpected_text.append(">Atlantis Institute of Fictive Science") if "F" in combi: expected_text.append("This site is also available in the following languages:") else: unexpected_text.append("This site is also available in the following languages:") if "S" in combi: expected_text.append('value="Search"') else: unexpected_text.append('value="Search"') if "L" in combi: expected_text.append('Search also:') else: unexpected_text.append('Search also:') if "Prt" in combi or "P" in combi: expected_text.append('
ABOUT ARTICLES
') else: unexpected_text.append('
ABOUT ARTICLES
') self.assertEqual([], test_web_page_content(make_url(url), expected_text=expected_text, unexpected_text=unexpected_text)) return class WebSearchTestBrowse(unittest.TestCase): def test_browse_field(self): """ websearch - check that browsing works """ browser = Browser() browser.open(make_url('/')) browser.select_form(name='search') browser['f'] = ['title'] browser.submit(name='action_browse') def collect(): # We'll get a few links to search for the actual hits, plus a # link to the following results. res = [] for link in browser.links(url_regex=re.compile(CFG_SITE_URL + r'/search\?')): if link.text == 'Advanced Search': continue dummy, q = parse_url(link.url) res.append((link, q)) return res # if we follow the last link, we should get another # batch. There is an overlap of one item. batch_1 = collect() browser.follow_link(link=batch_1[-1][0]) batch_2 = collect() # FIXME: we cannot compare the whole query, as the collection # set is not equal self.failUnlessEqual(batch_1[-2][1]['p'], batch_2[0][1]['p']) def test_browse_restricted_record_as_unauthorized_user(self): """websearch - browse for a record that belongs to a restricted collection as an unauthorized user.""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?p=CERN-THESIS-99-074&f=088__a&action_browse=Browse&ln=en', username = 'guest', expected_text = ['Hits', '088__a'], unexpected_text = ['>CERN-THESIS-99-074']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_browse_restricted_record_as_unauthorized_user_in_restricted_collection(self): """websearch - browse for a record that belongs to a restricted collection as an unauthorized user.""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?p=CERN-THESIS-99-074&f=088__a&action_browse=Browse&c=ALEPH+Theses&ln=en', username='guest', expected_text= ['This collection is restricted'], unexpected_text= ['Hits', '>CERN-THESIS-99-074']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_browse_restricted_record_as_authorized_user(self): """websearch - browse for a record that belongs to a restricted collection as an authorized user.""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?p=CERN-THESIS-99-074&f=088__a&action_browse=Browse&ln=en', username='admin', password='', expected_text= ['Hits', '088__a'], unexpected_text = ['>CERN-THESIS-99-074']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_browse_restricted_record_as_authorized_user_in_restricted_collection(self): """websearch - browse for a record that belongs to a restricted collection as an authorized user.""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?p=CERN-THESIS-99-074&f=088__a&action_browse=Browse&c=ALEPH+Theses&ln=en', username='admin', password='', expected_text= ['Hits', '>CERN-THESIS-99-074']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_browse_exact_author_help_link(self): error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=Dasse%2C+Michel&f=author&action_browse=Browse', username = 'guest', expected_text = ['Did you mean to browse in', 'index?']) if error_messages: self.fail(merge_error_messages(error_messages)) error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=Dasse%2C+Michel&f=firstauthor&action_browse=Browse', username = 'guest', expected_text = ['Did you mean to browse in', 'index?']) if error_messages: self.fail(merge_error_messages(error_messages)) error_messages = test_web_page_content(CFG_SITE_URL + 
'/search?ln=en&as=1&m1=a&p1=Dasse%2C+Michel&f1=author&op1=a&m2=a&p2=&f2=firstauthor&op2=a&m3=a&p3=&f3=&action_browse=Browse', username = 'guest', expected_text = ['Did you mean to browse in', 'index?']) if error_messages: self.fail(merge_error_messages(error_messages)) class WebSearchTestOpenURL(unittest.TestCase): def test_isbn_01(self): """ websearch - isbn query via OpenURL 0.1""" browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/openurl', isbn='0387940758')) dummy, current_q = parse_url(browser.geturl()) self.failUnlessEqual(current_q, { 'sc' : ['1'], 'p' : ['isbn:"0387940758"'], 'of' : ['hd'] }) def test_isbn_10_rft_id(self): """ websearch - isbn query via OpenURL 1.0 - rft_id""" browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/openurl', rft_id='urn:ISBN:0387940758')) dummy, current_q = parse_url(browser.geturl()) self.failUnlessEqual(current_q, { 'sc' : ['1'], 'p' : ['isbn:"0387940758"'], 'of' : ['hd'] }) def test_isbn_10(self): """ websearch - isbn query via OpenURL 1.0""" browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/openurl?rft.isbn=0387940758')) dummy, current_q = parse_url(browser.geturl()) self.failUnlessEqual(current_q, { 'sc' : ['1'], 'p' : ['isbn:"0387940758"'], 'of' : ['hd'] }) class WebSearchTestSearch(unittest.TestCase): def test_hits_in_other_collection(self): """ websearch - check extension of a query to the home collection """ browser = Browser() # We do a precise search in an isolated collection browser.open(make_url('/collection/ISOLDE', ln='en')) browser.select_form(name='search') browser['f'] = ['author'] browser['p'] = 'matsubara' browser.submit() dummy, current_q = parse_url(browser.geturl()) link = browser.find_link(text_regex=re.compile('.*hit', re.I)) dummy, target_q = parse_url(link.url) # the target query should be the current query without any c # or cc specified. for f in ('cc', 'c', 'action_search'): if f in current_q: del current_q[f] self.failUnlessEqual(current_q, target_q) def test_nearest_terms(self): """ websearch - provide a list of nearest terms """ browser = Browser() browser.open(make_url('')) # Search something weird browser.select_form(name='search') browser['p'] = 'gronf' browser.submit() dummy, original = parse_url(browser.geturl()) for to_drop in ('cc', 'action_search', 'f'): if to_drop in original: del original[to_drop] if 'ln' not in original: original['ln'] = [CFG_SITE_LANG] # we should get a few searches back, which are identical # except for the p field being substituted (and the cc field # being dropped). 
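# In other words: every suggested nearest-term link must point to the same
# query as the original one, with only p replaced by the link text.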
if 'cc' in original: del original['cc'] for link in browser.links(url_regex=re.compile(CFG_SITE_URL + r'/search\?')): if link.text == 'Advanced Search': continue dummy, target = parse_url(link.url) if 'ln' not in target: target['ln'] = [CFG_SITE_LANG] original['p'] = [link.text] self.failUnlessEqual(original, target) return def test_switch_to_simple_search(self): """ websearch - switch to simple search """ browser = Browser() args = {'as': 1} browser.open(make_url('/collection/ISOLDE', **args)) browser.select_form(name='search') browser['p1'] = 'tandem' browser['f1'] = ['title'] browser.submit() browser.follow_link(text='Simple Search') dummy, q = parse_url(browser.geturl()) self.failUnlessEqual(q, {'cc': ['ISOLDE'], 'p': ['tandem'], 'f': ['title'], 'ln': ['en']}) def test_switch_to_advanced_search(self): """ websearch - switch to advanced search """ browser = Browser() browser.open(make_url('/collection/ISOLDE')) browser.select_form(name='search') browser['p'] = 'tandem' browser['f'] = ['title'] browser.submit() browser.follow_link(text='Advanced Search') dummy, q = parse_url(browser.geturl()) self.failUnlessEqual(q, {'cc': ['ISOLDE'], 'p1': ['tandem'], 'f1': ['title'], 'as': ['1'], 'ln' : ['en']}) def test_no_boolean_hits(self): """ websearch - check the 'no boolean hits' proposed links """ browser = Browser() browser.open(make_url('')) browser.select_form(name='search') browser['p'] = 'quasinormal muon' browser.submit() dummy, q = parse_url(browser.geturl()) for to_drop in ('cc', 'action_search', 'f'): if to_drop in q: del q[to_drop] for bsu in ('quasinormal', 'muon'): l = browser.find_link(text=bsu) q['p'] = bsu if not same_urls_p(l.url, make_url('/search', **q)): self.fail(repr((l.url, make_url('/search', **q)))) def test_similar_authors(self): """ websearch - test similar authors box """ browser = Browser() browser.open(make_url('')) browser.select_form(name='search') browser['p'] = 'Ellis, R K' browser['f'] = ['author'] browser.submit() l = browser.find_link(text="Ellis, R S") self.failUnless(same_urls_p(l.url, make_url('/search', p="Ellis, R S", f='author', ln='en'))) def test_em_parameter(self): """ websearch - check different values of em return different parts of the search page""" for combi in string_combinations(["K", "A", "I", "O"]): url = '/search?ln=en&cc=Articles+%%26+Preprints&sc=1&c=Articles&c=Preprints&em=%s' % ','.join(combi) expected_text = ["Development of photon beam diagnostics for VUV radiation from a SASE FEL"] unexpected_text = [] if "H" in combi: expected_text.append(">Atlantis Institute of Fictive Science") else: unexpected_text.append(">Atlantis Institute of Fictive Science") if "F" in combi: expected_text.append("This site is also available in the following languages:") else: unexpected_text.append("This site is also available in the following languages:") if "S" in combi: expected_text.append('value="Search"') else: unexpected_text.append('value="Search"') if "K" in combi: expected_text.append('value="Add to basket"') else: unexpected_text.append('value="Add to basket"') if "A" in combi: expected_text.append('Interested in being notified about new results for this query?') else: unexpected_text.append('Interested in being notified about new results for this query?') if "I" in combi: expected_text.append('jump to record:') else: unexpected_text.append('jump to record:') if "O" in combi: expected_text.append('Results overview: Found ') else: unexpected_text.append('Results overview: Found ') self.assertEqual([], test_web_page_content(make_url(url), 
expected_text=expected_text, unexpected_text=unexpected_text)) return class WebSearchTestWildcardLimit(unittest.TestCase): """Checks if the wildcard limit is correctly passed and that users without autorization can not exploit it""" def test_wildcard_limit_correctly_passed_when_not_set(self): """websearch - wildcard limit is correctly passed when default""" self.assertEqual(search_pattern(p='e*', f='author'), search_pattern(p='e*', f='author', wl=1000)) def test_wildcard_limit_correctly_passed_when_set(self): """websearch - wildcard limit is correctly passed when set""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=e*&f=author&of=id&wl=5&rg=100', expected_text="[9, 10, 11, 17, 46, 48, 50, 51, 52, 53, 54, 67, 72, 74, 81, 88, 92, 96]")) def test_wildcard_limit_correctly_not_active(self): """websearch - wildcard limit is not active when there is no wildcard query""" self.assertEqual(search_pattern(p='ellis', f='author'), search_pattern(p='ellis', f='author', wl=1)) def test_wildcard_limit_increased_by_authorized_users(self): """websearch - wildcard limit increased by authorized user""" browser = Browser() #try a search query, with no wildcard limit set by the user browser.open(make_url('/search?p=a*&of=id')) recid_list_guest_no_limit = browser.response().read() # so the limit is CGF_WEBSEARCH_WILDCARD_LIMIT #try a search query, with a wildcard limit imposed by the user #wl=1000000 - a very high limit,higher then what the CFG_WEBSEARCH_WILDCARD_LIMIT might be browser.open(make_url('/search?p=a*&of=id&wl=1000000')) recid_list_guest_with_limit = browser.response().read() #same results should be returned for a search without the wildcard limit set by the user #and for a search with a large limit set by the user #in this way we know that nomatter how large the limit is, the wildcard query will be #limitted by CFG_WEBSEARCH_WILDCARD_LIMIT (for a guest user) self.failIf(len(recid_list_guest_no_limit.split(',')) != len(recid_list_guest_with_limit.split(','))) ##login as admin browser.open(make_surl('/youraccount/login')) browser.select_form(nr=0) browser['p_un'] = 'admin' browser['p_pw'] = '' browser.submit() #try a search query, with a wildcard limit imposed by an authorized user #wl = 10000 a very high limit, higher then what the CFG_WEBSEARCH_WILDCARD_LIMIT might be browser.open(make_surl('/search?p=a*&of=id&wl=10000')) recid_list_authuser_with_limit = browser.response().read() #the authorized user can set whatever limit he might wish #so, the results returned for the auth. users should exceed the results returned for unauth. 
users self.failUnless(len(recid_list_guest_no_limit.split(',')) <= len(recid_list_authuser_with_limit.split(','))) #logout browser.open(make_surl('/youraccount/logout')) browser.response().read() browser.close() class WebSearchNearestTermsTest(unittest.TestCase): """Check various alternatives of searches leading to the nearest terms box.""" def test_nearest_terms_box_in_okay_query(self): """ websearch - no nearest terms box for a successful query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text="jump to record")) def test_nearest_terms_box_in_unsuccessful_simple_query(self): """ websearch - nearest terms box for unsuccessful simple query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellisz', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=embed", expected_link_label='embed')) def test_nearest_terms_box_in_unsuccessful_simple_accented_query(self): """ websearch - nearest terms box for unsuccessful accented query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=elliszà', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=embed", expected_link_label='embed')) def test_nearest_terms_box_in_unsuccessful_structured_query(self): """ websearch - nearest terms box for unsuccessful structured query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellisz&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=fabbro&f=author", expected_link_label='fabbro')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=author%3Aellisz', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=author%3Afabbro", expected_link_label='fabbro')) def test_nearest_terms_box_in_query_with_invalid_index(self): """ websearch - nearest terms box for queries with invalid indexes specified """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=bednarz%3Aellis', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=bednarz", expected_link_label='bednarz')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=1%3Aellis', expected_text="no index 1.", expected_link_target=CFG_SITE_URL+"/record/47?ln=en", expected_link_label="Detailed record")) def test_nearest_terms_box_in_unsuccessful_phrase_query(self): """ websearch - nearest terms box for unsuccessful phrase query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=author%3A%22Ellis%2C+Z%22', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=author%3A%22Enqvist%2C+K%22", expected_link_label='Enqvist, K')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%22ellisz%22&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=%22Enqvist%2C+K%22&f=author", expected_link_label='Enqvist, K')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%22elliszà%22&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=%22Enqvist%2C+K%22&f=author", expected_link_label='Enqvist, K')) def test_nearest_terms_box_in_unsuccessful_partial_phrase_query(self): """ websearch - nearest terms box for unsuccessful partial phrase query """ 
self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=author%3A%27Ellis%2C+Z%27', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=author%3A%27Enqvist%2C+K%27", expected_link_label='Enqvist, K')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%27ellisz%27&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=%27Enqvist%2C+K%27&f=author", expected_link_label='Enqvist, K')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%27elliszà%27&f=author', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=%27Enqvist%2C+K%27&f=author", expected_link_label='Enqvist, K')) def test_nearest_terms_box_in_unsuccessful_partial_phrase_advanced_query(self): """ websearch - nearest terms box for unsuccessful partial phrase advanced search query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p1=aaa&f1=title&m1=p&as=1', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&f1=title&as=1&p1=A+simple+functional+form+for+proton-nucleus+total+reaction+cross+sections&m1=p", expected_link_label='A simple functional form for proton-nucleus total reaction cross sections')) def test_nearest_terms_box_in_unsuccessful_exact_phrase_advanced_query(self): """ websearch - nearest terms box for unsuccessful exact phrase advanced search query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p1=aaa&f1=title&m1=e&as=1', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&f1=title&as=1&p1=A+simple+functional+form+for+proton-nucleus+total+reaction+cross+sections&m1=e", expected_link_label='A simple functional form for proton-nucleus total reaction cross sections')) def test_nearest_terms_box_in_unsuccessful_boolean_query(self): """ websearch - nearest terms box for unsuccessful boolean query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3Aellisz+author%3Aellisz', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aellisz", expected_link_label='energi')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3Aenergi+author%3Aenergie', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aenqvist", expected_link_label='enqvist')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=title%3Aellisz+author%3Aellisz&f=keyword', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aellisz&f=keyword", expected_link_label='energi')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=title%3Aenergi+author%3Aenergie&f=keyword', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=title%3Aenergi+author%3Aenqvist&f=keyword", expected_link_label='enqvist')) def test_nearest_terms_box_in_unsuccessful_uppercase_query(self): """ websearch - nearest terms box for unsuccessful uppercase query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=fOo%3Atest', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=food", expected_link_label='food')) 
self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=arXiv%3A1007.5048', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=artist", expected_link_label='artist')) def test_nearest_terms_box_in_unsuccessful_spires_query(self): """ websearch - nearest terms box for unsuccessful spires query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=find+a+foobar', expected_text="Nearest terms in any collection are", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=find+a+finch", expected_link_label='finch')) class WebSearchBooleanQueryTest(unittest.TestCase): """Check various boolean queries.""" def test_successful_boolean_query(self): """ websearch - successful boolean query """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis+muon', expected_text="records found", expected_link_label="Detailed record")) def test_unsuccessful_boolean_query_where_all_individual_terms_match(self): """ websearch - unsuccessful boolean query where all individual terms match """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis+muon+letter', expected_text="Boolean query returned no hits. Please combine your search terms differently.")) def test_unsuccessful_boolean_query_in_advanced_search_where_all_individual_terms_match(self): """ websearch - unsuccessful boolean query in advanced search where all individual terms match """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?m1=a&p1=ellis&op1=a&m2=a&p2=muon&op2=a&p3=letter', expected_text="Boolean query returned no hits. Please combine your search terms differently.")) class WebSearchAuthorQueryTest(unittest.TestCase): """Check various author-related queries.""" def test_propose_similar_author_names_box(self): """ websearch - propose similar author names box """ self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=Ellis%2C+R&f=author', expected_text="See also: similar author names", expected_link_target=CFG_SITE_URL+"/search?ln=en&p=Ellis%2C+R+K&f=author", expected_link_label="Ellis, R K")) def test_do_not_propose_similar_author_names_box(self): """ websearch - do not propose similar author names box """ errmsgs = test_web_page_content(CFG_SITE_URL + '/search?p=author%3A%22Ellis%2C+R%22', expected_link_target=CFG_SITE_URL+"/search?ln=en&p=Ellis%2C+R+K&f=author", expected_link_label="Ellis, R K") if errmsgs[0].find("does not contain link to") > -1: pass else: self.fail("Should not propose similar author names box.") return class WebSearchSearchEnginePythonAPITest(unittest.TestCase): """Check typical search engine Python API calls on the demo data.""" def test_search_engine_python_api_for_failed_query(self): """websearch - search engine Python API for failed query""" self.assertEqual([], perform_request_search(p='aoeuidhtns')) def test_search_engine_python_api_for_successful_query(self): """websearch - search engine Python API for successful query""" self.assertEqual([8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47], perform_request_search(p='ellis')) def test_search_engine_web_api_ignore_paging_parameter(self): """websearch - search engine Python API for successful query, ignore paging parameters""" self.assertEqual([8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47], perform_request_search(p='ellis', rg=5, jrec=3)) def test_search_engine_web_api_respect_sorting_parameter(self): """websearch - search engine Python API for successful query, respect sorting parameters""" self.assertEqual([77, 84, 85], 
perform_request_search(p='klebanov')) self.assertEqual([77, 85, 84], perform_request_search(p='klebanov', sf='909C4v')) def test_search_engine_web_api_respect_ranking_parameter(self): """websearch - search engine Python API for successful query, respect ranking parameters""" self.assertEqual([77, 84, 85], perform_request_search(p='klebanov')) self.assertEqual([85, 77, 84], perform_request_search(p='klebanov', rm='citation')) def test_search_engine_python_api_for_existing_record(self): """websearch - search engine Python API for existing record""" self.assertEqual([8], perform_request_search(recid=8)) def test_search_engine_python_api_for_nonexisting_record(self): """websearch - search engine Python API for non-existing record""" self.assertEqual([], perform_request_search(recid=16777215)) def test_search_engine_python_api_for_nonexisting_collection(self): """websearch - search engine Python API for non-existing collection""" self.assertEqual([], perform_request_search(c='Foo')) def test_search_engine_python_api_for_range_of_records(self): """websearch - search engine Python API for range of records""" self.assertEqual([1, 2, 3, 4, 5, 6, 7, 8, 9], perform_request_search(recid=1, recidb=10)) def test_search_engine_python_api_ranked_by_citation(self): """websearch - search engine Python API for citation ranking""" self.assertEqual([82, 83, 87, 89], perform_request_search(p='recid:81', rm='citation')) def test_search_engine_python_api_textmarc(self): """websearch - search engine Python API for Text MARC output""" # we are testing example from /help/hacking/search-engine-api tmp = cStringIO.StringIO() perform_request_search(req=tmp, p='higgs', of='tm', ot=['100', '700']) out = tmp.getvalue() tmp.close() self.assertEqual(out, """\ 000000085 100__ $$aGirardello, L$$uINFN$$uUniversita di Milano-Bicocca 000000085 700__ $$aPorrati, Massimo 000000085 700__ $$aZaffaroni, A 000000001 100__ $$aPhotolab """) def test_search_engine_python_api_for_intersect_results_with_one_collrec(self): """websearch - search engine Python API for intersect results with one collrec""" self.assertEqual({'Books & Reports': intbitset([19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])}, intersect_results_with_collrecs(None, intbitset(range(0,110)), ['Books & Reports'], 0, 'id', 0, 'en', False)) def test_search_engine_python_api_for_intersect_results_with_several_collrecs(self): """websearch - search engine Python API for intersect results with several collrecs""" self.assertEqual({'Books': intbitset([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]), 'Reports': intbitset([19, 20]), 'Theses': intbitset([35, 36, 37, 38, 39, 40, 41, 42, 105])}, intersect_results_with_collrecs(None, intbitset(range(0,110)), ['Books', 'Theses', 'Reports'], 0, 'id', 0, 'en', False)) class WebSearchSearchEngineWebAPITest(unittest.TestCase): """Check typical search engine Web API calls on the demo data.""" def test_search_engine_web_api_for_failed_query(self): """websearch - search engine Web API for failed query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=aoeuidhtns&of=id', expected_text="[]")) def test_search_engine_web_api_for_successful_query(self): """websearch - search engine Web API for successful query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis&of=id', expected_text="[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47]")) def test_search_engine_web_api_ignore_paging_parameter(self): """websearch - search engine Web API for successful query, ignore paging 
parameters""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis&of=id&rg=5&jrec=3', expected_text="[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 47]")) def test_search_engine_web_api_respect_sorting_parameter(self): """websearch - search engine Web API for successful query, respect sorting parameters""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id', expected_text="[84, 85]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id', username="admin", expected_text="[77, 84, 85]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id&sf=909C4v', expected_text="[85, 84]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id&sf=909C4v', username="admin", expected_text="[77, 85, 84]")) def test_search_engine_web_api_respect_ranking_parameter(self): """websearch - search engine Web API for successful query, respect ranking parameters""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id', expected_text="[84, 85]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id', username="admin", expected_text="[77, 84, 85]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id&rm=citation', expected_text="[85, 84]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=klebanov&of=id&rm=citation', username="admin", expected_text="[85, 77, 84]")) def test_search_engine_web_api_for_existing_record(self): """websearch - search engine Web API for existing record""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?recid=8&of=id', expected_text="[8]")) def test_search_engine_web_api_for_nonexisting_record(self): """websearch - search engine Web API for non-existing record""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?recid=123456789&of=id', expected_text="[]")) def test_search_engine_web_api_for_nonexisting_collection(self): """websearch - search engine Web API for non-existing collection""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?c=Foo&of=id', expected_text="[]")) def test_search_engine_web_api_for_range_of_records(self): """websearch - search engine Web API for range of records""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?recid=1&recidb=10&of=id', expected_text="[1, 2, 3, 4, 5, 6, 7, 8, 9]")) class WebSearchRestrictedCollectionTest(unittest.TestCase): """Test of the restricted collections behaviour.""" def test_restricted_collection_interface_page(self): """websearch - restricted collection interface page body""" # there should be no Latest additions box for restricted collections self.assertNotEqual([], test_web_page_content(CFG_SITE_URL + '/collection/Theses', expected_text="Latest additions")) def test_restricted_search_as_anonymous_guest(self): """websearch - restricted collection not searchable by anonymous guest""" browser = Browser() browser.open(CFG_SITE_URL + '/search?c=Theses') response = browser.response().read() if response.find("If you think you have right to access it, please authenticate yourself.") > -1: pass else: self.fail("Oops, searching restricted collection without password should have redirected to login dialog.") return def test_restricted_search_as_authorized_person(self): """websearch - restricted collection searchable by authorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/search?c=Theses') 
browser.select_form(nr=0) browser['p_un'] = 'jekyll' browser['p_pw'] = 'j123ekyll' browser.submit() if browser.response().read().find("records found") > -1: pass else: self.fail("Oops, Dr. Jekyll should be able to search Theses collection.") def test_restricted_search_as_unauthorized_person(self): """websearch - restricted collection not searchable by unauthorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/search?c=Theses') browser.select_form(nr=0) browser['p_un'] = 'hyde' browser['p_pw'] = 'h123yde' browser.submit() # Mr. Hyde should not be able to connect: if browser.response().read().find("Authorization failure") <= -1: # if we got here, things are broken: self.fail("Oops, Mr.Hyde should not be able to search Theses collection.") def test_restricted_detailed_record_page_as_anonymous_guest(self): """websearch - restricted detailed record page not accessible to guests""" browser = Browser() browser.open(CFG_SITE_URL + '/%s/35' % CFG_SITE_RECORD) if browser.response().read().find("You can use your nickname or your email address to login.") > -1: pass else: self.fail("Oops, searching restricted collection without password should have redirected to login dialog.") return def test_restricted_detailed_record_page_as_authorized_person(self): """websearch - restricted detailed record page accessible to authorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/youraccount/login') browser.select_form(nr=0) browser['p_un'] = 'jekyll' browser['p_pw'] = 'j123ekyll' browser.submit() browser.open(CFG_SITE_URL + '/%s/35' % CFG_SITE_RECORD) # Dr. Jekyll should be able to connect # (add the pw to the whole CFG_SITE_URL because we shall be # redirected to '/reordrestricted/'): if browser.response().read().find("A High-performance Video Browsing System") > -1: pass else: self.fail("Oops, Dr. Jekyll should be able to access restricted detailed record page.") def test_restricted_detailed_record_page_as_unauthorized_person(self): """websearch - restricted detailed record page not accessible to unauthorized person""" browser = Browser() browser.open(CFG_SITE_URL + '/youraccount/login') browser.select_form(nr=0) browser['p_un'] = 'hyde' browser['p_pw'] = 'h123yde' browser.submit() browser.open(CFG_SITE_URL + '/%s/35' % CFG_SITE_RECORD) # Mr. 
Hyde should not be able to connect: if browser.response().read().find('You are not authorized') <= -1: # if we got here, things are broken: self.fail("Oops, Mr.Hyde should not be able to access restricted detailed record page.") def test_collection_restricted_p(self): """websearch - collection_restricted_p""" self.failUnless(collection_restricted_p('Theses'), True) self.failIf(collection_restricted_p('Books & Reports')) def test_get_permitted_restricted_collections(self): """websearch - get_permitted_restricted_collections""" from invenio.webuser import get_uid_from_email, collect_user_info self.assertEqual(get_permitted_restricted_collections(collect_user_info(get_uid_from_email('jekyll@cds.cern.ch'))), ['Theses', 'Drafts']) self.assertEqual(get_permitted_restricted_collections(collect_user_info(get_uid_from_email('hyde@cds.cern.ch'))), []) self.assertEqual(get_permitted_restricted_collections(collect_user_info(get_uid_from_email('balthasar.montague@cds.cern.ch'))), ['ALEPH Theses', 'ALEPH Internal Notes', 'Atlantis Times Drafts']) self.assertEqual(get_permitted_restricted_collections(collect_user_info(get_uid_from_email('dorian.gray@cds.cern.ch'))), ['ISOLDE Internal Notes']) def test_restricted_record_has_restriction_flag(self): """websearch - restricted record displays a restriction flag""" browser = Browser() browser.open(CFG_SITE_URL + '/%s/42/files/' % CFG_SITE_RECORD) browser.select_form(nr=0) browser['p_un'] = 'jekyll' browser['p_pw'] = 'j123ekyll' browser.submit() if browser.response().read().find("Restricted") > -1: pass else: self.fail("Oops, a 'Restricted' flag should appear on restricted records.") browser.open(CFG_SITE_URL + '/%s/42/files/comments' % CFG_SITE_RECORD) if browser.response().read().find("Restricted") > -1: pass else: self.fail("Oops, a 'Restricted' flag should appear on restricted records.") # Flag also appear on records that exist both in a public and # restricted collection: error_messages = test_web_page_content(CFG_SITE_URL + '/%s/109' % CFG_SITE_RECORD, username='admin', password='', expected_text=['Restricted']) if error_messages: self.fail("Oops, a 'Restricted' flag should appear on restricted records.") class WebSearchRestrictedCollectionHandlingTest(unittest.TestCase): """ Check how the restricted or restricted and "hidden" collection handling works: (i)user has or not rights to access to specific records or collections, (ii)public and restricted results are displayed in the right position in the collection tree, (iii)display the right warning depending on the case. 
Changes in the collection tree used for testing (the records used for testing are shown as well): Articles & Preprints Books & Reports _____________|________________ ____________|_____________ | | | | | | | Articles Drafts(r) Notes Preprints Books Theses(r) Reports 69 77 109 10 105 77 98 98 108 105 CERN Experiments _________________________|___________________________ | | ALEPH ISOLDE _________________|_________________ ____________|_____________ | | | | | ALEPH ALEPH ALEPH ISOLDE ISOLDE Papers Internal Notes(r) Theses(r) Papers Internal Notes(r&h) 10 109 105 69 110 108 106 Authorized users: jekyll -> Drafts, Theses balthasar -> ALEPH Internal Notes, ALEPH Theses dorian -> ISOLDE Internal Notes """ def test_show_public_colls_in_warning_as_unauthorized_user(self): """websearch - show public daughter collections in warning to unauthorized user""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Articles+%26+Preprints&sc=1&p=recid:20', username='hyde', password='h123yde', expected_text=['No match found in collection Articles, Preprints, Notes.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_show_public_and_restricted_colls_in_warning_as_authorized_user(self): """websearch - show public and restricted daughter collections in warning to authorized user""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Articles+%26+Preprints&sc=1&p=recid:20', username='jekyll', password='j123ekyll', expected_text=['No match found in collection Articles, Preprints, Notes, Drafts.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_record_in_different_colls_as_unauthorized_user(self): """websearch - record belongs to different restricted collections with different rights, user does not have rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?p=105&f=recid', username='hyde', password='h123yde', expected_text=['No public collection matched your query.'], unexpected_text=['records found']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_record_in_different_colls_as_authorized_user_of_one_coll(self): """websearch - record belongs to different restricted collections with different rights, balthasar has rights to one of them""" from invenio.config import CFG_WEBSEARCH_VIEWRESTRCOLL_POLICY policy = CFG_WEBSEARCH_VIEWRESTRCOLL_POLICY.strip().upper() if policy == 'ANY': error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=recid:105&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='balthasar', password='b123althasar', expected_text=['[CERN-THESIS-99-074]'], unexpected_text=['No public collection matched your query.']) else: error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=recid:105&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='balthasar', password='b123althasar', expected_text=['No public collection matched your query.'], unexpected_text=['[CERN-THESIS-99-074]']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_record_in_different_colls_as_authorized_user_of_two_colls(self): """websearch - record belongs to different restricted collections with different rights, jekyll has rights to two of them""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=recid:105&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='jekyll', password='j123ekyll', expected_text=['Articles &
Preprints', 'Books & Reports']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_record_in_different_colls_as_authorized_user_of_all_colls(self): """websearch - record belongs to different restricted collections with different rights, admin has rights to all of them""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=recid:105&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='admin', expected_text=['Articles & Preprints', 'Books & Reports', 'ALEPH Theses']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_restricted_record_from_not_dad_coll(self): """websearch - record belongs to different restricted collections with different rights, search from a not dad collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Multimedia+%26+Arts&sc=1&p=recid%3A105&f=&action_search=Search&c=Pictures&c=Poetry&c=Atlantis+Times', username='admin', expected_text='No match found in collection', expected_link_label='1 hits') if error_messages: self.fail(merge_error_messages(error_messages)) def test_public_and_restricted_record_as_unauthorized_user(self): """websearch - record belongs to different public and restricted collections, user not has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=geometry&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts&of=id', username='guest', expected_text='[80, 86]', unexpected_text='[40, 80, 86]') if error_messages: self.fail(merge_error_messages(error_messages)) def test_public_and_restricted_record_as_authorized_user(self): """websearch - record belongs to different public and restricted collections, admin has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=geometry&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts&of=id', username='admin', password='', expected_text='[40, 80, 86]') if error_messages: self.fail(merge_error_messages(error_messages)) def test_public_and_restricted_record_of_focus_as_unauthorized_user(self): """websearch - record belongs to both a public and a restricted collection of "focus on", user not has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Articles+%26+Preprints&sc=1&p=109&f=recid', username='hyde', password='h123yde', expected_text=['No public collection matched your query'], unexpected_text=['LEP Center-of-Mass Energies in Presence of Opposite']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_public_and_restricted_record_of_focus_as_authorized_user(self): """websearch - record belongs to both a public and a restricted collection of "focus on", user has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=109&f=recid&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='balthasar', password='b123althasar', expected_text=['Articles & Preprints', 'ALEPH Internal Notes', 'LEP Center-of-Mass Energies in Presence of Opposite']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_public_and_restricted_record_from_not_dad_coll_as_authorized_user(self): """websearch - record belongs to both a public and a restricted collection, search from a not dad collection, admin has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Books+%26+Reports&sc=1&p=recid%3A98&f=&action_search=Search&c=Books&c=Reports', username='admin', 
password='', expected_text='No match found in collection Books, Theses, Reports', expected_link_label='1 hits') if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_public_and_restricted_record_from_not_dad_coll_as_unauthorized_user(self): """websearch - record belongs to both a public and a restricted collection, search from a not dad collection, hyde not has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Books+%26+Reports&sc=1&p=recid%3A98&f=&action_search=Search&c=Books&c=Reports', username='hyde', password='h123yde', expected_text='No public collection matched your query', unexpected_text='No match found in collection') if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_record_of_focus_as_authorized_user(self): """websearch - record belongs to a restricted collection of "focus on", balthasar has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?&sc=1&p=106&f=recid&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts&of=id', username='balthasar', password='b123althasar', expected_text='[106]', unexpected_text='[]') if error_messages: self.fail(merge_error_messages(error_messages)) def test_display_dad_coll_of_restricted_coll_as_unauthorized_user(self): """websearch - unauthorized user displays a collection that contains a restricted collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Articles+%26+Preprints&sc=1&p=&f=&action_search=Search&c=Articles&c=Drafts&c=Preprints', username='guest', expected_text=['This collection is restricted.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_display_dad_coll_of_restricted_coll_as_authorized_user(self): """websearch - authorized user displays a collection that contains a restricted collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Articles+%26+Preprints&sc=1&p=&f=&action_search=Search&c=Articles&c=Drafts&c=Notes&c=Preprints', username='jekyll', password='j123ekyll', expected_text=['Articles', 'Drafts', 'Notes', 'Preprints'], unexpected_text=['This collection is restricted.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_restricted_record_from_coll_of_focus_as_unauthorized_user(self): """websearch - search for a record that belongs to a restricted collection from a collection of "focus on" , jekyll not has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=CERN+Divisions&sc=1&p=recid%3A106&f=&action_search=Search&c=Experimental+Physics+(EP)&c=Theoretical+Physics+(TH)', username='jekyll', password='j123ekyll', expected_text=['No public collection matched your query.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_restricted_record_from_coll_of_focus_as_authorized_user(self): """websearch - search for a record that belongs to a restricted collection from a collection of "focus on" , admin has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=CERN+Divisions&sc=1&p=recid%3A106&f=&action_search=Search&c=Experimental+Physics+(EP)&c=Theoretical+Physics+(TH)', username='admin', password='', expected_text='No match found in collection Experimental Physics (EP), Theoretical Physics (TH).', expected_link_label='1 hits') if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_restricted_record_from_not_direct_dad_coll_and_display_in_right_position_in_tree(self): 
"""websearch - search for a restricted record from not direct dad collection and display it on its right position in the tree""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&sc=1&p=recid%3A40&f=&action_search=Search&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='admin', password='', expected_text=['Books & Reports','[LBL-22304]']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_restricted_record_from_direct_dad_coll_and_display_in_right_position_in_tree(self): """websearch - search for a restricted record from the direct dad collection and display it on its right position in the tree""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Books+%26+Reports&sc=1&p=recid%3A40&f=&action_search=Search&c=Books&c=Reports', username='admin', password='', expected_text=['Theses', '[LBL-22304]']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_and_hidden_record_as_unauthorized_user(self): """websearch - search for a "hidden" record, user not has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&sc=1&p=recid%3A110&f=&action_search=Search&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='guest', expected_text=['If you were looking for a non-public document'], unexpected_text=['If you were looking for a hidden document']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_and_hidden_record_as_authorized_user(self): """websearch - search for a "hidden" record, admin has rights""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&sc=1&p=recid%3A110&f=&action_search=Search&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='admin', password='', expected_text=['If you were looking for a hidden document, please type the correct URL for this record.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_enter_url_of_restricted_and_hidden_coll_as_unauthorized_user(self): """websearch - unauthorized user types the concret URL of a "hidden" collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=ISOLDE+Internal+Notes&sc=1&p=&f=&action_search=Search', username='guest', expected_text=['This collection is restricted.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_enter_url_of_restricted_and_hidden_coll_as_authorized_user(self): """websearch - authorized user types the concret URL of a "hidden" collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=ISOLDE+Internal+Notes&sc=1&p=&f=&action_search=Search', username='dorian', password='d123orian', expected_text=['ISOLDE Internal Notes', '[CERN-PS-PA-Note-93-04]'], unexpected_text=['This collection is restricted.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_for_pattern_from_the_top_as_unauthorized_user(self): """websearch - unauthorized user searches for a pattern from the top""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&sc=1&p=of&f=&action_search=Search&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='guest', expected_text=['Articles & Preprints', '61', 'records found', 'Books & Reports', '2', 'records found', 'Multimedia & Arts', '14', 'records found']) if error_messages: self.fail(merge_error_messages(error_messages)) def 
test_search_for_pattern_from_the_top_as_authorized_user(self): """websearch - authorized user searches for a pattern from the top""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&sc=1&p=of&f=&action_search=Search&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Multimedia+%26+Arts', username='admin', password='', expected_text=['Articles & Preprints', '61', 'records found', 'Books & Reports', '6', 'records found', 'Multimedia & Arts', '14', 'records found', 'ALEPH Theses', '1', 'records found', 'ALEPH Internal Notes', '1', 'records found']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_for_pattern_from_an_specific_coll_as_unauthorized_user(self): """websearch - unauthorized user searches for a pattern from one specific collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Books+%26+Reports&sc=1&p=of&f=&action_search=Search&c=Books&c=Reports', username='guest', expected_text=['Books', '1', 'records found', 'Reports', '1', 'records found']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_search_for_pattern_from_an_specific_coll_as_authorized_user(self): """websearch - authorized user searches for a pattern from one specific collection""" error_messages = test_web_page_content(CFG_SITE_URL + '/search?ln=en&cc=Books+%26+Reports&sc=1&p=of&f=&action_search=Search&c=Books&c=Reports', username='admin', password='', expected_text=['Books', '1', 'records found', 'Reports', '1', 'records found', 'Theses', '4', 'records found']) if error_messages: self.fail(merge_error_messages(error_messages)) class WebSearchRestrictedPicturesTest(unittest.TestCase): """ Check whether restricted pictures on the demo site can be accessed well by people who have rights to access them. """ def test_restricted_pictures_guest(self): """websearch - restricted pictures not available to guest""" error_messages = test_web_page_content(CFG_SITE_URL + '/%s/1/files/0106015_01.jpg' % CFG_SITE_RECORD, expected_text=['This file is restricted. If you think you have right to access it, please authenticate yourself.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_pictures_romeo(self): """websearch - restricted pictures available to Romeo""" error_messages = test_web_page_content(CFG_SITE_URL + '/%s/1/files/0106015_01.jpg' % CFG_SITE_RECORD, username='romeo', password='r123omeo', expected_text=[], unexpected_text=['This file is restricted', 'You are not authorized']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_pictures_hyde(self): """websearch - restricted pictures not available to Mr. Hyde""" error_messages = test_web_page_content(CFG_SITE_URL + '/%s/1/files/0106015_01.jpg' % CFG_SITE_RECORD, username='hyde', password='h123yde', expected_text=['This file is restricted', 'You are not authorized']) if error_messages: self.failUnless("HTTP Error 401: Unauthorized" in merge_error_messages(error_messages)) class WebSearchRestrictedWebJournalFilesTest(unittest.TestCase): """ Check whether files attached to a WebJournal article are well accessible when the article is published """ def test_restricted_files_guest(self): """websearch - files of unreleased articles are not available to guest""" # Record is not public... self.assertEqual(record_public_p(112), False) # ... 
and guest cannot access attached files error_messages = test_web_page_content(CFG_SITE_URL + '/%s/112/files/journal_galapagos_archipelago.jpg' % CFG_SITE_RECORD, expected_text=['This file is restricted. If you think you have right to access it, please authenticate yourself.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_files_editor(self): """websearch - files of unreleased articles are available to editor""" # Record is not public... self.assertEqual(record_public_p(112), False) # ... but editor can access attached files error_messages = test_web_page_content(CFG_SITE_URL + '/%s/112/files/journal_galapagos_archipelago.jpg' % CFG_SITE_RECORD, username='balthasar', password='b123althasar', expected_text=[], unexpected_text=['This file is restricted', 'You are not authorized']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_public_files_guest(self): """websearch - files of released articles are available to guest""" # Record is not public... self.assertEqual(record_public_p(111), False) # ... but user can access attached files, as article is released error_messages = test_web_page_content(CFG_SITE_URL + '/%s/111/files/journal_scissor_beak.jpg' % CFG_SITE_RECORD, expected_text=[], unexpected_text=['This file is restricted', 'You are not authorized']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_really_restricted_files_guest(self): """websearch - restricted files of released articles are not available to guest""" # Record is not public... self.assertEqual(record_public_p(111), False) # ... and user cannot access restricted attachments, even if # article is released error_messages = test_web_page_content(CFG_SITE_URL + '/%s/111/files/restricted-journal_scissor_beak.jpg' % CFG_SITE_RECORD, expected_text=['This file is restricted.
If you think you have right to access it, please authenticate yourself.']) if error_messages: self.fail(merge_error_messages(error_messages)) def test_restricted_picture_has_restriction_flag(self): """websearch - restricted files displays a restriction flag""" error_messages = test_web_page_content(CFG_SITE_URL + '/%s/1/files/' % CFG_SITE_RECORD, expected_text="Restricted") if error_messages: self.fail(merge_error_messages(error_messages)) class WebSearchRSSFeedServiceTest(unittest.TestCase): """Test of the RSS feed service.""" def test_rss_feed_service(self): """websearch - RSS feed service""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/rss', expected_text=' -1: self.fail("Oops, when split by collection is off, " "results overview should not be present.") if body.find('') == -1: self.fail("Oops, when split by collection is off, " "Atlantis collection should be found.") if body.find('') > -1: self.fail("Oops, when split by collection is off, " "Multimedia & Arts should not be found.") try: browser.find_link(url='#15') self.fail("Oops, when split by collection is off, " "a link to Multimedia & Arts should not be found.") except LinkNotFoundError: pass def test_results_overview_split_on(self): """websearch - results overview box when split by collection is on""" browser = Browser() browser.open(CFG_SITE_URL + '/search?p=of&sc=1') body = browser.response().read() if body.find("Results overview") == -1: self.fail("Oops, when split by collection is on, " "results overview should be present.") if body.find('') > -1: self.fail("Oops, when split by collection is on, " "Atlantis collection should not be found.") if body.find('') == -1: self.fail("Oops, when split by collection is on, " "Multimedia & Arts should be found.") try: browser.find_link(url='#15') except LinkNotFoundError: self.fail("Oops, when split by collection is on, " "a link to Multimedia & Arts should be found.") class WebSearchSortResultsTest(unittest.TestCase): """Test of the search results page's sorting capability.""" def test_sort_results_default(self): """websearch - search results sorting, default method""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1', expected_text="CMS animation of the high-energy collisions")) def test_sort_results_ascending(self): """websearch - search results sorting, ascending field""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=2&sf=reportnumber&so=a', expected_text="[astro-ph/0104076]")) def test_sort_results_descending(self): """websearch - search results sorting, descending field""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1&sf=reportnumber&so=d', expected_text=" [TESLA-FEL-99-07]")) def test_sort_results_sort_pattern(self): """websearch - search results sorting, preferential sort pattern""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=of&f=title&rg=1&sf=reportnumber&so=d&sp=cern', expected_text="[CERN-TH-2002-069]")) class WebSearchSearchResultsXML(unittest.TestCase): """Test search results in various output""" def test_search_results_xm_output_split_on(self): """ websearch - check document element of search results in xm output (split by collection on)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=1&of=xm') body = browser.response().read() num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document 
elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") def test_search_results_xm_output_split_off(self): """ websearch - check document element of search results in xm output (split by collection off)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=0&of=xm') body = browser.response().read() num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") def test_search_results_xd_output_split_on(self): """ websearch - check document element of search results in xd output (split by collection on)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=1&of=xd') body = browser.response().read() num_doc_element = body.count("" "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") def test_search_results_xd_output_split_off(self): """ websearch - check document element of search results in xd output (split by collection off)""" browser = Browser() browser.open(CFG_SITE_URL + '/search?sc=0&of=xd') body = browser.response().read() num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") num_doc_element = body.count("") if num_doc_element == 0: self.fail("Oops, no document element " "found in search results.") elif num_doc_element > 1: self.fail("Oops, multiple document elements " "found in search results.") class WebSearchUnicodeQueryTest(unittest.TestCase): """Test of the search results for queries containing Unicode characters.""" def test_unicode_word_query(self): """websearch - Unicode word query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%CE%99%CE%B8%CE%AC%CE%BA%CE%B7', expected_text="[76]")) def test_unicode_word_query_not_found_term(self): """websearch - Unicode word query, not found term""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3A%CE%99%CE%B8', expected_text="ιθάκη")) def test_unicode_exact_phrase_query(self): """websearch - Unicode exact phrase query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%22%CE%99%CE%B8%CE%AC%CE%BA%CE%B7%22', expected_text="[76]")) def test_unicode_partial_phrase_query(self): """websearch - Unicode partial phrase query""" # no hit here for example title partial phrase query due to # removed difference between double-quoted and single-quoted # search: self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%27%CE%B7%27', expected_text="[]")) def test_unicode_regexp_query(self): """websearch - Unicode regexp query""" self.assertEqual([], 
test_web_page_content(CFG_SITE_URL + '/search?of=id&p=title%3A%2F%CE%B7%2F', expected_text="[76]")) class WebSearchMARCQueryTest(unittest.TestCase): """Test of the search results for queries containing physical MARC tags.""" def test_single_marc_tag_exact_phrase_query(self): """websearch - single MARC tag, exact phrase query (100__a)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=100__a%3A%22Ellis%2C+J%22', expected_text="[9, 14, 18]")) def test_single_marc_tag_partial_phrase_query(self): """websearch - single MARC tag, partial phrase query (245__b)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=245__b%3A%27and%27', expected_text="[28]")) def test_many_marc_tags_partial_phrase_query(self): """websearch - many MARC tags, partial phrase query (245)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=245%3A%27and%27&rg=100', expected_text="[1, 8, 9, 14, 15, 20, 22, 24, 28, 33, 47, 48, 49, 51, 53, 64, 69, 71, 79, 82, 83, 85, 91, 96, 108]")) def test_single_marc_tag_regexp_query(self): """websearch - single MARC tag, regexp query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=245%3A%2Fand%2F&rg=100', expected_text="[1, 8, 9, 14, 15, 20, 22, 24, 28, 33, 47, 48, 49, 51, 53, 64, 69, 71, 79, 82, 83, 85, 91, 96, 108]")) class WebSearchExtSysnoQueryTest(unittest.TestCase): """Test of queries using external system numbers.""" def test_existing_sysno_html_output(self): """websearch - external sysno query, existing sysno, HTML output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CER', expected_text="The wall of the cave")) def test_existing_sysno_id_output(self): """websearch - external sysno query, existing sysno, ID output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CER&of=id', expected_text="[95]")) def test_nonexisting_sysno_html_output(self): """websearch - external sysno query, non-existing sysno, HTML output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CERRRR', expected_text="Requested record does not seem to exist.")) def test_nonexisting_sysno_id_output(self): """websearch - external sysno query, non-existing sysno, ID output""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?sysno=000289446CERRRR&of=id', expected_text="[]")) class WebSearchResultsRecordGroupingTest(unittest.TestCase): """Test search results page record grouping (rg).""" def test_search_results_rg_guest(self): """websearch - search results, records in groups of, guest""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?rg=17', expected_text="1 - 17")) def test_search_results_rg_nonguest(self): """websearch - search results, records in groups of, non-guest""" # This test used to fail due to saved user preference fetching # not overridden by URL rg argument. 
self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?rg=17', username='admin', expected_text="1 - 17")) class WebSearchSpecialTermsQueryTest(unittest.TestCase): """Test of the search results for queries containing special terms.""" def test_special_terms_u1(self): """websearch - query for special terms, U(1)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=U%281%29', expected_text="[57, 79, 80, 88]")) def test_special_terms_u1_and_sl(self): """websearch - query for special terms, U(1) SL(2,Z)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=U%281%29+SL%282%2CZ%29', expected_text="[88]")) def test_special_terms_u1_and_sl_or(self): """websearch - query for special terms, U(1) OR SL(2,Z)""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=U%281%29+OR+SL%282%2CZ%29', expected_text="[57, 79, 80, 88]")) @nottest def FIXME_TICKET_453_test_special_terms_u1_and_sl_or_parens(self): """websearch - query for special terms, (U(1) OR SL(2,Z))""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=%28U%281%29+OR+SL%282%2CZ%29%29', expected_text="[57, 79, 80, 88]")) def test_special_terms_u1_and_sl_in_quotes(self): """websearch - query for special terms, ('SL(2,Z)' OR 'U(1)')""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + "/search?of=id&p=%28%27SL%282%2CZ%29%27+OR+%27U%281%29%27%29", expected_text="[57, 79, 80, 88, 96]")) class WebSearchJournalQueryTest(unittest.TestCase): """Test of the search results for journal pubinfo queries.""" def test_query_journal_title_only(self): """websearch - journal publication info query, title only""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&f=journal&p=Phys.+Lett.+B', expected_text="[78, 85, 87]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&f=journal&p=Phys.+Lett.+B', username='admin', expected_text="[77, 78, 85, 87]")) def test_query_journal_full_pubinfo(self): """websearch - journal publication info query, full reference""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&f=journal&p=Phys.+Lett.+B+531+%282002%29+301', expected_text="[78]")) class WebSearchStemmedIndexQueryTest(unittest.TestCase): """Test of the search results for queries using stemmed indexes.""" def test_query_stemmed_lowercase(self): """websearch - stemmed index query, lowercase""" # note that dasse/Dasse is stemmed into dass/Dass, as expected self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=dasse', expected_text="[25, 26]")) def test_query_stemmed_uppercase(self): """websearch - stemmed index query, uppercase""" # ... but note also that DASSE is stemmed into DASSE(!); so # the test would fail if the search engine would not lower the # query term. (Something that is not necessary for # non-stemmed indexes.) 
self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?of=id&p=DASSE', expected_text="[25, 26]")) class WebSearchSummarizerTest(unittest.TestCase): """Test of the search results summarizer functions.""" def test_most_popular_field_values_singletag(self): """websearch - most popular field values, simple tag""" from invenio.search_engine import get_most_popular_field_values self.assertEqual([('PREPRINT', 37), ('ARTICLE', 28), ('BOOK', 14), ('THESIS', 8), ('PICTURE', 7), ('DRAFT', 2), ('POETRY', 2), ('REPORT', 2), ('ALEPHPAPER', 1), ('ATLANTISTIMESNEWS', 1), ('ISOLDEPAPER', 1)], get_most_popular_field_values(range(0,100), '980__a')) def test_most_popular_field_values_singletag_multiexclusion(self): """websearch - most popular field values, simple tag, multiple exclusions""" from invenio.search_engine import get_most_popular_field_values self.assertEqual([('PREPRINT', 37), ('ARTICLE', 28), ('BOOK', 14), ('DRAFT', 2), ('REPORT', 2), ('ALEPHPAPER', 1), ('ATLANTISTIMESNEWS', 1), ('ISOLDEPAPER', 1)], get_most_popular_field_values(range(0,100), '980__a', ('THESIS', 'PICTURE', 'POETRY'))) def test_most_popular_field_values_multitag(self): """websearch - most popular field values, multiple tags""" from invenio.search_engine import get_most_popular_field_values self.assertEqual([('Ellis, J', 3), ('Enqvist, K', 1), ('Ibanez, L E', 1), ('Nanopoulos, D V', 1), ('Ross, G G', 1)], get_most_popular_field_values((9, 14, 18), ('100__a', '700__a'))) def test_most_popular_field_values_multitag_singleexclusion(self): """websearch - most popular field values, multiple tags, single exclusion""" from invenio.search_engine import get_most_popular_field_values self.assertEqual([('Enqvist, K', 1), ('Ibanez, L E', 1), ('Nanopoulos, D V', 1), ('Ross, G G', 1)], get_most_popular_field_values((9, 14, 18), ('100__a', '700__a'), ('Ellis, J'))) def test_most_popular_field_values_multitag_countrepetitive(self): """websearch - most popular field values, multiple tags, counting repetitive occurrences""" from invenio.search_engine import get_most_popular_field_values self.assertEqual([('THESIS', 2), ('REPORT', 1)], get_most_popular_field_values((41,), ('690C_a', '980__a'), count_repetitive_values=True)) self.assertEqual([('REPORT', 1), ('THESIS', 1)], get_most_popular_field_values((41,), ('690C_a', '980__a'), count_repetitive_values=False)) def test_ellis_citation_summary(self): """websearch - query ellis, citation summary output format""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis&of=hcs', expected_text="Less known papers (1-9)", expected_link_target=CFG_SITE_URL+"/search?p=ellis%20AND%20cited%3A1-%3E9", expected_link_label='1')) def test_ellis_not_quark_citation_summary_advanced(self): """websearch - ellis and not quark, citation summary format advanced""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&as=1&m1=a&p1=ellis&f1=author&op1=n&m2=a&p2=quark&f2=&op2=a&m3=a&p3=&f3=&action_search=Search&sf=&so=a&rm=&rg=10&sc=1&of=hcs', expected_text="Less known papers (1-9)", expected_link_target=CFG_SITE_URL+'/search?p=author%3Aellis%20and%20not%20quark%20AND%20cited%3A1-%3E9', expected_link_label='1')) def test_ellis_not_quark_citation_summary_regular(self): """websearch - ellis and not quark, citation summary format advanced""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?ln=en&p=author%3Aellis+and+not+quark&f=&action_search=Search&sf=&so=d&rm=&rg=10&sc=0&of=hcs', expected_text="Less known papers (1-9)", 
expected_link_target=CFG_SITE_URL+'/search?p=author%3Aellis%20and%20not%20quark%20AND%20cited%3A1-%3E9', expected_link_label='1')) class WebSearchRecordCollectionGuessTest(unittest.TestCase): """Primary collection guessing tests.""" def test_guess_primary_collection_of_a_record(self): """websearch - guess_primary_collection_of_a_record""" self.assertEqual(guess_primary_collection_of_a_record(96), 'Articles') def test_guess_collection_of_a_record(self): """websearch - guess_collection_of_a_record""" self.assertEqual(guess_collection_of_a_record(96), 'Articles') self.assertEqual(guess_collection_of_a_record(96, '%s/collection/Theoretical Physics (TH)?ln=en' % CFG_SITE_URL), 'Articles') self.assertEqual(guess_collection_of_a_record(12, '%s/collection/Theoretical Physics (TH)?ln=en' % CFG_SITE_URL), 'Theoretical Physics (TH)') self.assertEqual(guess_collection_of_a_record(12, '%s/collection/Theoretical%%20Physics%%20%%28TH%%29?ln=en' % CFG_SITE_URL), 'Theoretical Physics (TH)') class WebSearchGetFieldValuesTest(unittest.TestCase): """Testing get_fieldvalues() function.""" def test_get_fieldvalues_001(self): """websearch - get_fieldvalues() for bibxxx-agnostic tags""" self.assertEqual(get_fieldvalues(10, '001___'), ['10']) def test_get_fieldvalues_980(self): """websearch - get_fieldvalues() for bibxxx-powered tags""" self.assertEqual(get_fieldvalues(18, '700__a'), ['Enqvist, K', 'Nanopoulos, D V']) self.assertEqual(get_fieldvalues(18, '909C1u'), ['CERN']) def test_get_fieldvalues_wildcard(self): """websearch - get_fieldvalues() for tag wildcards""" self.assertEqual(get_fieldvalues(18, '%'), []) self.assertEqual(get_fieldvalues(18, '7%'), []) self.assertEqual(get_fieldvalues(18, '700%'), ['Enqvist, K', 'Nanopoulos, D V']) self.assertEqual(get_fieldvalues(18, '909C0%'), ['1985', '13','TH']) def test_get_fieldvalues_recIDs(self): """websearch - get_fieldvalues() for list of recIDs""" self.assertEqual(get_fieldvalues([], '001___'), []) self.assertEqual(get_fieldvalues([], '700__a'), []) self.assertEqual(get_fieldvalues([10, 13], '001___'), ['10', '13']) self.assertEqual(get_fieldvalues([18, 13], '700__a'), ['Dawson, S', 'Ellis, R K', 'Enqvist, K', 'Nanopoulos, D V']) def test_get_fieldvalues_repetitive(self): """websearch - get_fieldvalues() for repetitive values""" self.assertEqual(get_fieldvalues([17, 18], '909C1u'), ['CERN', 'CERN']) self.assertEqual(get_fieldvalues([17, 18], '909C1u', repetitive_values=True), ['CERN', 'CERN']) self.assertEqual(get_fieldvalues([17, 18], '909C1u', repetitive_values=False), ['CERN']) class WebSearchAddToBasketTest(unittest.TestCase): """Test of the add-to-basket presence depending on user rights.""" def test_add_to_basket_guest(self): """websearch - add-to-basket facility allowed for guests""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='Add to basket')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='')) def test_add_to_basket_jekyll(self): """websearch - add-to-basket facility allowed for Dr. Jekyll""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='Add to basket', username='jekyll', password='j123ekyll')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', expected_text='', username='jekyll', password='j123ekyll')) def test_add_to_basket_hyde(self): """websearch - add-to-basket facility denied to Mr. 
Hyde""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', unexpected_text='Add to basket', username='hyde', password='h123yde')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=recid%3A10', unexpected_text='', username='hyde', password='h123yde')) class WebSearchAlertTeaserTest(unittest.TestCase): """Test of the alert teaser presence depending on user rights.""" def test_alert_teaser_guest(self): """websearch - alert teaser allowed for guests""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_link_label='email alert')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='RSS feed')) def test_alert_teaser_jekyll(self): """websearch - alert teaser allowed for Dr. Jekyll""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='email alert', username='jekyll', password='j123ekyll')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='RSS feed', username='jekyll', password='j123ekyll')) def test_alert_teaser_hyde(self): """websearch - alert teaser allowed for Mr. Hyde""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='email alert', username='hyde', password='h123yde')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=ellis', expected_text='RSS feed', username='hyde', password='h123yde')) class WebSearchSpanQueryTest(unittest.TestCase): """Test of span queries.""" def test_span_in_word_index(self): """websearch - span query in a word index""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=year%3A1992-%3E1996&of=id&ap=0', expected_text='[17, 66, 69, 71]')) def test_span_in_phrase_index(self): """websearch - span query in a phrase index""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=year%3A%221992%22-%3E%221996%22&of=id&ap=0', expected_text='[17, 66, 69, 71]')) def test_span_in_bibxxx(self): """websearch - span query in MARC tables""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=909C0y%3A%221992%22-%3E%221996%22&of=id&ap=0', expected_text='[17, 66, 69, 71]')) def test_span_with_spaces(self): """websearch - no span query when a space is around""" # useful for reaction search self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=title%3A%27mu%20--%3E%20e%27&of=id&ap=0', expected_text='[67]')) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=245%3A%27mu%20--%3E%20e%27&of=id&ap=0', expected_text='[67]')) def test_span_in_author(self): """websearch - span query in special author index""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=author%3A%22Ellis,%20K%22-%3E%22Ellis,%20RZ%22&of=id&ap=0', expected_text='[8, 11, 13, 17, 47]')) class WebSearchReferstoCitedbyTest(unittest.TestCase): """Test of refersto/citedby search operators.""" def test_refersto_recid(self): 'websearch - refersto:recid:84' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=refersto%3Arecid%3A84&of=id&ap=0', expected_text='[85, 88, 91]')) def test_refersto_repno(self): 'websearch - refersto:reportnumber:hep-th/0205061' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=refersto%3Areportnumber%3Ahep-th/0205061&of=id&ap=0', expected_text='[91]')) def test_refersto_author_word(self): 'websearch - refersto:author:klebanov' self.assertEqual([], test_web_page_content(CFG_SITE_URL + 
'/search?p=refersto%3Aauthor%3Aklebanov&of=id&ap=0', expected_text='[85, 86, 88, 91]')) def test_refersto_author_phrase(self): 'websearch - refersto:author:"Klebanov, I"' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=refersto%3Aauthor%3A%22Klebanov,%20I%22&of=id&ap=0', expected_text='[85, 86, 88, 91]')) def test_citedby_recid(self): 'websearch - citedby:recid:92' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=citedby%3Arecid%3A92&of=id&ap=0', expected_text='[74, 91]')) def test_citedby_repno(self): 'websearch - citedby:reportnumber:hep-th/0205061' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=citedby%3Areportnumber%3Ahep-th/0205061&of=id&ap=0', expected_text='[78]')) def test_citedby_author_word(self): 'websearch - citedby:author:klebanov' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=citedby%3Aauthor%3Aklebanov&of=id&ap=0', expected_text='[95]')) def test_citedby_author_phrase(self): 'websearch - citedby:author:"Klebanov, I"' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=citedby%3Aauthor%3A%22Klebanov,%20I%22&of=id&ap=0', expected_text='[95]')) def test_refersto_bad_query(self): 'websearch - refersto:title:' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=refersto%3Atitle%3A', expected_text='There are no records referring to title:.')) def test_citedby_bad_query(self): 'websearch - citedby:title:' self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=citedby%3Atitle%3A', expected_text='There are no records cited by title:.')) class WebSearchSPIRESSyntaxTest(unittest.TestCase): """Test of SPIRES syntax issues""" if CFG_WEBSEARCH_SPIRES_SYNTAX > 0: def test_and_not_parens(self): 'websearch - find a ellis, j and not a enqvist' self.assertEqual([], test_web_page_content(CFG_SITE_URL +'/search?p=find+a+ellis%2C+j+and+not+a+enqvist&of=id&ap=0', expected_text='[9, 12, 14, 47]')) + if DATEUTIL_AVAILABLE: def test_dadd_search(self): 'websearch - find da > today - 3650' # XXX: assumes we've reinstalled our site in the last 10 years # should return every document in the system self.assertEqual([], test_web_page_content(CFG_SITE_URL +'/search?ln=en&p=find+da+%3E+today+-+3650&f=&of=id', expected_text='[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 103, 104, 107, 108, 113]')) class WebSearchDateQueryTest(unittest.TestCase): """Test various date queries.""" def setUp(self): """Establish variables we plan to re-use""" self.empty = intbitset() def test_search_unit_hits_for_datecreated_previous_millenia(self): """websearch - search_unit with datecreated returns >0 hits for docs in the last 1000 years""" self.assertNotEqual(self.empty, search_unit('1000-01-01->9999-12-31', 'datecreated')) def test_search_unit_hits_for_datemodified_previous_millenia(self): """websearch - search_unit with datemodified returns >0 hits for docs in the last 1000 years""" self.assertNotEqual(self.empty, search_unit('1000-01-01->9999-12-31', 'datemodified')) def test_search_unit_in_bibrec_for_datecreated_previous_millenia(self): """websearch - search_unit_in_bibrec with creationdate gets >0 hits for past 1000 years""" self.assertNotEqual(self.empty, 
search_unit_in_bibrec("1000-01-01", "9999-12-31", 'creationdate')) def test_search_unit_in_bibrec_for_datecreated_next_millenia(self): """websearch - search_unit_in_bibrec with creationdate gets 0 hits for after year 3000""" self.assertEqual(self.empty, search_unit_in_bibrec("3000-01-01", "9999-12-31", 'creationdate')) class WebSearchSynonymQueryTest(unittest.TestCase): """Test of queries using synonyms.""" def test_journal_phrvd(self): """websearch - search-time synonym search, journal title""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=PHRVD&f=journal&of=id', expected_text="[66, 72]")) def test_journal_phrvd_54_1996_4234(self): """websearch - search-time synonym search, journal article""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=PHRVD%2054%20%281996%29%204234&f=journal&of=id', expected_text="[66]")) def test_journal_beta_decay_title(self): """websearch - index-time synonym search, beta decay in title""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=beta+decay&f=title&of=id', expected_text="[59]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%CE%B2+decay&f=title&of=id', expected_text="[59]")) def test_journal_beta_decay_global(self): """websearch - index-time synonym search, beta decay in any field""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=beta+decay&of=id', expected_text="[52, 59]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%CE%B2+decay&of=id', expected_text="[52, 59]")) def test_journal_beta_title(self): """websearch - index-time synonym search, beta in title""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=beta&f=title&of=id', expected_text="[59]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%CE%B2&f=title&of=id', expected_text="[59]")) def test_journal_beta_global(self): """websearch - index-time synonym search, beta in any field""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=beta&of=id', expected_text="[52, 59]")) self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%CE%B2&of=id', expected_text="[52, 59]")) class WebSearchWashCollectionsTest(unittest.TestCase): """Test if the collection argument is washed correctly""" def test_wash_coll_when_coll_restricted(self): """websearch - washing of restricted daughter collections""" self.assertEqual( sorted(wash_colls(cc='', c=['Books & Reports', 'Theses'])[1]), ['Books & Reports', 'Theses']) self.assertEqual( sorted(wash_colls(cc='', c=['Books & Reports', 'Theses'])[2]), ['Books & Reports', 'Theses']) class WebSearchAuthorCountQueryTest(unittest.TestCase): """Test of queries using authorcount fields.""" def test_journal_authorcount_word(self): """websearch - author count, word query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=4&f=authorcount&of=id', expected_text="[51, 54, 59, 66, 92, 96]")) def test_journal_authorcount_phrase(self): """websearch - author count, phrase query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=%224%22&f=authorcount&of=id', expected_text="[51, 54, 59, 66, 92, 96]")) def test_journal_authorcount_span(self): """websearch - author count, span query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + '/search?p=authorcount%3A9-%3E16&of=id', expected_text="[69, 71]")) def test_journal_authorcount_plus(self): """websearch - author count, plus query""" self.assertEqual([], test_web_page_content(CFG_SITE_URL + 
'/search?p=50%2B&f=authorcount&of=id', expected_text="[10, 17]")) class WebSearchPerformRequestSearchRefactoringTest(unittest.TestCase): """Tests the perform request search API after refactoring.""" def _run_test(self, test_args, expected_results): params = {} params.update(map(lambda y: (y[0], ',' in y[1] and ', ' not in y[1] and y[1].split(',') or y[1]), map(lambda x: x.split('=', 1), test_args.split(';')))) #params.update(map(lambda x: x.split('=', 1), test_args.split(';'))) req = cStringIO.StringIO() params['req'] = req recs = perform_request_search(**params) if isinstance(expected_results, str): req.seek(0) recs = req.read() # this is just used to generate the results from the seearch engine before refactoring #if recs != expected_results: # print test_args # print params # print recs self.assertEqual(recs, expected_results, "Error, we expect: %s, and we received: %s" % (expected_results, recs)) def test_queries(self): """websearch - testing p_r_s standard arguments and their combinations""" self._run_test('p=ellis;f=author;action=Search', [8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 47]) self._run_test('p=ellis;f=author;sf=title;action=Search', [8, 16, 14, 9, 11, 17, 18, 12, 10, 47, 13]) self._run_test('p=ellis;f=author;sf=title;wl=5;action=Search', [8, 16, 14, 9, 11, 17, 18, 12, 10, 47, 13]) self._run_test('p=ellis;f=author;sf=title;wl=5;so=a', [13, 47, 10, 12, 18, 17, 11, 9, 14, 16, 8]) self._run_test('p=ellis;f=author;sf=title;wl=5;so=d', [8, 16, 14, 9, 11, 17, 18, 12, 10, 47, 13]) self._run_test('p=ell*;sf=title;wl=5', [8, 15, 16, 14, 9, 11, 17, 18, 12, 10, 47, 13]) self._run_test('p=ell*;sf=title;wl=1', [10]) self._run_test('p=ell*;sf=title;wl=100', [8, 15, 16, 14, 9, 11, 17, 18, 12, 10, 47, 13]) self._run_test('p=muon OR kaon;f=author;sf=title;wl=5;action=Search', []) self._run_test('p=muon OR kaon;sf=title;wl=5;action=Search', [67, 12]) self._run_test('p=muon OR kaon;sf=title;wl=5;c=Articles,Preprints', [67, 12]) self._run_test('p=muon OR kaon;sf=title;wl=5;c=Articles', [67]) self._run_test('p=muon OR kaon;sf=title;wl=5;c=Preprints', [12]) # FIXME_TICKET_1174 # self._run_test('p=el*;rm=citation', [2, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 23, 30, 32, 34, 47, 48, 51, 52, 54, 56, 58, 59, 92, 97, 100, 103, 18, 74, 91, 94, 81]) if not get_external_word_similarity_ranker(): self._run_test('p=el*;rm=wrd', [2, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 23, 30, 32, 34, 47, 48, 51, 52, 54, 56, 58, 59, 74, 81, 91, 92, 94, 97, 100, 103, 109]) self._run_test('p=el*;sf=title', [100, 32, 8, 15, 16, 81, 97, 34, 23, 58, 2, 14, 9, 11, 30, 109, 52, 48, 94, 17, 56, 18, 91, 59, 12, 92, 74, 54, 103, 10, 51, 47, 13]) self._run_test('p=boson;rm=citation', [1, 47, 50, 107, 108, 77, 95]) if not get_external_word_similarity_ranker(): self._run_test('p=boson;rm=wrd', [108, 77, 47, 50, 95, 1, 107]) self._run_test('p1=ellis;f1=author;m1=a;op1=a;p2=john;f2=author;m2=a', []) self._run_test('p1=ellis;f1=author;m1=o;op1=a;p2=john;f2=author;m2=o', []) self._run_test('p1=ellis;f1=author;m1=e;op1=a;p2=john;f2=author;m2=e', []) self._run_test('p1=ellis;f1=author;m1=a;op1=o;p2=john;f2=author;m2=a', [8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 47]) self._run_test('p1=ellis;f1=author;m1=o;op1=o;p2=john;f2=author;m2=o', [8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 47]) self._run_test('p1=ellis;f1=author;m1=e;op1=o;p2=john;f2=author;m2=e', []) self._run_test('p1=ellis;f1=author;m1=a;op1=n;p2=john;f2=author;m2=a', [8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 47]) self._run_test('p1=ellis;f1=author;m1=o;op1=n;p2=john;f2=author;m2=o', [8, 9, 10, 
11, 12, 13, 14, 16, 17, 18, 47]) self._run_test('p1=ellis;f1=author;m1=e;op1=n;p2=john;f2=author;m2=e', []) self._run_test('p=Ellis, J;ap=1', [9, 10, 11, 12, 14, 17, 18, 47]) self._run_test('p=Ellis, J;ap=0', [9, 10, 11, 12, 14, 17, 18, 47]) self._run_test('p=recid:148x', []) self._run_test('p=recid:148x;of=xm;rg=200', "\n\n") class WebSearchGetRecordTests(unittest.TestCase): def setUp(self): self.recid = run_sql("INSERT INTO bibrec(creation_date, modification_date) VALUES(NOW(), NOW())") def tearDown(self): run_sql("DELETE FROM bibrec WHERE id=%s", (self.recid,)) def test_get_record(self): """bibformat - test print_record and get_record of empty record""" from invenio.search_engine import print_record, get_record self.assertEqual(print_record(self.recid, 'xm'), ' \n %s\n \n\n ' % self.recid) self.assertEqual(get_record(self.recid), {'001': [([], ' ', ' ', str(self.recid), 1)]}) class WebSearchExactTitleIndexTest(unittest.TestCase): """Checks if exact title index works correctly """ def test_exacttitle_query_solves_problems(self): """websearch - check exacttitle query solves problems""" error_messages = [] error_messages.extend(test_web_page_content(CFG_SITE_URL + "/search?ln=en&p=exacttitle%3A'solves+problems'&f=&action_search=Search", expected_text = "Non-compact supergravity solves problems")) if error_messages: self.fail(merge_error_messages(error_messages)) def test_exacttitle_query_solve_problems(self): """websearch - check exacttitle query solve problems""" error_messages = [] error_messages.extend(test_web_page_content(CFG_SITE_URL + "/search?ln=en&p=exacttitle%3A'solve+problems'&f=&action_search=Search", expected_text = ['Search term', 'solve problems', 'did not match'])) if error_messages: self.fail(merge_error_messages(error_messages)) def test_exacttitle_query_photon_beam(self): """websearch - check exacttitle search photon beam""" error_messages = [] error_messages.extend(test_web_page_content(CFG_SITE_URL + "/search?ln=en&p=exacttitle%3A'photon+beam'&f=&action_search=Search", expected_text = "Development of photon beam diagnostics")) if error_messages: self.fail(merge_error_messages(error_messages)) def test_exacttitle_query_photons_beam(self): """websearch - check exacttitle search photons beam""" error_messages = [] error_messages.extend(test_web_page_content(CFG_SITE_URL + "/search?ln=en&p=exacttitle%3A'photons+beam'&f=&action_search=Search", expected_text = ['Search term', 'photons beam', 'did not match'])) if error_messages: self.fail(merge_error_messages(error_messages)) TEST_SUITE = make_test_suite(WebSearchWebPagesAvailabilityTest, WebSearchTestSearch, WebSearchTestBrowse, WebSearchTestOpenURL, WebSearchTestCollections, WebSearchTestRecord, WebSearchTestLegacyURLs, WebSearchNearestTermsTest, WebSearchBooleanQueryTest, WebSearchAuthorQueryTest, WebSearchSearchEnginePythonAPITest, WebSearchSearchEngineWebAPITest, WebSearchRestrictedCollectionTest, WebSearchRestrictedCollectionHandlingTest, WebSearchRestrictedPicturesTest, WebSearchRestrictedWebJournalFilesTest, WebSearchRSSFeedServiceTest, WebSearchXSSVulnerabilityTest, WebSearchResultsOverview, WebSearchSortResultsTest, WebSearchSearchResultsXML, WebSearchUnicodeQueryTest, WebSearchMARCQueryTest, WebSearchExtSysnoQueryTest, WebSearchResultsRecordGroupingTest, WebSearchSpecialTermsQueryTest, WebSearchJournalQueryTest, WebSearchStemmedIndexQueryTest, WebSearchSummarizerTest, WebSearchRecordCollectionGuessTest, WebSearchGetFieldValuesTest, WebSearchAddToBasketTest, WebSearchAlertTeaserTest, WebSearchSpanQueryTest, 
WebSearchReferstoCitedbyTest, WebSearchSPIRESSyntaxTest, WebSearchDateQueryTest, WebSearchTestWildcardLimit, WebSearchSynonymQueryTest, WebSearchWashCollectionsTest, WebSearchAuthorCountQueryTest, WebSearchPerformRequestSearchRefactoringTest, WebSearchGetRecordTests, WebSearchExactTitleIndexTest) if __name__ == "__main__": run_test_suite(TEST_SUITE, warn_user=True) diff --git a/modules/websubmit/lib/websubmit_file_converter.py b/modules/websubmit/lib/websubmit_file_converter.py index e57752856..091ef0a50 100644 --- a/modules/websubmit/lib/websubmit_file_converter.py +++ b/modules/websubmit/lib/websubmit_file_converter.py @@ -1,1462 +1,1465 @@ # -*- coding: utf-8 -*- ## This file is part of Invenio. ## Copyright (C) 2009, 2010, 2011, 2012 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ This module implement fulltext conversion between many different file formats. """ import os import stat import re import sys import shutil import tempfile import HTMLParser import time import subprocess import atexit import signal import threading from logging import DEBUG, getLogger from htmlentitydefs import entitydefs from optparse import OptionParser try: from invenio.hocrlib import create_pdf, extract_hocr, CFG_PPM_RESOLUTION - from pyPdf import PdfFileReader, PdfFileWriter + try: + from PyPDF2 import PdfFileReader, PdfFileWriter + except ImportError: + from pyPdf import PdfFileReader, PdfFileWriter CFG_CAN_DO_OCR = True except ImportError: CFG_CAN_DO_OCR = False from invenio.textutils import wrap_text_in_a_box from invenio.shellutils import run_process_with_timeout, run_shell_command from invenio.config import CFG_TMPDIR, CFG_ETCDIR, CFG_PYLIBDIR, \ CFG_PATH_ANY2DJVU, \ CFG_PATH_PDFINFO, \ CFG_PATH_GS, \ CFG_PATH_PDFOPT, \ CFG_PATH_PDFTOPS, \ CFG_PATH_GZIP, \ CFG_PATH_GUNZIP, \ CFG_PATH_PDFTOTEXT, \ CFG_PATH_PDFTOPPM, \ CFG_PATH_OCROSCRIPT, \ CFG_PATH_DJVUPS, \ CFG_PATH_DJVUTXT, \ CFG_PATH_OPENOFFICE_PYTHON, \ CFG_PATH_PSTOTEXT, \ CFG_PATH_TIFF2PDF, \ CFG_PATH_PS2PDF, \ CFG_OPENOFFICE_SERVER_HOST, \ CFG_OPENOFFICE_SERVER_PORT, \ CFG_OPENOFFICE_USER, \ CFG_PATH_CONVERT, \ CFG_PATH_PAMFILE, \ CFG_BINDIR, \ CFG_LOGDIR, \ CFG_BIBSCHED_PROCESS_USER, \ CFG_BIBDOCFILE_BEST_FORMATS_TO_EXTRACT_TEXT_FROM, \ CFG_BIBDOCFILE_DESIRED_CONVERSIONS from invenio.errorlib import register_exception def get_file_converter_logger(): return getLogger("InvenioWebSubmitFileConverterLogger") CFG_TWO2THREE_LANG_CODES = { 'en': 'eng', 'nl': 'nld', 'es': 'spa', 'de': 'deu', 'it': 'ita', 'fr': 'fra', } CFG_OPENOFFICE_TMPDIR = os.path.join(CFG_TMPDIR, 'ooffice-tmp-files') CFG_GS_MINIMAL_VERSION_FOR_PDFA = "8.65" CFG_GS_MINIMAL_VERSION_FOR_PDFX = "8.52" CFG_ICC_PATH = os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'ISOCoatedsb.icc') CFG_PDFA_DEF_PATH = os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'PDFA_def.ps') CFG_PDFX_DEF_PATH = 
os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'PDFX_def.ps') CFG_UNOCONV_LOG_PATH = os.path.join(CFG_LOGDIR, 'unoconv.log') _RE_CLEAN_SPACES = re.compile(r'\s+') class InvenioWebSubmitFileConverterError(Exception): pass def get_conversion_map(): """Return a dictionary of the form: '.pdf' : {'.ps.gz' : ('pdf2ps', {param1 : value1...}) """ ret = { '.csv': {}, '.djvu': {}, '.doc': {}, '.docx': {}, '.sxw': {}, '.htm': {}, '.html': {}, '.odp': {}, '.ods': {}, '.odt': {}, '.pdf': {}, '.ppt': {}, '.pptx': {}, '.sxi': {}, '.ps': {}, '.ps.gz': {}, '.rtf': {}, '.tif': {}, '.tiff': {}, '.txt': {}, '.xls': {}, '.xlsx': {}, '.sxc': {}, '.xml': {}, '.hocr': {}, '.pdf;pdfa': {}, '.asc': {}, } if CFG_PATH_GZIP: ret['.ps']['.ps.gz'] = (gzip, {}) if CFG_PATH_GUNZIP: ret['.ps.gz']['.ps'] = (gunzip, {}) if CFG_PATH_ANY2DJVU: ret['.pdf']['.djvu'] = (any2djvu, {}) ret['.ps']['.djvu'] = (any2djvu, {}) if CFG_PATH_DJVUPS: ret['.djvu']['.ps'] = (djvu2ps, {'compress': False}) if CFG_PATH_GZIP: ret['.djvu']['.ps.gz'] = (djvu2ps, {'compress': True}) if CFG_PATH_DJVUTXT: ret['.djvu']['.txt'] = (djvu2text, {}) if CFG_PATH_PSTOTEXT: ret['.ps']['.txt'] = (pstotext, {}) if CFG_PATH_GUNZIP: ret['.ps.gz']['.txt'] = (pstotext, {}) if can_pdfa(): ret['.ps']['.pdf;pdfa'] = (ps2pdfa, {}) ret['.pdf']['.pdf;pdfa'] = (pdf2pdfa, {}) if CFG_PATH_GUNZIP: ret['.ps.gz']['.pdf;pdfa'] = (ps2pdfa, {}) else: if CFG_PATH_PS2PDF: ret['.ps']['.pdf;pdfa'] = (ps2pdf, {}) if CFG_PATH_GUNZIP: ret['.ps.gz']['.pdf'] = (ps2pdf, {}) if can_pdfx(): ret['.ps']['.pdf;pdfx'] = (ps2pdfx, {}) ret['.pdf']['.pdf;pdfx'] = (pdf2pdfx, {}) if CFG_PATH_GUNZIP: ret['.ps.gz']['.pdf;pdfx'] = (ps2pdfx, {}) if CFG_PATH_PDFTOPS: ret['.pdf']['.ps'] = (pdf2ps, {'compress': False}) ret['.pdf;pdfa']['.ps'] = (pdf2ps, {'compress': False}) if CFG_PATH_GZIP: ret['.pdf']['.ps.gz'] = (pdf2ps, {'compress': True}) ret['.pdf;pdfa']['.ps.gz'] = (pdf2ps, {'compress': True}) if CFG_PATH_PDFTOTEXT: ret['.pdf']['.txt'] = (pdf2text, {}) ret['.pdf;pdfa']['.txt'] = (pdf2text, {}) ret['.asc']['.txt'] = (txt2text, {}) ret['.txt']['.txt'] = (txt2text, {}) ret['.csv']['.txt'] = (txt2text, {}) ret['.html']['.txt'] = (html2text, {}) ret['.htm']['.txt'] = (html2text, {}) ret['.xml']['.txt'] = (html2text, {}) if CFG_PATH_TIFF2PDF: ret['.tiff']['.pdf'] = (tiff2pdf, {}) ret['.tif']['.pdf'] = (tiff2pdf, {}) if CFG_PATH_OPENOFFICE_PYTHON and CFG_OPENOFFICE_SERVER_HOST: ret['.rtf']['.odt'] = (unoconv, {'output_format': 'odt'}) ret['.rtf']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.rtf']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.rtf']['.docx'] = (unoconv, {'output_format': 'docx'}) ret['.doc']['.odt'] = (unoconv, {'output_format': 'odt'}) ret['.doc']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.doc']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.doc']['.docx'] = (unoconv, {'output_format': 'docx'}) ret['.docx']['.odt'] = (unoconv, {'output_format': 'odt'}) ret['.docx']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.docx']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.sxw']['.odt'] = (unoconv, {'output_format': 'odt'}) ret['.sxw']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.sxw']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.docx']['.docx'] = (unoconv, {'output_format': 'docx'}) ret['.odt']['.doc'] = (unoconv, {'output_format': 'doc'}) ret['.odt']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.odt']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.odt']['.docx'] = (unoconv, {'output_format': 
'docx'}) ret['.ppt']['.odp'] = (unoconv, {'output_format': 'odp'}) ret['.ppt']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.ppt']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.ppt']['.pptx'] = (unoconv, {'output_format': 'pptx'}) ret['.pptx']['.odp'] = (unoconv, {'output_format': 'odp'}) ret['.pptx']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.pptx']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.sxi']['.odp'] = (unoconv, {'output_format': 'odp'}) ret['.sxi']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.sxi']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.sxi']['.pptx'] = (unoconv, {'output_format': 'pptx'}) ret['.odp']['.ppt'] = (unoconv, {'output_format': 'ppt'}) ret['.odp']['.pptx'] = (unoconv, {'output_format': 'pptx'}) ret['.odp']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.odp']['.txt'] = (unoconv, {'output_format': 'txt'}) ret['.odp']['.pptx'] = (unoconv, {'output_format': 'pptx'}) ret['.xls']['.ods'] = (unoconv, {'output_format': 'ods'}) ret['.xls']['.xlsx'] = (unoconv, {'output_format': 'xslx'}) ret['.xlsx']['.ods'] = (unoconv, {'output_format': 'ods'}) ret['.sxc']['.ods'] = (unoconv, {'output_format': 'ods'}) ret['.sxc']['.xlsx'] = (unoconv, {'output_format': 'xslx'}) ret['.ods']['.xls'] = (unoconv, {'output_format': 'xls'}) ret['.ods']['.pdf;pdfa'] = (unoconv, {'output_format': 'pdf'}) ret['.ods']['.csv'] = (unoconv, {'output_format': 'csv'}) ret['.ods']['.xlsx'] = (unoconv, {'output_format': 'xslx'}) ret['.csv']['.txt'] = (txt2text, {}) ## Let's add all the existing output formats as potential input formats. for value in ret.values(): for key in value.keys(): if key not in ret: ret[key] = {} return ret def get_best_format_to_extract_text_from(filelist, best_formats=CFG_BIBDOCFILE_BEST_FORMATS_TO_EXTRACT_TEXT_FROM): """ Return among the filelist the best file whose format is best suited for extracting text. """ from invenio.bibdocfile import decompose_file, normalize_format best_formats = [normalize_format(aformat) for aformat in best_formats if can_convert(aformat, '.txt')] for aformat in best_formats: for filename in filelist: if decompose_file(filename, skip_version=True)[2].endswith(aformat): return filename raise InvenioWebSubmitFileConverterError("It's not possible to extract valuable text from any of the proposed files.") def get_missing_formats(filelist, desired_conversion=None): """Given a list of files it will return a dictionary of the form: file1 : missing formats to generate from it... 
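For illustration only (a sketch, not guaranteed output: the file names below are hypothetical and the exact result depends on CFG_BIBDOCFILE_DESIRED_CONVERSIONS and on which converters are installed), a call might look like:
    >>> get_missing_formats(['/tmp/test.pdf', '/tmp/test.doc'])
    {'/tmp/test.doc': ['.txt', '.pdf;pdfa'], '/tmp/test.pdf': ['.txt']}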
""" from invenio.bibdocfile import normalize_format, decompose_file def normalize_desired_conversion(): ret = {} for key, value in desired_conversion.iteritems(): ret[normalize_format(key)] = [normalize_format(aformat) for aformat in value] return ret if desired_conversion is None: desired_conversion = CFG_BIBDOCFILE_DESIRED_CONVERSIONS available_formats = [decompose_file(filename, skip_version=True)[2] for filename in filelist] missing_formats = [] desired_conversion = normalize_desired_conversion() ret = {} for filename in filelist: aformat = decompose_file(filename, skip_version=True)[2] if aformat in desired_conversion: for desired_format in desired_conversion[aformat]: if desired_format not in available_formats and desired_format not in missing_formats: missing_formats.append(desired_format) if filename not in ret: ret[filename] = [] ret[filename].append(desired_format) return ret def can_convert(input_format, output_format, max_intermediate_conversions=4): """Return the chain of conversion to transform input_format into output_format, if any.""" from invenio.bibdocfile import normalize_format if max_intermediate_conversions <= 0: return [] input_format = normalize_format(input_format) output_format = normalize_format(output_format) if input_format in __CONVERSION_MAP: if output_format in __CONVERSION_MAP[input_format]: return [__CONVERSION_MAP[input_format][output_format]] best_res = [] best_intermediate = '' for intermediate_format in __CONVERSION_MAP[input_format]: res = can_convert(intermediate_format, output_format, max_intermediate_conversions-1) if res and (len(res) < best_res or not best_res): best_res = res best_intermediate = intermediate_format if best_res: return [__CONVERSION_MAP[input_format][best_intermediate]] + best_res return [] def can_pdfopt(verbose=False): """Return True if it's possible to optimize PDFs.""" if CFG_PATH_PDFOPT: return True elif verbose: print >> sys.stderr, "PDF linearization is not supported because the pdfopt executable is not available" return False def can_pdfx(verbose=False): """Return True if it's possible to generate PDF/Xs.""" if not CFG_PATH_PDFTOPS: if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/X is not possible because the pdftops executable is not available" return False if not CFG_PATH_GS: if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/X is not possible because the gs executable is not available" return False else: try: output = run_shell_command("%s --version" % CFG_PATH_GS)[1].strip() if not output: raise ValueError("No version information returned") if [int(number) for number in output.split('.')] < [int(number) for number in CFG_GS_MINIMAL_VERSION_FOR_PDFX.split('.')]: print >> sys.stderr, "Conversion of PS or PDF to PDF/X is not possible because the minimal gs version for the executable %s is not met: it should be %s but %s has been found" % (CFG_PATH_GS, CFG_GS_MINIMAL_VERSION_FOR_PDFX, output) return False except Exception, err: print >> sys.stderr, "Conversion of PS or PDF to PDF/X is not possible because it's not possible to retrieve the gs version using the executable %s: %s" % (CFG_PATH_GS, err) return False if not CFG_PATH_PDFINFO: if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/X is not possible because the pdfinfo executable is not available" return False if not os.path.exists(CFG_ICC_PATH): if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/X is not possible because %s does not exists. Have you run make install-pdfa-helper-files?" 
% CFG_ICC_PATH return False return True def can_pdfa(verbose=False): """Return True if it's possible to generate PDF/As.""" if not CFG_PATH_PDFTOPS: if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/A is not possible because the pdftops executable is not available" return False if not CFG_PATH_GS: if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/A is not possible because the gs executable is not available" return False else: try: output = run_shell_command("%s --version" % CFG_PATH_GS)[1].strip() if not output: raise ValueError("No version information returned") if [int(number) for number in output.split('.')] < [int(number) for number in CFG_GS_MINIMAL_VERSION_FOR_PDFA.split('.')]: print >> sys.stderr, "Conversion of PS or PDF to PDF/A is not possible because the minimal gs version for the executable %s is not met: it should be %s but %s has been found" % (CFG_PATH_GS, CFG_GS_MINIMAL_VERSION_FOR_PDFA, output) return False except Exception, err: print >> sys.stderr, "Conversion of PS or PDF to PDF/A is not possible because it's not possible to retrieve the gs version using the executable %s: %s" % (CFG_PATH_GS, err) return False if not CFG_PATH_PDFINFO: if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/A is not possible because the pdfinfo executable is not available" return False if not os.path.exists(CFG_ICC_PATH): if verbose: print >> sys.stderr, "Conversion of PS or PDF to PDF/A is not possible because %s does not exist. Have you run make install-pdfa-helper-files?" % CFG_ICC_PATH return False return True def can_perform_ocr(verbose=False): """Return True if it's possible to perform OCR.""" if not CFG_CAN_DO_OCR: if verbose: print >> sys.stderr, "OCR is not supported because either the pyPdf/PyPDF2 or the ReportLab Python library is missing" return False if not CFG_PATH_OCROSCRIPT: if verbose: print >> sys.stderr, "OCR is not supported because the ocroscript executable is not available" return False if not CFG_PATH_PDFTOPPM: if verbose: print >> sys.stderr, "OCR is not supported because the pdftoppm executable is not available" return False return True def guess_ocropus_produced_garbage(input_file, hocr_p): """Return True if the output produced by OCROpus in hocr format contains only garbage instead of text. This is implemented via a heuristic: if the non-letter characters outnumber the ASCII letters in the recognized words, then this is Garbage (tm). """ def _get_words_from_text(): ret = [] for row in open(input_file): for word in row.strip().split(' '): ret.append(word.strip()) return ret def _get_words_from_hocr(): ret = [] hocr = extract_hocr(open(input_file).read()) for dummy, dummy, lines in hocr: for dummy, line in lines: for word in line.split(): ret.append(word.strip()) return ret if hocr_p: words = _get_words_from_hocr() else: words = _get_words_from_text() #stats = {} #most_common_len = 0 #most_common_how_many = 0 #for word in words: #if word: #word_length = len(word.decode('utf-8')) #stats[word_length] = stats.get(word_length, 0) + 1 #if stats[word_length] > most_common_how_many: #most_common_len = word_length #most_common_how_many = stats[word_length] goods = 0 bads = 0 for word in words: for char in word.decode('utf-8'): if (u'a' <= char <= u'z') or (u'A' <= char <= u'Z'): goods += 1 else: bads += 1 if bads > goods: get_file_converter_logger().debug('OCROpus produced garbage') return True else: return False def guess_is_OCR_needed(input_file, ln='en'): """ Tries to see if enough text is retrievable from input_file. 
Return True if OCR is needed, False if it's already possible to retrieve information from the document. """ ## FIXME: a way to understand if pdftotext has returned garbage ## should be found. E.g. 1.0*len(text)/len(zlib.compress(text)) < 2.1 ## could be a good hint for garbage being found. return True def convert_file(input_file, output_file=None, output_format=None, **params): """ Convert files from one format to another. @param input_file [string] the path to an existing file @param output_file [string] the path to the desired output (if None a temporary file is generated) @param output_format [string] the desired format (if None it is taken from output_file) @param params other parameters to pass to the particular converter @return [string] the final output_file """ from invenio.bibdocfile import decompose_file, normalize_format if output_format is None: if output_file is None: raise ValueError("At least output_file or output_format should be specified.") else: output_ext = decompose_file(output_file, skip_version=True)[2] else: output_ext = normalize_format(output_format) input_ext = decompose_file(input_file, skip_version=True)[2] conversion_chain = can_convert(input_ext, output_ext) if conversion_chain: get_file_converter_logger().debug("Conversion chain from %s to %s: %s" % (input_ext, output_ext, conversion_chain)) current_input = input_file for i, (converter, final_params) in enumerate(conversion_chain): current_output = None if i == (len(conversion_chain) - 1): current_output = output_file final_params = dict(final_params) final_params.update(params) try: get_file_converter_logger().debug("Converting from %s to %s using %s with params %s" % (current_input, current_output, converter, final_params)) current_output = converter(current_input, current_output, **final_params) get_file_converter_logger().debug("... 
current_output %s" % (current_output, )) except InvenioWebSubmitFileConverterError, err: raise InvenioWebSubmitFileConverterError("Error when converting from %s to %s: %s" % (input_file, output_ext, err)) except Exception, err: register_exception(alert_admin=True) raise InvenioWebSubmitFileConverterError("Unexpected error when converting from %s to %s (%s): %s" % (input_file, output_ext, type(err), err)) if current_input != input_file: os.remove(current_input) current_input = current_output return current_output else: raise InvenioWebSubmitFileConverterError("It's impossible to convert from %s to %s" % (input_ext, output_ext)) try: _UNOCONV_DAEMON except NameError: _UNOCONV_DAEMON = None _UNOCONV_DAEMON_LOCK = threading.Lock() def _register_unoconv(): global _UNOCONV_DAEMON if CFG_OPENOFFICE_SERVER_HOST != 'localhost': return _UNOCONV_DAEMON_LOCK.acquire() try: if not _UNOCONV_DAEMON: output_log = open(CFG_UNOCONV_LOG_PATH, 'a') _UNOCONV_DAEMON = subprocess.Popen(['sudo', '-S', '-u', CFG_OPENOFFICE_USER, os.path.join(CFG_BINDIR, 'inveniounoconv'), '-vvv', '-s', CFG_OPENOFFICE_SERVER_HOST, '-p', str(CFG_OPENOFFICE_SERVER_PORT), '-l'], stdin=open('/dev/null', 'r'), stdout=output_log, stderr=output_log) time.sleep(3) finally: _UNOCONV_DAEMON_LOCK.release() def _unregister_unoconv(): global _UNOCONV_DAEMON if CFG_OPENOFFICE_SERVER_HOST != 'localhost': return _UNOCONV_DAEMON_LOCK.acquire() try: if _UNOCONV_DAEMON: output_log = open(CFG_UNOCONV_LOG_PATH, 'a') subprocess.call(['sudo', '-S', '-u', CFG_OPENOFFICE_USER, os.path.join(CFG_BINDIR, 'inveniounoconv'), '-k', '-vvv'], stdin=open('/dev/null', 'r'), stdout=output_log, stderr=output_log) time.sleep(1) if _UNOCONV_DAEMON.poll(): try: os.kill(_UNOCONV_DAEMON.pid, signal.SIGTERM) except OSError: pass if _UNOCONV_DAEMON.poll(): try: os.kill(_UNOCONV_DAEMON.pid, signal.SIGKILL) except OSError: pass finally: _UNOCONV_DAEMON_LOCK.release() ## NOTE: in case we switch back keeping LibreOffice running, uncomment ## the following line. #atexit.register(_unregister_unoconv) def unoconv(input_file, output_file=None, output_format='txt', pdfopt=True, **dummy): """Use unconv to convert among OpenOffice understood documents.""" from invenio.bibdocfile import normalize_format ## NOTE: in case we switch back keeping LibreOffice running, uncomment ## the following line. 
#_register_unoconv() input_file, output_file, dummy = prepare_io(input_file, output_file, output_format, need_working_dir=False) if output_format == 'txt': unoconv_format = 'text' else: unoconv_format = output_format try: try: ## We copy the input file and we make it available to OpenOffice ## with the user nobody from invenio.bibdocfile import decompose_file input_format = decompose_file(input_file, skip_version=True)[2] fd, tmpinputfile = tempfile.mkstemp(dir=CFG_TMPDIR, suffix=normalize_format(input_format)) os.close(fd) shutil.copy(input_file, tmpinputfile) get_file_converter_logger().debug("Prepared input file %s" % tmpinputfile) os.chmod(tmpinputfile, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH) tmpoutputfile = tempfile.mktemp(dir=CFG_OPENOFFICE_TMPDIR, suffix=normalize_format(output_format)) get_file_converter_logger().debug("Prepared output file %s" % tmpoutputfile) try: execute_command(os.path.join(CFG_BINDIR, 'inveniounoconv'), '-vvv', '-s', CFG_OPENOFFICE_SERVER_HOST, '-p', str(CFG_OPENOFFICE_SERVER_PORT), '--output', tmpoutputfile, '-f', unoconv_format, tmpinputfile, sudo=CFG_OPENOFFICE_USER) except: register_exception(alert_admin=True) raise except InvenioWebSubmitFileConverterError: ## OK, maybe OpenOffice hung. Better kill it and restart it! if CFG_OPENOFFICE_SERVER_HOST != 'localhost': ## There's not much that we can do. Let's bail out. if not os.path.exists(tmpoutputfile) or not os.path.getsize(tmpoutputfile): raise else: ## Sometimes OpenOffice crashes but we don't care :-) ## it still has created a nice file. pass else: execute_command(os.path.join(CFG_BINDIR, 'inveniounoconv'), '-vvv', '-k', sudo=CFG_OPENOFFICE_USER) ## NOTE: in case we switch back keeping LibreOffice running, uncomment ## the following lines. #_unregister_unoconv() #_register_unoconv() time.sleep(5) try: execute_command(os.path.join(CFG_BINDIR, 'inveniounoconv'), '-vvv', '-s', CFG_OPENOFFICE_SERVER_HOST, '-p', str(CFG_OPENOFFICE_SERVER_PORT), '--output', tmpoutputfile, '-f', unoconv_format, tmpinputfile, sudo=CFG_OPENOFFICE_USER) except InvenioWebSubmitFileConverterError: execute_command(os.path.join(CFG_BINDIR, 'inveniounoconv'), '-vvv', '-k', sudo=CFG_OPENOFFICE_USER) if not os.path.exists(tmpoutputfile) or not os.path.getsize(tmpoutputfile): raise InvenioWebSubmitFileConverterError('No output was generated by OpenOffice') else: ## Sometimes OpenOffice crashes but we don't care :-) ## it still has created a nice file. pass except Exception, err: raise InvenioWebSubmitFileConverterError(get_unoconv_installation_guideline(err)) output_format = normalize_format(output_format) if output_format == '.pdf' and pdfopt: pdf2pdfopt(tmpoutputfile, output_file) else: shutil.copy(tmpoutputfile, output_file) execute_command(os.path.join(CFG_BINDIR, 'inveniounoconv'), '-r', tmpoutputfile, sudo=CFG_OPENOFFICE_USER) os.remove(tmpinputfile) return output_file def get_unoconv_installation_guideline(err): """Return the Libre/OpenOffice installation guideline (embedding the current error message). """ from invenio.bibtask import guess_apache_process_user return wrap_text_in_a_box("""\ OpenOffice.org can't properly create files in the OpenOffice.org temporary directory %(tmpdir)s, as the user %(nobody)s (as configured in the CFG_OPENOFFICE_USER invenio(-local).conf variable): %(err)s. 
In your /etc/sudoers file, you should authorize the %(apache)s user to run %(unoconv)s as %(nobody)s user as in: %(apache)s ALL=(%(nobody)s) NOPASSWD: %(unoconv)s You should then run the following commands: $ sudo mkdir -p %(tmpdir)s $ sudo chown -R %(nobody)s %(tmpdir)s $ sudo chmod -R 755 %(tmpdir)s""" % { 'tmpdir' : CFG_OPENOFFICE_TMPDIR, 'nobody' : CFG_OPENOFFICE_USER, 'err' : err, 'apache' : CFG_BIBSCHED_PROCESS_USER or guess_apache_process_user(), 'python' : CFG_PATH_OPENOFFICE_PYTHON, 'unoconv' : os.path.join(CFG_BINDIR, 'inveniounoconv') }) def can_unoconv(verbose=False): """ If OpenOffice.org integration is enabled, checks whether the system is properly configured. """ if CFG_PATH_OPENOFFICE_PYTHON and CFG_OPENOFFICE_SERVER_HOST: try: test = os.path.join(CFG_TMPDIR, 'test.txt') open(test, 'w').write('test') output = unoconv(test, output_format='pdf') output2 = convert_file(output, output_format='.txt') if 'test' not in open(output2).read(): raise Exception("Coulnd't produce a valid PDF with Libre/OpenOffice.org") os.remove(output2) os.remove(output) os.remove(test) return True except Exception, err: if verbose: print >> sys.stderr, get_unoconv_installation_guideline(err) return False else: if verbose: print >> sys.stderr, "Libre/OpenOffice.org integration not enabled" return False def any2djvu(input_file, output_file=None, resolution=400, ocr=True, input_format=5, **dummy): """ Transform input_file into a .djvu file. @param input_file [string] the input file name @param output_file [string] the output_file file name, None for temporary generated @param resolution [int] the resolution of the output_file @param input_format [int] [1-9]: 1 - DjVu Document (for verification or OCR) 2 - PS/PS.GZ/PDF Document (default) 3 - Photo/Picture/Icon 4 - Scanned Document - B&W - <200 dpi 5 - Scanned Document - B&W - 200-400 dpi 6 - Scanned Document - B&W - >400 dpi 7 - Scanned Document - Color/Mixed - <200 dpi 8 - Scanned Document - Color/Mixed - 200-400 dpi 9 - Scanned Document - Color/Mixed - >400 dpi @return [string] output_file input_file. raise InvenioWebSubmitFileConverterError in case of errors. Note: due to the bottleneck of using a centralized server, it is very slow and is not suitable for interactive usage (e.g. WebSubmit functions) """ from invenio.bibdocfile import decompose_file input_file, output_file, working_dir = prepare_io(input_file, output_file, '.djvu') ocr = ocr and "1" or "0" ## Any2djvu expect to find the file in the current directory. execute_command(CFG_PATH_ANY2DJVU, '-a', '-c', '-r', resolution, '-o', ocr, '-f', input_format, os.path.basename(input_file), cwd=working_dir) ## Any2djvu doesn't let you choose the output_file file name. djvu_output = os.path.join(working_dir, decompose_file(input_file)[1] + '.djvu') shutil.move(djvu_output, output_file) clean_working_dir(working_dir) return output_file _RE_FIND_TITLE = re.compile(r'^Title:\s*(.*?)\s*$') def pdf2pdfx(input_file, output_file=None, title=None, pdfopt=False, profile="pdf/x-3:2002", **dummy): """ Transform any PDF into a PDF/X (see: ) @param input_file [string] the input file name @param output_file [string] the output_file file name, None for temporary generated @param title [string] the title of the document. None for autodiscovery. @param pdfopt [bool] whether to linearize the pdf, too. @param profile: [string] the PDFX profile to use. Supports: 'pdf/x-1a:2001', 'pdf/x-1a:2003', 'pdf/x-3:2002' @return [string] output_file input_file raise InvenioWebSubmitFileConverterError in case of errors. 
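Illustrative usage (a sketch only: the paths are hypothetical and a Ghostscript new enough for PDF/X is assumed, see can_pdfx()):
    >>> pdf2pdfx('/tmp/paper.pdf', output_file='/tmp/paper_pdfx.pdf', profile='pdf/x-3:2002')  # returns the path of the generated PDF/X file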
""" input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') if title is None: stdout = execute_command(CFG_PATH_PDFINFO, input_file) for line in stdout.split('\n'): g = _RE_FIND_TITLE.match(line) if g: title = g.group(1) break if not title: title = 'No title' get_file_converter_logger().debug("Extracted title is %s" % title) if os.path.exists(CFG_ICC_PATH): shutil.copy(CFG_ICC_PATH, working_dir) else: raise InvenioWebSubmitFileConverterError('ERROR: ISOCoatedsb.icc file missing. Have you run "make install-pdfa-helper-files" as part of your Invenio deployment?') pdfx_header = open(CFG_PDFX_DEF_PATH).read() pdfx_header = pdfx_header.replace('<<<>>>', title) icc_iso_profile_def = '' if profile == 'pdf/x-1a:2001': pdfx_version = 'PDF/X-1a:2001' pdfx_conformance = 'PDF/X-1a:2001' elif profile == 'pdf/x-1a:2003': pdfx_version = 'PDF/X-1a:2003' pdfx_conformance = 'PDF/X-1a:2003' elif profile == 'pdf/x-3:2002': icc_iso_profile_def = '/ICCProfile (ISOCoatedsb.icc)' pdfx_version = 'PDF/X-3:2002' pdfx_conformance = 'PDF/X-3:2002' pdfx_header = pdfx_header.replace('<<<>>>', icc_iso_profile_def) pdfx_header = pdfx_header.replace('<<<>>>', pdfx_version) pdfx_header = pdfx_header.replace('<<<>>>', pdfx_conformance) outputpdf = os.path.join(working_dir, 'output_file.pdf') open(os.path.join(working_dir, 'PDFX_def.ps'), 'w').write(pdfx_header) if profile in ['pdf/x-3:2002']: execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFX', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-dUseCIEColor', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sOutputFile=output_file.pdf', os.path.join(working_dir, 'PDFX_def.ps'), input_file, cwd=working_dir) elif profile in ['pdf/x-1a:2001', 'pdf/x-1a:2003']: execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFX', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-sColorConversionStrategy=CMYK', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sOutputFile=output_file.pdf', os.path.join(working_dir, 'PDFX_def.ps'), input_file, cwd=working_dir) if pdfopt: execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) else: shutil.move(outputpdf, output_file) clean_working_dir(working_dir) return output_file def pdf2pdfa(input_file, output_file=None, title=None, pdfopt=True, **dummy): """ Transform any PDF into a PDF/A (see: ) @param input_file [string] the input file name @param output_file [string] the output_file file name, None for temporary generated @param title [string] the title of the document. None for autodiscovery. @param pdfopt [bool] whether to linearize the pdf, too. @return [string] output_file input_file raise InvenioWebSubmitFileConverterError in case of errors. """ input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') if title is None: stdout = execute_command(CFG_PATH_PDFINFO, input_file) for line in stdout.split('\n'): g = _RE_FIND_TITLE.match(line) if g: title = g.group(1) break if not title: title = 'No title' get_file_converter_logger().debug("Extracted title is %s" % title) if os.path.exists(CFG_ICC_PATH): shutil.copy(CFG_ICC_PATH, working_dir) else: raise InvenioWebSubmitFileConverterError('ERROR: ISOCoatedsb.icc file missing. 
Have you run "make install-pdfa-helper-files" as part of your Invenio deployment?') pdfa_header = open(CFG_PDFA_DEF_PATH).read() pdfa_header = pdfa_header.replace('<<<>>>', title) inputps = os.path.join(working_dir, 'input.ps') outputpdf = os.path.join(working_dir, 'output_file.pdf') open(os.path.join(working_dir, 'PDFA_def.ps'), 'w').write(pdfa_header) execute_command(CFG_PATH_PDFTOPS, '-level3', input_file, inputps) execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFA', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-dUseCIEColor', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sOutputFile=output_file.pdf', os.path.join(working_dir, 'PDFA_def.ps'), 'input.ps', cwd=working_dir) if pdfopt: execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) else: shutil.move(outputpdf, output_file) clean_working_dir(working_dir) return output_file def pdf2pdfopt(input_file, output_file=None, **dummy): """ Linearize the input PDF in order to improve the web-experience when visualizing the document through the web. @param input_file [string] the input input_file @param output_file [string] the output_file file name, None for temporary generated @return [string] output_file input_file raise InvenioWebSubmitFileConverterError in case of errors. """ input_file, output_file, dummy = prepare_io(input_file, output_file, '.pdf', need_working_dir=False) execute_command(CFG_PATH_PDFOPT, input_file, output_file) return output_file def pdf2ps(input_file, output_file=None, level=2, compress=True, **dummy): """ Convert from Pdf to Postscript. """ if compress: suffix = '.ps.gz' else: suffix = '.ps' input_file, output_file, working_dir = prepare_io(input_file, output_file, suffix) execute_command(CFG_PATH_PDFTOPS, '-level%i' % level, input_file, os.path.join(working_dir, 'output.ps')) if compress: execute_command(CFG_PATH_GZIP, '-c', os.path.join(working_dir, 'output.ps'), filename_out=output_file) else: shutil.move(os.path.join(working_dir, 'output.ps'), output_file) clean_working_dir(working_dir) return output_file def ps2pdfx(input_file, output_file=None, title=None, pdfopt=False, profile="pdf/x-3:2002", **dummy): """ Transform any PS into a PDF/X (see: ) @param input_file [string] the input file name @param output_file [string] the output_file file name, None for temporary generated @param title [string] the title of the document. None for autodiscovery. @param pdfopt [bool] whether to linearize the pdf, too. @param profile: [string] the PDFX profile to use. Supports: 'pdf/x-1a:2001', 'pdf/x-1a:2003', 'pdf/x-3:2002' @return [string] output_file input_file raise InvenioWebSubmitFileConverterError in case of errors. 
""" input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') if input_file.endswith('.gz'): new_input_file = os.path.join(working_dir, 'input.ps') execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=new_input_file) input_file = new_input_file if not title: title = 'No title' shutil.copy(CFG_ICC_PATH, working_dir) pdfx_header = open(CFG_PDFX_DEF_PATH).read() pdfx_header = pdfx_header.replace('<<<>>>', title) icc_iso_profile_def = '' if profile == 'pdf/x-1a:2001': pdfx_version = 'PDF/X-1a:2001' pdfx_conformance = 'PDF/X-1a:2001' elif profile == 'pdf/x-1a:2003': pdfx_version = 'PDF/X-1a:2003' pdfx_conformance = 'PDF/X-1a:2003' elif profile == 'pdf/x-3:2002': icc_iso_profile_def = '/ICCProfile (ISOCoatedsb.icc)' pdfx_version = 'PDF/X-3:2002' pdfx_conformance = 'PDF/X-3:2002' pdfx_header = pdfx_header.replace('<<<>>>', icc_iso_profile_def) pdfx_header = pdfx_header.replace('<<<>>>', pdfx_version) pdfx_header = pdfx_header.replace('<<<>>>', title) outputpdf = os.path.join(working_dir, 'output_file.pdf') open(os.path.join(working_dir, 'PDFX_def.ps'), 'w').write(pdfx_header) if profile in ['pdf/x-3:2002']: execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFX', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-dUseCIEColor', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sOutputFile=output_file.pdf', os.path.join(working_dir, 'PDFX_def.ps'), 'input.ps', cwd=working_dir) elif profile in ['pdf/x-1a:2001', 'pdf/x-1a:2003']: execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFX', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-sColorConversionStrategy=CMYK', '-dAutoRotatePages=/None', '-sDEVICE=pdfwrite', '-sOutputFile=output_file.pdf', os.path.join(working_dir, 'PDFX_def.ps'), 'input.ps', cwd=working_dir) if pdfopt: execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) else: shutil.move(outputpdf, output_file) clean_working_dir(working_dir) return output_file def ps2pdfa(input_file, output_file=None, title=None, pdfopt=True, **dummy): """ Transform any PS into a PDF/A (see: ) @param input_file [string] the input file name @param output_file [string] the output_file file name, None for temporary generated @param title [string] the title of the document. None for autodiscovery. @param pdfopt [bool] whether to linearize the pdf, too. @return [string] output_file input_file raise InvenioWebSubmitFileConverterError in case of errors. 
""" input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') if input_file.endswith('.gz'): new_input_file = os.path.join(working_dir, 'input.ps') execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=new_input_file) input_file = new_input_file if not title: title = 'No title' shutil.copy(CFG_ICC_PATH, working_dir) pdfa_header = open(CFG_PDFA_DEF_PATH).read() pdfa_header = pdfa_header.replace('<<<>>>', title) outputpdf = os.path.join(working_dir, 'output_file.pdf') open(os.path.join(working_dir, 'PDFA_def.ps'), 'w').write(pdfa_header) execute_command(CFG_PATH_GS, '-sProcessColorModel=DeviceCMYK', '-dPDFA', '-dBATCH', '-dNOPAUSE', '-dNOOUTERSAVE', '-dUseCIEColor', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sOutputFile=output_file.pdf', os.path.join(working_dir, 'PDFA_def.ps'), input_file, cwd=working_dir) if pdfopt: execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) else: shutil.move(outputpdf, output_file) clean_working_dir(working_dir) return output_file def ps2pdf(input_file, output_file=None, pdfopt=True, **dummy): """ Transform any PS into a PDF @param input_file [string] the input file name @param output_file [string] the output_file file name, None for temporary generated @param pdfopt [bool] whether to linearize the pdf, too. @return [string] output_file input_file raise InvenioWebSubmitFileConverterError in case of errors. """ input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') if input_file.endswith('.gz'): new_input_file = os.path.join(working_dir, 'input.ps') execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=new_input_file) input_file = new_input_file outputpdf = os.path.join(working_dir, 'output_file.pdf') execute_command(CFG_PATH_PS2PDF, input_file, outputpdf, cwd=working_dir) if pdfopt: execute_command(CFG_PATH_PDFOPT, outputpdf, output_file) else: shutil.move(outputpdf, output_file) clean_working_dir(working_dir) return output_file def pdf2pdfhocr(input_pdf, text_hocr, output_pdf, rotations=None, font='Courier', draft=False): """ Adds the OCRed text to the original pdf. @param rotations: a list of angles by which pages should be rotated """ def _get_page_rotation(i): if len(rotations) > i: return rotations[i] return 0 if rotations is None: rotations = [] input_pdf, hocr_pdf, dummy = prepare_io(input_pdf, output_ext='.pdf', need_working_dir=False) create_pdf(extract_hocr(open(text_hocr).read()), hocr_pdf, font, draft) input1 = PdfFileReader(file(input_pdf, "rb")) input2 = PdfFileReader(file(hocr_pdf, "rb")) output = PdfFileWriter() info = input1.getDocumentInfo() if info: infoDict = output._info.getObject() infoDict.update(info) for i in range(0, input1.getNumPages()): orig_page = input1.getPage(i) text_page = input2.getPage(i) angle = _get_page_rotation(i) if angle != 0: print >> sys.stderr, "Rotating page %d by %d degrees." % (i, angle) text_page = text_page.rotateClockwise(angle) if draft: below, above = orig_page, text_page else: below, above = text_page, orig_page below.mergePage(above) if angle != 0 and not draft: print >> sys.stderr, "Rotating back page %d by %d degrees." % (i, angle) below.rotateCounterClockwise(angle) output.addPage(below) outputStream = file(output_pdf, "wb") output.write(outputStream) outputStream.close() os.remove(hocr_pdf) return output_pdf def pdf2hocr2pdf(input_file, output_file=None, ln='en', return_working_dir=False, extract_only_text=False, pdfopt=True, font='Courier', draft=False, **dummy): """ Return the text content in input_file. 
    @param ln is a two letter language code to give the OCR tool a hint.
    @param return_working_dir if set to True, will return the output_file path
        and the working_dir path, instead of deleting the working_dir. This is
        useful in case you need the intermediate images to build a PDF again.
    """
    def _perform_rotate(working_dir, imagefile, angle):
        """Rotate imagefile by the given angle. Creates rotated.ppm."""
        get_file_converter_logger().debug('Performing rotate on %s by %s degrees' % (imagefile, angle))
        if not angle:
            #execute_command('%s %s %s', CFG_PATH_CONVERT, os.path.join(working_dir, imagefile), os.path.join(working_dir, 'rotated-%s' % imagefile))
            shutil.copy(os.path.join(working_dir, imagefile), os.path.join(working_dir, 'rotated.ppm'))
        else:
            execute_command(CFG_PATH_CONVERT, os.path.join(working_dir, imagefile), '-rotate', str(angle), '-depth', str(8), os.path.join(working_dir, 'rotated.ppm'))
        return True

    def _perform_deskew(working_dir):
        """Perform ocroscript deskew. Expects to work on rotated.ppm.
        Creates deskewed.ppm. Return True if deskewing was fine."""
        get_file_converter_logger().debug('Performing deskew')
        try:
            dummy, stderr = execute_command_with_stderr(CFG_PATH_OCROSCRIPT, os.path.join(CFG_ETCDIR, 'websubmit', 'file_converter_templates', 'deskew.lua'), os.path.join(working_dir, 'rotated.ppm'), os.path.join(working_dir, 'deskewed.ppm'))
            if stderr.strip():
                get_file_converter_logger().debug('Errors found during deskewing')
                return False
            else:
                return True
        except InvenioWebSubmitFileConverterError, err:
            get_file_converter_logger().debug('Deskewing error: %s' % err)
            return False

    def _perform_recognize(working_dir):
        """Perform ocroscript recognize. Expects to work on deskewed.ppm.
        Creates recognize.out. Return True if recognizing was fine."""
        get_file_converter_logger().debug('Performing recognize')
        if extract_only_text:
            output_mode = 'text'
        else:
            output_mode = 'hocr'
        try:
            dummy, stderr = execute_command_with_stderr(CFG_PATH_OCROSCRIPT, 'recognize', '--tesslanguage=%s' % ln, '--output-mode=%s' % output_mode, os.path.join(working_dir, 'deskewed.ppm'), filename_out=os.path.join(working_dir, 'recognize.out'))
            if stderr.strip():
                ## There was some output on stderr
                get_file_converter_logger().debug('Errors found in recognize.err')
                return False
            return not guess_ocropus_produced_garbage(os.path.join(working_dir, 'recognize.out'), not extract_only_text)
        except InvenioWebSubmitFileConverterError, err:
            get_file_converter_logger().debug('Recognizer error: %s' % err)
            return False

    def _perform_dummy_recognize(working_dir):
        """Return an empty text or an empty hocr referencing the image."""
        get_file_converter_logger().debug('Performing dummy recognize')
        if extract_only_text:
            out = ''
        else:
            out = """ OCR Output
""" open(os.path.join(working_dir, 'recognize.out'), 'w').write(out) def _find_image_file(working_dir, imageprefix, page): ret = '%s-%d.ppm' % (imageprefix, page) if os.path.exists(os.path.join(working_dir, ret)): return ret ret = '%s-%02d.ppm' % (imageprefix, page) if os.path.exists(os.path.join(working_dir, ret)): return ret ret = '%s-%03d.ppm' % (imageprefix, page) if os.path.exists(os.path.join(working_dir, ret)): return ret ret = '%s-%04d.ppm' % (imageprefix, page) if os.path.exists(os.path.join(working_dir, ret)): return ret ret = '%s-%05d.ppm' % (imageprefix, page) if os.path.exists(os.path.join(working_dir, ret)): return ret ret = '%s-%06d.ppm' % (imageprefix, page) if os.path.exists(os.path.join(working_dir, ret)): return ret ## I guess we won't have documents with more than million pages return None def _ocr(tmp_output_file): """ Append to tmp_output_file the partial results of OCROpus recognize. Return a list of rotations. """ page = 0 rotations = [] while True: page += 1 get_file_converter_logger().debug('Page %d.' % page) execute_command(CFG_PATH_PDFTOPPM, '-f', str(page), '-l', str(page), '-r', str(CFG_PPM_RESOLUTION), '-aa', 'yes', '-freetype', 'yes', input_file, os.path.join(working_dir, 'image')) imagefile = _find_image_file(working_dir, 'image', page) if imagefile == None: break for angle in (0, 180, 90, 270): get_file_converter_logger().debug('Trying %d degrees...' % angle) if _perform_rotate(working_dir, imagefile, angle) and _perform_deskew(working_dir) and _perform_recognize(working_dir): rotations.append(angle) break else: get_file_converter_logger().debug('Dummy recognize') rotations.append(0) _perform_dummy_recognize(working_dir) open(tmp_output_file, 'a').write(open(os.path.join(working_dir, 'recognize.out')).read()) # clean os.remove(os.path.join(working_dir, imagefile)) return rotations if CFG_PATH_OCROSCRIPT: if len(ln) == 2: ln = CFG_TWO2THREE_LANG_CODES.get(ln, 'eng') if extract_only_text: input_file, output_file, working_dir = prepare_io(input_file, output_file, output_ext='.txt') _ocr(output_file) else: input_file, tmp_output_hocr, working_dir = prepare_io(input_file, output_ext='.hocr') rotations = _ocr(tmp_output_hocr) if pdfopt: input_file, tmp_output_pdf, dummy = prepare_io(input_file, output_ext='.pdf', need_working_dir=False) tmp_output_pdf, output_file, dummy = prepare_io(tmp_output_pdf, output_file, output_ext='.pdf', need_working_dir=False) pdf2pdfhocr(input_file, tmp_output_hocr, tmp_output_pdf, rotations=rotations, font=font, draft=draft) pdf2pdfopt(tmp_output_pdf, output_file) os.remove(tmp_output_pdf) else: input_file, output_file, dummy = prepare_io(input_file, output_file, output_ext='.pdf', need_working_dir=False) pdf2pdfhocr(input_file, tmp_output_hocr, output_file, rotations=rotations, font=font, draft=draft) clean_working_dir(working_dir) return output_file else: raise InvenioWebSubmitFileConverterError("It's impossible to generate HOCR output from PDF. OCROpus is not available.") def pdf2text(input_file, output_file=None, perform_ocr=True, ln='en', **dummy): """ Return the text content in input_file. 
""" input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) execute_command(CFG_PATH_PDFTOTEXT, '-enc', 'UTF-8', '-eol', 'unix', '-nopgbrk', input_file, output_file) if perform_ocr and can_perform_ocr(): ocred_output = pdf2hocr2pdf(input_file, ln=ln, extract_only_text=True) try: output = open(output_file, 'a') for row in open(ocred_output): output.write(row) output.close() finally: silent_remove(ocred_output) return output_file def txt2text(input_file, output_file=None, **dummy): """ Return the text content in input_file """ input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) shutil.copy(input_file, output_file) return output_file def html2text(input_file, output_file=None, **dummy): """ Return the text content of an HTML/XML file. """ class HTMLStripper(HTMLParser.HTMLParser): def __init__(self, output_file): HTMLParser.HTMLParser.__init__(self) self.output_file = output_file def handle_entityref(self, name): if name in entitydefs: self.output_file.write(entitydefs[name].decode('latin1').encode('utf8')) def handle_data(self, data): if data.strip(): self.output_file.write(_RE_CLEAN_SPACES.sub(' ', data)) def handle_charref(self, data): try: self.output_file.write(unichr(int(data)).encode('utf8')) except: pass def close(self): self.output_file.close() HTMLParser.HTMLParser.close(self) input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) html_stripper = HTMLStripper(open(output_file, 'w')) for line in open(input_file): html_stripper.feed(line) html_stripper.close() return output_file def djvu2text(input_file, output_file=None, **dummy): """ Return the text content in input_file. """ input_file, output_file, dummy = prepare_io(input_file, output_file, '.txt', need_working_dir=False) execute_command(CFG_PATH_DJVUTXT, input_file, output_file) return output_file def djvu2ps(input_file, output_file=None, level=2, compress=True, **dummy): """ Convert a djvu into a .ps[.gz] """ if compress: input_file, output_file, working_dir = prepare_io(input_file, output_file, output_ext='.ps.gz') try: execute_command(CFG_PATH_DJVUPS, input_file, os.path.join(working_dir, 'output.ps')) execute_command(CFG_PATH_GZIP, '-c', os.path.join(working_dir, 'output.ps'), filename_out=output_file) finally: clean_working_dir(working_dir) else: try: input_file, output_file, working_dir = prepare_io(input_file, output_file, output_ext='.ps') execute_command(CFG_PATH_DJVUPS, '-level=%i' % level, input_file, output_file) finally: clean_working_dir(working_dir) return output_file def tiff2pdf(input_file, output_file=None, pdfopt=True, pdfa=True, perform_ocr=True, **args): """ Convert a .tiff into a .pdf """ if pdfa or pdfopt or perform_ocr: input_file, output_file, working_dir = prepare_io(input_file, output_file, '.pdf') try: partial_output = os.path.join(working_dir, 'output.pdf') execute_command(CFG_PATH_TIFF2PDF, '-o', partial_output, input_file) if perform_ocr: pdf2hocr2pdf(partial_output, output_file, pdfopt=pdfopt, **args) elif pdfa: pdf2pdfa(partial_output, output_file, pdfopt=pdfopt, **args) else: pdfopt(partial_output, output_file) finally: clean_working_dir(working_dir) else: input_file, output_file, dummy = prepare_io(input_file, output_file, '.pdf', need_working_dir=False) execute_command(CFG_PATH_TIFF2PDF, '-o', output_file, input_file) return output_file def pstotext(input_file, output_file=None, **dummy): """ Convert a .ps[.gz] into text. 
""" input_file, output_file, working_dir = prepare_io(input_file, output_file, '.txt') try: if input_file.endswith('.gz'): new_input_file = os.path.join(working_dir, 'input.ps') execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=new_input_file) input_file = new_input_file execute_command(CFG_PATH_PSTOTEXT, '-output', output_file, input_file) finally: clean_working_dir(working_dir) return output_file def gzip(input_file, output_file=None, **dummy): """ Compress a file. """ input_file, output_file, dummy = prepare_io(input_file, output_file, '.gz', need_working_dir=False) execute_command(CFG_PATH_GZIP, '-c', input_file, filename_out=output_file) return output_file def gunzip(input_file, output_file=None, **dummy): """ Uncompress a file. """ from invenio.bibdocfile import decompose_file input_ext = decompose_file(input_file, skip_version=True)[2] if input_ext.endswith('.gz'): input_ext = input_ext[:-len('.gz')] else: input_ext = None input_file, output_file, dummy = prepare_io(input_file, output_file, input_ext, need_working_dir=False) execute_command(CFG_PATH_GUNZIP, '-c', input_file, filename_out=output_file) return output_file def prepare_io(input_file, output_file=None, output_ext=None, need_working_dir=True): """Clean input_file and the output_file.""" from invenio.bibdocfile import decompose_file, normalize_format output_ext = normalize_format(output_ext) get_file_converter_logger().debug('Preparing IO for input=%s, output=%s, output_ext=%s' % (input_file, output_file, output_ext)) if output_ext is None: if output_file is None: output_ext = '.tmp' else: output_ext = decompose_file(output_file, skip_version=True)[2] if output_file is None: try: (fd, output_file) = tempfile.mkstemp(suffix=output_ext, dir=CFG_TMPDIR) os.close(fd) except IOError, err: raise InvenioWebSubmitFileConverterError("It's impossible to create a temporary file: %s" % err) else: output_file = os.path.abspath(output_file) if os.path.exists(output_file): os.remove(output_file) if need_working_dir: try: working_dir = tempfile.mkdtemp(dir=CFG_TMPDIR, prefix='conversion') except IOError, err: raise InvenioWebSubmitFileConverterError("It's impossible to create a temporary directory: %s" % err) input_ext = decompose_file(input_file, skip_version=True)[2] new_input_file = os.path.join(working_dir, 'input' + input_ext) shutil.copy(input_file, new_input_file) input_file = new_input_file else: working_dir = None input_file = os.path.abspath(input_file) get_file_converter_logger().debug('IO prepared: input_file=%s, output_file=%s, working_dir=%s' % (input_file, output_file, working_dir)) return (input_file, output_file, working_dir) def clean_working_dir(working_dir): """ Remove the working_dir. 
""" get_file_converter_logger().debug('Cleaning working_dir: %s' % working_dir) shutil.rmtree(working_dir) def execute_command(*args, **argd): """Wrapper to run_process_with_timeout.""" get_file_converter_logger().debug("Executing: %s" % (args, )) args = [str(arg) for arg in args] res, stdout, stderr = run_process_with_timeout(args, cwd=argd.get('cwd'), filename_out=argd.get('filename_out'), filename_err=argd.get('filename_err'), sudo=argd.get('sudo')) get_file_converter_logger().debug('res: %s, stdout: %s, stderr: %s' % (res, stdout, stderr)) if res != 0: message = "ERROR: Error in running %s\n stdout:\n%s\nstderr:\n%s\n" % (args, stdout, stderr) get_file_converter_logger().error(message) raise InvenioWebSubmitFileConverterError(message) return stdout def execute_command_with_stderr(*args, **argd): """Wrapper to run_process_with_timeout.""" get_file_converter_logger().debug("Executing: %s" % (args, )) res, stdout, stderr = run_process_with_timeout(args, cwd=argd.get('cwd'), filename_out=argd.get('filename_out'), sudo=argd.get('sudo')) if res != 0: message = "ERROR: Error in running %s\n stdout:\n%s\nstderr:\n%s\n" % (args, stdout, stderr) get_file_converter_logger().error(message) raise InvenioWebSubmitFileConverterError(message) return stdout, stderr def silent_remove(path): """Remove without errors a path.""" if os.path.exists(path): try: os.remove(path) except OSError: pass __CONVERSION_MAP = get_conversion_map() def main_cli(): """ main function when the library behaves as a normal CLI tool. """ from invenio.bibdocfile import normalize_format parser = OptionParser() parser.add_option("-c", "--convert", dest="input_name", help="convert the specified FILE", metavar="FILE") parser.add_option("-d", "--debug", dest="debug", action="store_true", help="Enable debug information") parser.add_option("--special-pdf2hocr2pdf", dest="ocrize", help="convert the given scanned PDF into a PDF with OCRed text", metavar="FILE") parser.add_option("-f", "--format", dest="output_format", help="the desired output format", metavar="FORMAT") parser.add_option("-o", "--output", dest="output_name", help="the desired output FILE (if not specified a new file will be generated with the desired output format)") parser.add_option("--without-pdfa", action="store_false", dest="pdf_a", default=True, help="don't force creation of PDF/A PDFs") parser.add_option("--without-pdfopt", action="store_false", dest="pdfopt", default=True, help="don't force optimization of PDFs files") parser.add_option("--without-ocr", action="store_false", dest="ocr", default=True, help="don't force OCR") parser.add_option("--can-convert", dest="can_convert", help="display all the possible format that is possible to generate from the given format", metavar="FORMAT") parser.add_option("--is-ocr-needed", dest="check_ocr_is_needed", help="check if OCR is needed for the FILE specified", metavar="FILE") parser.add_option("-t", "--title", dest="title", help="specify the title (used when creating PDFs)", metavar="TITLE") parser.add_option("-l", "--language", dest="ln", help="specify the language (used when performing OCR, e.g. 
en, it, fr...)", metavar="LN", default='en') (options, dummy) = parser.parse_args() if options.debug: from logging import basicConfig basicConfig() get_file_converter_logger().setLevel(DEBUG) if options.can_convert: if options.can_convert: input_format = normalize_format(options.can_convert) if input_format == '.pdf': if can_pdfopt(True): print "PDF linearization supported" else: print "No PDF linearization support" if can_pdfa(True): print "PDF/A generation supported" else: print "No PDF/A generation support" if can_perform_ocr(True): print "OCR supported" else: print "OCR not supported" print 'Can convert from "%s" to:' % input_format[1:], for output_format in __CONVERSION_MAP: if can_convert(input_format, output_format): print '"%s"' % output_format[1:], print elif options.check_ocr_is_needed: print "Checking if OCR is needed on %s..." % options.check_ocr_is_needed, sys.stdout.flush() if guess_is_OCR_needed(options.check_ocr_is_needed): print "needed." else: print "not needed." elif options.ocrize: try: output = pdf2hocr2pdf(options.ocrize, output_file=options.output_name, title=options.title, ln=options.ln) print "Output stored in %s" % output except InvenioWebSubmitFileConverterError, err: print "ERROR: %s" % err sys.exit(1) else: try: if not options.output_name and not options.output_format: parser.error("Either --format, --output should be specified") if not options.input_name: parser.error("An input should be specified!") output = convert_file(options.input_name, output_file=options.output_name, output_format=options.output_format, pdfopt=options.pdfopt, pdfa=options.pdf_a, title=options.title, ln=options.ln) print "Output stored in %s" % output except InvenioWebSubmitFileConverterError, err: print "ERROR: %s" % err sys.exit(1) if __name__ == "__main__": main_cli()