diff --git a/INSTALL b/INSTALL
index ca90fedb3..d106a8a5b 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,864 +1,863 @@
Invenio INSTALLATION
====================

About
=====

This document specifies how to build, customize, and install
Invenio v1.1.4 for the first time. See RELEASE-NOTES if you are
upgrading from a previous Invenio release.

Contents
========

0. Prerequisites
1. Quick instructions for the impatient Invenio admin
2. Detailed instructions for the patient Invenio admin

0. Prerequisites
================

Here is the software you need to have around before you start
installing Invenio:

 a) Unix-like operating system. The main development and production
    platforms for Invenio at CERN are the GNU/Linux distributions
    Debian, Gentoo, Scientific Linux (aka RHEL), and Ubuntu, but we
    also develop on Mac OS X. Basically any Unix system supporting
    the software listed below should do.

    If you are using Debian GNU/Linux ``Lenny'' or later, then you
    can install most of the below-mentioned prerequisites and
    recommendations by running:

      $ sudo aptitude install python-dev apache2-mpm-prefork \
           mysql-server mysql-client python-mysqldb \
           python-4suite-xml python-simplejson python-xml \
           gnuplot poppler-utils \
           gs-common clisp gettext libapache2-mod-wsgi unzip \
           python-dateutil python-rdflib python-pyparsing \
           python-gnuplot python-magic pdftk html2text giflib-tools \
           pstotext netpbm python-pypdf python-chardet python-lxml \
           python-unidecode redis-server python-redis

    You may also want to install some of the following packages, if
    they are available on your particular platform:

      $ sudo aptitude install sbcl cmucl pylint pychecker pyflakes \
           python-profiler python-epydoc libapache2-mod-xsendfile \
           openoffice.org python-utidylib python-beautifulsoup \
-          python-unidecode libhdf5-dev
+          libhdf5-dev

    (Note that if you use pip to manage your Python dependencies
    instead of operating system packages, please see item (e) below
    on how to use pip instead of aptitude.)

    Moreover, you should install some Message Transfer Agent (MTA)
    such as Postfix so that Invenio can email notification alerts or
    registration information to the end users, contact moderators
    and reviewers of submitted documents, inform administrators about
    various runtime system information, etc:

      $ sudo aptitude install postfix

    After running the above-quoted aptitude command(s), you can
    proceed to configuring your MySQL server instance
    (max_allowed_packet in my.cnf, see item 0b below) and then to
    installing the Invenio software package as described in section 1
    below.

    If you are using another operating system, then please continue
    reading the rest of this prerequisites section, and please
    consult our wiki pages for any concrete hints for your specific
    operating system.

 b) MySQL server (may be on a remote machine), and MySQL client
    (must be available locally too). MySQL versions 4.1 or 5.0 are
    supported. Please set the variable "max_allowed_packet" in your
    "my.cnf" init file to at least 4M. (For sites such as INSPIRE,
    having 1M records with 10M citer-citee pairs in its citation map,
    you may need to increase max_allowed_packet to 1G.) You may
    perhaps also want to run your MySQL server natively in UTF-8 mode
    by setting "default-character-set=utf8" in various parts of your
    "my.cnf" file, such as in the "[mysql]" part and elsewhere; but
    this is not really required. A sketch of such a my.cnf fragment
    is shown below.
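    For example, a minimal illustrative my.cnf fragment along the
    lines discussed above might read (the values are examples only;
    tune them to your site's needs):

      [mysqld]
      max_allowed_packet = 4M

      [mysql]
      default-character-set = utf8

    The [mysqld] part raises the packet limit as required above,
    while the optional [mysql] part switches the client to UTF-8
    mode, which, as noted, is not strictly required.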
 c) Redis server (may be on a remote machine) for user session
    management and caching purposes. By default, Invenio uses Redis
    to store sessions, so installing it is highly recommended.
    However, if you do not want to use Redis, you can change the
    CFG_WEBSESSION_STORAGE setting in invenio-local.conf and MySQL
    will be used for session management instead.

 d) Apache 2 server, with support for loading DSO modules, and
    optionally with SSL support for HTTPS-secure user authentication,
    and mod_xsendfile for off-loading file downloads away from
    Invenio processes to Apache.

 e) Python v2.6 or above, as well as the following Python modules:

      - (mandatory) MySQLdb (version >= 1.2.1_p2; see below)
      - (mandatory) Pyparsing, for document parsing
+     - (mandatory) unidecode, for ASCII representation of Unicode text:
+
      - (recommended) Redis connector:
      - (recommended) Nydus, Redis consistent hashing connector:
      - (recommended) python-dateutil, for complex date processing:
      - (recommended) PyXML, for XML processing:
      - (recommended) PyRXP, for very fast XML MARC processing:
      - (recommended) lxml, for XML/XLST processing:
      - (recommended) Gnuplot.Py, for producing graphs:
      - (recommended) Snowball Stemmer, for stemming:
      - (recommended) py-editdist, for record merging:
      - (recommended) numpy, for citerank methods:
      - (recommended) magic, for full-text file handling:
      - (recommended) cerberus, extensible validation for Python
        dictionaries.
      - (optional) libxml2-python, for XML/XLST processing:
      - (optional) chardet, for character encoding detection:
      - (optional) 4suite, slower alternative to PyRXP and
        libxml2-python:
      - (optional) feedparser, for web journal creation:
      - (optional) RDFLib, to use RDF ontologies and thesauri:
      - (optional) mechanize, to run the regression web test suite:
      - (optional) python-mock, mocking library for the test suite:
      - (optional) utidylib, for HTML washing:
      - (optional) Beautiful Soup, for HTML washing:
      - (optional) Python Twitter (and its dependencies) if you want
        to use the Twitter Fetcher bibtasklet:
      - (optional) Python OpenID if you want to enable OpenID support
        for authentication:
      - (optional) Python Rauth if you want to enable OAuth 1.0/2.0
        support for authentication (depends on Python-2.6 or later):
-     - (optional) unidecode, for ASCII representation of Unicode
-       text:
-
      - (optional) libhdf5-7, libhdf5-dev, python-h5py, in order to
        run author disambiguation.

    Note that if you are using pip to install and manage your Python
    dependencies, then you can run:

      $ sudo pip install -r requirements.txt
      $ sudo pip install -r requirements-extras.txt

    to install all mandatory, recommended, and optional packages
    mentioned above.

 f) mod_wsgi Apache module. Versions 3.x and above are recommended.

 g) If you want to be able to extract references from PDF fulltext
    files, then you need to install at least version 3 of pdftotext.

 h) If you want to be able to search for words in the fulltext files
    (i.e. to have fulltext indexing) or to stamp submitted files,
    then you need to install some of the following tools as well:

      - for Microsoft Office/OpenOffice.org document conversion:
        OpenOffice.org
      - for PDF file stamping: pdftk, pdf2ps
      - for PDF files: pdftotext or pstotext
      - for PostScript files: pstotext or ps2ascii
      - for DjVu creation, elaboration: DjVuLibre
      - to perform OCR: OCRopus (tested only with release 0.3.1)
      - to perform different image elaborations: ImageMagick
      - to generate PDF after OCR: netpbm, ReportLab and pyPdf or
        pyPdf2

 i) If you have chosen to install the fast XML MARC Python processors
    in step e) above, then you have to install the parsers
    themselves:

      - (optional) 4suite:

 j) (recommended) Gnuplot, the command-line driven interactive
    plotting program.
    It is used to display download and citation history graphs on the
    Detailed record pages of the web interface. Note that Gnuplot
    must be compiled with PNG output support, that is, with the GD
    library. Note also that Gnuplot is not required, only
    recommended.

 k) (recommended) A Common Lisp implementation, such as CLISP, SBCL,
    or CMUCL. It is used for the web server log analysing tool and
    the metadata checking program. Note that any of the three
    implementations CLISP, SBCL, or CMUCL will do. CMUCL produces the
    fastest machine code, but it does not support UTF-8 yet. Pick
    CLISP if you don't know which one to choose. Note that a Common
    Lisp implementation is not required, only recommended.

 l) GNU gettext, a set of tools that makes it possible to translate
    the application into multiple languages. This is available by
    default on many systems.

 m) (recommended) xlwt 0.7.2, a library to create spreadsheet files
    compatible with MS Excel 97/2000/XP/2003 XLS files, on any
    platform, with Python 2.3 to 2.6.

 n) (recommended) matplotlib 1.0.0, a Python 2D plotting library
    which produces publication-quality figures in a variety of
    hardcopy formats and interactive environments across platforms.
    matplotlib can be used in Python scripts, the Python and IPython
    shells (a la MATLAB® or Mathematica®), web application servers,
    and six graphical user interface toolkits. It is used to generate
    pie graphs in the custom summary query (WebStat).

 o) (optional) FFmpeg, an open-source collection of tools and
    libraries for converting video and audio files. It makes use of
    both internal and external libraries to generate web-ready video
    formats such as Theora, WebM, and H.264 out of almost any video
    input. FFmpeg is needed to run the video-related modules and
    submission workflows in Invenio. The minimal configuration of
    ffmpeg for the Invenio demo site requires a number of external
    libraries. It is highly recommended to remove all installed
    versions and packages that come with various Linux distributions
    and to install the latest versions from sources. Additionally,
    you will need the MediaInfo library for multimedia metadata
    handling.

    Minimum libraries for the demo site:

      - the ffmpeg multimedia encoder tools
      - a library for JPEG images, needed for thumbnail extraction
      - a library for the OGG container format, needed for Vorbis
        and Theora
      - the OGG Vorbis audio codec library
      - the OGG Theora video codec library
      - the WebM video codec library
      - the MediaInfo library for multimedia metadata

    Recommended for H.264 video (!be aware of licensing issues!):

      - a library for H.264 video encoding
      - a library for Advanced Audio Coding
      - a library for MP3 encoding

Note that the configure script checks whether you have all the
prerequisite software installed, and it won't let you continue unless
everything is in order. It also warns you if it cannot find some
optional but recommended software.

1. Quick instructions for the impatient Invenio admin
=========================================================
1a. Installation
----------------

    $ cd $HOME/src/
    $ wget http://invenio-software.org/download/invenio-1.1.4.tar.gz
    $ wget http://invenio-software.org/download/invenio-1.1.4.tar.gz.md5
    $ wget http://invenio-software.org/download/invenio-1.1.4.tar.gz.sig
    $ md5sum -c invenio-1.1.4.tar.gz.md5
    $ gpg --verify invenio-1.1.4.tar.gz.sig invenio-1.1.4.tar.gz
    $ tar xvfz invenio-1.1.4.tar.gz
    $ cd invenio-1.1.4
    $ ./configure
    $ make
    $ make install
    $ make install-mathjax-plugin     ## optional
    $ make install-jquery-plugins     ## optional
    $ make install-ckeditor-plugin    ## optional
    $ make install-pdfa-helper-files  ## optional
    $ make install-mediaelement       ## optional
    $ make install-solrutils          ## optional
    $ make install-js-test-driver     ## optional

1b. Configuration
-----------------

    $ sudo chown -R www-data.www-data /opt/invenio
    $ sudo -u www-data emacs /opt/invenio/etc/invenio-local.conf
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --update-all
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-tables
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-bibfield-conf
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-webstat-conf
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-apache-conf
    $ sudo /etc/init.d/apache2 restart
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --check-openoffice
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-demo-site
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-demo-records
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-unit-tests
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-regression-tests
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-web-tests
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --remove-demo-records
    $ sudo -u www-data /opt/invenio/bin/inveniocfg --drop-demo-site
    $ firefox http://your.site.com/help/admin/howto-run

2. Detailed instructions for the patient Invenio admin
==========================================================

2a. Installation
----------------

Invenio uses the standard GNU autoconf method to build and install
its files. This means that you proceed as follows:

    $ cd $HOME/src/

      Change to a directory where we will build the Invenio sources.
      (The built files will be installed into different "target"
      directories later.)

    $ wget http://invenio-software.org/download/invenio-1.1.4.tar.gz
    $ wget http://invenio-software.org/download/invenio-1.1.4.tar.gz.md5
    $ wget http://invenio-software.org/download/invenio-1.1.4.tar.gz.sig

      Fetch the Invenio source tarball from the distribution server,
      together with the MD5 checksum and GnuPG cryptographic
      signature files, which are useful for verifying the integrity
      of the tarball.

    $ md5sum -c invenio-1.1.4.tar.gz.md5

      Verify the MD5 checksum.

    $ gpg --verify invenio-1.1.4.tar.gz.sig invenio-1.1.4.tar.gz

      Verify the GnuPG cryptographic signature. Note that you may
      first have to import my public key into your keyring, if you
      haven't done that already:

      $ gpg --keyserver pool.sks-keyservers.net --recv-key 0xBA5A2B67

      The output of the gpg --verify command should then read:

        Good signature from "Tibor Simko"

      You can safely ignore any trusted signature certification
      warning that may follow after the signature has been
      successfully verified.

    $ tar xvfz invenio-1.1.4.tar.gz

      Untar the distribution tarball.

    $ cd invenio-1.1.4

      Go to the source directory.

    $ ./configure

      Configure the Invenio software for building on this specific
      platform. You can use the following optional parameters:

      --prefix=/opt/invenio

        Optionally, specify the Invenio general installation
        directory (default is /opt/invenio).
        It will contain command-line binaries and program libraries
        containing the core Invenio functionality, but also store web
        pages, runtime log and cache information, document data
        files, etc. Several subdirs like `bin', `etc', `lib', or
        `var' will be created inside the prefix directory to this
        effect. Note that the prefix directory should be chosen
        outside of the Apache htdocs tree, since only one of its
        subdirectories (prefix/var/www) is to be accessible directly
        via the Web (see below). Note that Invenio won't install to
        any other directory but to the prefix mentioned in this
        configuration line.

      --with-python=/opt/python/bin/python2.7

        Optionally, specify a path to some specific Python binary.
        This is useful if you have more than one Python installation
        on your system. If you don't set this option, then the first
        Python found in your PATH will be chosen for running Invenio.

      --with-mysql=/opt/mysql/bin/mysql

        Optionally, specify a path to some specific MySQL client
        binary. This is useful if you have more than one MySQL
        installation on your system. If you don't set this option,
        then the first MySQL client executable found in your PATH
        will be chosen for running Invenio.

      --with-clisp=/opt/clisp/bin/clisp

        Optionally, specify a path to the CLISP executable. This is
        useful if you have more than one CLISP installation on your
        system. If you don't set this option, then the first
        executable found in your PATH will be chosen for running
        Invenio.

      --with-cmucl=/opt/cmucl/bin/lisp

        Optionally, specify a path to the CMUCL executable. This is
        useful if you have more than one CMUCL installation on your
        system. If you don't set this option, then the first
        executable found in your PATH will be chosen for running
        Invenio.

      --with-sbcl=/opt/sbcl/bin/sbcl

        Optionally, specify a path to the SBCL executable. This is
        useful if you have more than one SBCL installation on your
        system. If you don't set this option, then the first
        executable found in your PATH will be chosen for running
        Invenio.

      --with-openoffice-python

        Optionally, specify the path to the Python interpreter
        embedded with OpenOffice.org. This is normally not on the
        standard path. If you don't specify this, it won't be
        possible to use OpenOffice.org to convert between Microsoft
        Office and OpenOffice.org documents.

      This configuration step is mandatory. Usually, you do this step
      only once.

      (Note that if you are building Invenio not from a released
      tarball, but from the Git sources, then you have to generate
      the configure file via autotools:

        $ sudo aptitude install automake1.9 autoconf
        $ aclocal-1.9
        $ automake-1.9 -a
        $ autoconf

      after which you proceed with the usual configure command.)

    $ make

      Launch the Invenio build. Since many messages are printed
      during the build process, you may want to run it in a
      fast-scrolling terminal such as rxvt or in a detached screen
      session. During this step all the pages and scripts will be
      pre-created and customized based on the config you have edited
      in the previous step. Note that on systems such as FreeBSD or
      Mac OS X you have to use GNU make ("gmake") instead of "make".

    $ make install

      Install the web pages, scripts, utilities and everything needed
      for the Invenio runtime into the respective installation
      directories, as specified earlier by the configure command.
      Note that if you are installing Invenio for the first time, you
      will be asked to create symbolic link(s) from Python's
      site-packages system-wide directory(ies) to the installation
      location.
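      For illustration, such a symbolic link might look similar to
      the following hypothetical sketch (the actual source and target
      paths depend on your configure prefix and on your Python
      version's site-packages location; use the exact command that
      "make install" prints for you):

        $ sudo ln -s /opt/invenio/lib/python/invenio \
               /usr/lib/python2.6/dist-packages/invenio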
      The link instructs Python where to find Invenio's Python files.
      The exact command to use will be suggested to you, based on the
      parameters you have used in the configure command.

    $ make install-mathjax-plugin    ## optional

      This will automatically download and install in the proper
      place MathJax, a JavaScript library to render LaTeX formulas in
      the client browser. Note that in order to enable the rendering
      you will have to set the variable
      CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS in invenio-local.conf to
      a suitable list of output format codes. For example:

        CFG_WEBSEARCH_USE_MATHJAX_FOR_FORMATS = hd,hb

    $ make install-jquery-plugins    ## optional

      This will automatically download and install in the proper
      place jQuery and related plugins. They are used for AJAX
      applications such as the record editor. Note that `unzip' is
      needed when installing the jQuery plugins.

    $ make install-ckeditor-plugin    ## optional

      This will automatically download and install in the proper
      place CKeditor, a WYSIWYG JavaScript-based editor (e.g. for the
      WebComment module). Note that in order to enable the editor you
      have to set CFG_WEBCOMMENT_USE_RICH_EDITOR to True.

    $ make install-pdfa-helper-files    ## optional

      This will automatically download and install in the proper
      place the helper files needed to create PDF/A files out of
      existing PDF files.

    $ make install-mediaelement    ## optional

      This will automatically download and install the MediaElementJS
      HTML5 video player that is needed for videos on the DEMO site.

    $ make install-solrutils    ## optional

      This will automatically download and install a Solr instance
      which can be used for full-text searching. See the CFG_SOLR_URL
      variable in invenio.conf. Note that the admin later has to take
      care of running the init.d scripts which would start the Solr
      instance automatically.

    $ make install-js-test-driver    ## optional

      This will automatically download and install JsTestDriver,
      which is needed to run JS unit tests. Recommended for
      developers.

2b. Configuration
-----------------

Once the basic software installation is done, we proceed to
configuring your Invenio system.

    $ sudo chown -R www-data.www-data /opt/invenio

      For the sake of simplicity, let us assume that your Invenio
      installation will run under the `www-data' user process
      identity. The above command changes ownership of the installed
      files to www-data, so that we shall run everything under this
      user identity from now on.

      For production purposes, you would typically enable the Apache
      server to read all files from the installation place but to
      write only to the `var' subdirectory of your installation
      place. You could achieve this by configuring Unix directory
      group permissions, for example.

    $ sudo -u www-data emacs /opt/invenio/etc/invenio-local.conf

      Customize your Invenio installation. Please read the
      'invenio.conf' file located in the same directory: it contains
      the vanilla default configuration parameters of your Invenio
      installation. If you want to customize some of these
      parameters, you should create a file named 'invenio-local.conf'
      in the same directory where 'invenio.conf' lives, and you
      should write there only the customizations that you want to be
      different from the vanilla defaults.
      Here is a realistic, minimalist, yet production-ready example
      of what you would typically put there:

        $ cat /opt/invenio/etc/invenio-local.conf
        [Invenio]
        CFG_SITE_NAME = John Doe's Document Server
        CFG_SITE_NAME_INTL_fr = Serveur des Documents de John Doe
        CFG_SITE_URL = http://your.site.com
        CFG_SITE_SECURE_URL = https://your.site.com
        CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com
        CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com
        CFG_WEBALERT_ALERT_ENGINE_EMAIL = john.doe@your.site.com
        CFG_WEBCOMMENT_ALERT_ENGINE_EMAIL = john.doe@your.site.com
        CFG_WEBCOMMENT_DEFAULT_MODERATOR = john.doe@your.site.com
        CFG_BIBAUTHORID_AUTHOR_TICKET_ADMIN_EMAIL = john.doe@your.site.com
        CFG_BIBCATALOG_SYSTEM_EMAIL_ADDRESS = john.doe@your.site.com
        CFG_DATABASE_HOST = localhost
        CFG_DATABASE_NAME = invenio
        CFG_DATABASE_USER = invenio
        CFG_DATABASE_PASS = my123p$ss
        CFG_BIBDOCFILE_ENABLE_BIBDOCFSINFO_CACHE = 1

      You should override at least the parameters mentioned above in
      order to define some very essential runtime parameters such as
      the name of your document server (CFG_SITE_NAME and
      CFG_SITE_NAME_INTL_*), the visible URL of your document server
      (CFG_SITE_URL and CFG_SITE_SECURE_URL), the email addresses of
      the local Invenio administrator, comment moderator, and alert
      engine (CFG_SITE_SUPPORT_EMAIL, CFG_SITE_ADMIN_EMAIL, etc), and
      last but not least your database credentials (CFG_DATABASE_*).

      If this is a first installation of Invenio, it is recommended
      that you set the CFG_BIBDOCFILE_ENABLE_BIBDOCFSINFO_CACHE
      variable to 1. If this is instead an upgrade from an existing
      installation, don't add it until you have run:

        $ bibdocfile --fix-bibdocfsinfo-cache

      The Invenio system will then read both the default invenio.conf
      file and your customized invenio-local.conf file, and it will
      override any default options with the ones you have specified
      in your local file. This cascading of configuration parameters
      will ease your future upgrades.

      If you want to have multiple Invenio instances for distributed
      video encoding, you need to share the same configuration among
      them and make some of the folders of the Invenio installation
      available to all nodes. Configure the allowed tasks for every
      node:

        CFG_BIBSCHED_NODE_TASKS = {
          "hostname_machine1" : ["bibindex", "bibupload",
                                 "bibreformat", "webcoll",
                                 "bibtaskex", "bibrank", "oaiharvest",
                                 "oairepositoryupdater", "inveniogc",
                                 "webstatadmin", "bibclassify",
                                 "bibexport", "dbdump",
                                 "batchuploader", "bibauthorid",
                                 "bibtasklet"],
          "hostname_machine2" : ['bibencode',]
        }

      Share the following directories among the Invenio instances:

        /var/tmp-shared
           hosts video uploads in a temporary form

        /var/tmp-shared/bibencode/jobs
           hosts new job files for the video encoding daemon

        /var/tmp-shared/bibencode/jobs/done
           hosts job files that have been processed by the daemon

        /var/data/files
           hosts fulltext and media files associated with records

        /var/data/submit
           hosts files created during submissions

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --update-all

      Make the rest of the Invenio system aware of your
      invenio-local.conf changes. This step is mandatory each time
      you edit your conf files.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-tables

      If you are installing Invenio for the first time, you have to
      create the database tables. Note that this step checks for
      potential problems such as the database connection rights and
      may ask you to perform some more administrative steps in case
      it detects a problem. Notably, it may ask you to set up
      database access permissions, based on your configure values;
      see the sketch below.
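      To give an idea, a typical first-time database setup amounts to
      statements like the following sketch (illustrative only: the
      database name, user, and password must match your
      CFG_DATABASE_* values, and the authoritative commands are the
      ones this step prints for you):

        $ mysql -h localhost -u root -p
        mysql> CREATE DATABASE invenio DEFAULT CHARACTER SET utf8;
        mysql> GRANT ALL PRIVILEGES ON invenio.*
                  TO 'invenio'@'localhost' IDENTIFIED BY 'my123p$ss';
        mysql> FLUSH PRIVILEGES;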
      Indeed, if you are installing Invenio for the first time, you
      have to create a dedicated database on your MySQL server that
      Invenio can use for its purposes. Please contact your MySQL
      administrator and ask them to execute the commands this step
      proposes to you.

      At this point you should now have successfully completed the
      "make install" process. We continue by setting up the Apache
      web server.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-bibfield-conf

      Load the configuration file of the BibField module. It will
      create the `bibfield_config.py' file. (FIXME: When BibField
      becomes an essential part of Invenio, this step should be
      automated so that people do not have to run it manually.)

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-webstat-conf

      Load the configuration file of the WebStat module. It will
      create the tables in the database for registering custom
      events, such as basket hits.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-apache-conf

      Running this command will generate Apache virtual host
      configurations matching your installation. You will be
      instructed to check the created files (usually they are located
      under /opt/invenio/etc/apache/) and to edit your httpd.conf to
      activate the Invenio virtual hosts.

      If you are using Debian GNU/Linux ``Lenny'' or later, then you
      can do the following to create your SSL certificate and to
      activate your Invenio vhosts:

        ## make SSL certificate:
        $ sudo aptitude install ssl-cert
        $ sudo mkdir /etc/apache2/ssl
        $ sudo /usr/sbin/make-ssl-cert /usr/share/ssl-cert/ssleay.cnf \
               /etc/apache2/ssl/apache.pem

        ## add Invenio web sites:
        $ sudo ln -s /opt/invenio/etc/apache/invenio-apache-vhost.conf \
               /etc/apache2/sites-available/invenio
        $ sudo ln -s /opt/invenio/etc/apache/invenio-apache-vhost-ssl.conf \
               /etc/apache2/sites-available/invenio-ssl

        ## disable Debian's default web site:
        $ sudo /usr/sbin/a2dissite default

        ## enable Invenio web sites:
        $ sudo /usr/sbin/a2ensite invenio
        $ sudo /usr/sbin/a2ensite invenio-ssl

        ## enable SSL module:
        $ sudo /usr/sbin/a2enmod ssl

        ## if you are using the xsendfile module, enable it too:
        $ sudo /usr/sbin/a2enmod xsendfile

      If you are using another operating system, you should do the
      equivalent, for example edit your system-wide httpd.conf and
      put there the following include statements:

        Include /opt/invenio/etc/apache/invenio-apache-vhost.conf
        Include /opt/invenio/etc/apache/invenio-apache-vhost-ssl.conf

      Note that you may need to adapt the generated vhost file
      snippets to match your concrete operating system specifics. For
      example, the generated configuration snippet will preload the
      Invenio WSGI daemon application upon Apache start-up for faster
      site response. The generated configuration assumes that you are
      using mod_wsgi version 3 or later. If you are using the old
      legacy mod_wsgi version 2, then you would need to comment out
      the WSGIImportScript directive from the generated snippet, or
      else move the WSGI daemon setup to the top level, outside of
      the VirtualHost section.

      Note also that you may want to tweak the generated Apache vhost
      snippet for performance reasons, especially with respect to the
      WSGIDaemonProcess parameters. For example, you can increase the
      number of processes from the default value `processes=5' if you
      have lots of RAM and if many concurrent users may access your
      site in parallel. However, note that you must use `threads=1'
      there, because the Invenio WSGI daemon processes are not fully
      thread safe yet. This may change in the future. A sketch of
      such a tweak follows below.
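      For instance, the tuned directive in the generated vhost file
      might look like the following sketch (WSGIDaemonProcess and its
      processes/threads options are standard mod_wsgi directives; any
      other options present in your generated snippet should be kept
      as generated):

        ## generated default:
        WSGIDaemonProcess invenio processes=5 threads=1 ...
        ## on a machine with plenty of RAM and many concurrent users:
        WSGIDaemonProcess invenio processes=10 threads=1 ...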
    $ sudo /etc/init.d/apache2 restart

      Please ask your webserver administrator to restart the Apache
      server after the above "httpd.conf" changes.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --check-openoffice

      If you plan to support MS Office or Open Document Format files
      in your installation, you should check whether LibreOffice or
      OpenOffice.org is well integrated with Invenio by running the
      above command. You may be asked to create a temporary directory
      for converting office files, with special ownership (typically
      as user nobody) and permissions. Note that you can do this step
      later.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --create-demo-site

      This step is recommended to test your local Invenio
      installation. It should give you our "Atlantis Institute of
      Science" demo installation, exactly as you see it on the public
      Invenio demo site.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --load-demo-records

      Optionally, load some demo records to be able to test indexing
      and searching of your local Invenio demo installation.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-unit-tests

      Optionally, you can run the unit test suite to verify the unit
      behaviour of your local Invenio installation. Note that this
      command should be run only after you have installed the whole
      system via `make install'.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-regression-tests

      Optionally, you can run the full regression test suite to
      verify the functional behaviour of your local Invenio
      installation. Note that this command requires the demo site to
      have been created and the demo records to have been loaded.
      Note also that running the regression test suite may alter the
      database content with junk data, so rebuilding the demo site is
      strongly recommended afterwards.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --run-web-tests

      Optionally, you can run additional automated web tests running
      in a real browser. This requires Firefox with the Selenium IDE
      extension installed.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --remove-demo-records

      Optionally, remove the demo records loaded in the previous
      step, while otherwise keeping the demo collection, submission,
      format, and other configurations that you may reuse and modify
      for your own production purposes.

    $ sudo -u www-data /opt/invenio/bin/inveniocfg --drop-demo-site

      Optionally, also drop all the demo configuration so that you'll
      end up with a completely blank Invenio system. However, you may
      find it more practical not to drop the demo site configuration
      but to start customizing from there.

    $ firefox http://your.site.com/help/admin/howto-run

      In order to start using your Invenio installation, you can
      start the indexing, formatting and other daemons as indicated
      in the "HOWTO Run" guide at the above URL. You can also use the
      Admin Area web interfaces to perform further runtime
      configurations such as the definition of data collections,
      document types, document formats, word indexes, etc.

    $ sudo ln -s /opt/invenio/etc/bash_completion.d/inveniocfg \
           /etc/bash_completion.d/inveniocfg

      Optionally, if you are using Bash shell completion, then you
      may want to create the above symlink in order to configure
      completion for the inveniocfg command.

Good luck, and thanks for choosing Invenio.
       - Invenio Development Team
         Email: info@invenio-software.org
         IRC: #invenio on irc.freenode.net
         Twitter: http://twitter.com/inveniosoftware
         Github: http://github.com/inveniosoftware
         URL: http://invenio-software.org

diff --git a/configure-tests.py b/configure-tests.py
index 4793988af..07f1c4ec3 100644
--- a/configure-tests.py
+++ b/configure-tests.py
@@ -1,532 +1,513 @@
## This file is part of Invenio.
## Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

"""
Test the suitability of the Python core and the availability of
various Python modules for running Invenio. Warn the user about any
potential problems. Exit status: 0 if okay, 1 if not okay. Useful
for running from configure.ac.
"""

## minimally recommended/required versions:
cfg_min_python_version = "2.6"
cfg_max_python_version = "2.9.9999"
cfg_min_mysqldb_version = "1.2.1_p2"

## 0) import modules needed for this testing:
import string
import sys
import getpass
import subprocess
import re

error_messages = []
warning_messages = []

def wait_for_user(msg):
    """Print MSG and prompt user for confirmation."""
    try:
        raw_input(msg)
    except KeyboardInterrupt:
        print "\n\nInstallation aborted."
        sys.exit(1)
    except EOFError:
        print " (continuing in batch mode)"
        return

## 1) check Python version:
if sys.version < cfg_min_python_version:
    error_messages.append(
    """
    *******************************************************
    ** ERROR: TOO OLD PYTHON DETECTED: %s
    *******************************************************
    ** You seem to be using a too old version of Python. **
    ** You must use at least Python %s.                  **
    **                                                   **
    ** Note that if you have more than one Python        **
    ** installed on your system, you can specify the     **
    ** --with-python configuration option to choose      **
    ** a specific (e.g. non system wide) Python binary.  **
    **                                                   **
    ** Please upgrade your Python before continuing.     **
    *******************************************************
    """ % (string.replace(sys.version, "\n", ""), cfg_min_python_version)
    )

if sys.version > cfg_max_python_version:
    error_messages.append(
    """
    *******************************************************
    ** ERROR: TOO NEW PYTHON DETECTED: %s
    *******************************************************
    ** You seem to be using a too new version of Python. **
    ** You must use at most Python %s.                   **
    **                                                   **
    ** Perhaps you have downloaded and are installing an **
    ** old Invenio version? Please look for a more       **
    ** recent Invenio version or please contact the      **
    ** development team about this problem.              **
    **                                                   **
    ** Installation aborted.
** ******************************************************* """ % (string.replace(sys.version, "\n", ""), cfg_max_python_version) ) ## 2) check for required modules: try: import MySQLdb import base64 import cPickle import cStringIO import cgi import copy import fileinput import getopt import sys if sys.hexversion < 0x2060000: import md5 else: import hashlib import marshal import os import pyparsing import signal import tempfile import time import traceback import unicodedata import urllib import zlib import wsgiref + import unidecode except ImportError, msg: error_messages.append(""" ************************************************* ** IMPORT ERROR %s ************************************************* ** Perhaps you forgot to install some of the ** ** prerequisite Python modules? Please look ** ** at our INSTALL file for more details and ** ** fix the problem before continuing! ** ************************************************* """ % msg ) ## 3) check for recommended modules: try: import rdflib except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that rdflib is needed only if you plan ** ** to work with the automatic classification of ** ** documents based on RDF-based taxonomies. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import pyRXP except ImportError, msg: warning_messages.append(""" ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that PyRXP is not really required but ** ** we recommend it for fast XML MARC parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import dateutil except ImportError, msg: warning_messages.append(""" ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that dateutil is not really required but ** ** we recommend it for user-friendly date ** ** parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import libxml2 except ImportError, msg: warning_messages.append(""" ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxml2 is not really required but ** ** we recommend it for XML metadata conversions ** ** and for fast XML parsing. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) 
** ***************************************************** """ % msg ) try: import libxslt except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that libxslt is not really required but ** ** we recommend it for XML metadata conversions. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import Gnuplot except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that Gnuplot.py is not really required but ** ** we recommend it in order to have nice download ** ** and citation history graphs on Detailed record ** ** pages. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import rauth except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that python-rauth is not really required ** ** but we recommend it in order to enable oauth ** ** based authentication. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import openid except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that python-openid is not really required ** ** but we recommend it in order to enable OpenID ** ** based authentication. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) try: import magic if not hasattr(magic, "open"): raise StandardError except ImportError, msg: warning_messages.append( """ ***************************************************** ** IMPORT WARNING %s ***************************************************** ** Note that magic module is not really required ** ** but we recommend it in order to have detailed ** ** content information about fulltext files. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. ** ** even after your Invenio installation is put ** ** into production.) ** ***************************************************** """ % msg ) except StandardError: warning_messages.append( """ ***************************************************** ** IMPORT WARNING python-magic ***************************************************** ** The python-magic package you installed is not ** ** the one supported by Invenio. Please refer to ** ** the INSTALL file for more details. ** ** ** ** You can safely continue installing Invenio ** ** now, and add this module anytime later. (I.e. 
                                                      **
    ** even after your Invenio installation is put    **
    ** into production.)                              **
    *****************************************************
    """
    )

try:
    import reportlab
except ImportError, msg:
    warning_messages.append(
    """
    *****************************************************
    ** IMPORT WARNING %s
    *****************************************************
    ** Note that the reportlab module is not really    **
    ** required, but we recommend it if you want to    **
    ** enrich PDF with OCR information.                **
    **                                                 **
    ** You can safely continue installing Invenio     **
    ** now, and add this module anytime later. (I.e.  **
    ** even after your Invenio installation is put    **
    ** into production.)                              **
    *****************************************************
    """ % msg
    )

try:
    try:
        import PyPDF2
    except ImportError:
        import pyPdf
except ImportError, msg:
    warning_messages.append(
    """
    *****************************************************
    ** IMPORT WARNING %s
    *****************************************************
    ** Note that the pyPdf or pyPdf2 module is not     **
    ** really required, but we recommend it if you     **
    ** want to enrich PDF with OCR information.        **
    **                                                 **
    ** You can safely continue installing Invenio     **
    ** now, and add this module anytime later. (I.e.  **
    ** even after your Invenio installation is put    **
    ** into production.)                              **
    *****************************************************
    """ % msg
    )

-try:
-    import unidecode
-except ImportError, msg:
-    warning_messages.append(
-        """
-        *****************************************************
-        ** IMPORT WARNING %s
-        *****************************************************
-        ** Note that unidecode module is not really        **
-        ** required, but we recommend it you want to       **
-        ** introduce smarter author names matching.        **
-        **                                                 **
-        ** You can safely continue installing Invenio      **
-        ** now, and add this module anytime later. (I.e.   **
-        ** even after your Invenio installation is put     **
-        ** into production.)                               **
-        *****************************************************
-        """ % msg
-        )
-

## 4) check for versions of some important modules:
if MySQLdb.__version__ < cfg_min_mysqldb_version:
    error_messages.append(
    """
    *****************************************************
    ** ERROR: PYTHON MODULE MYSQLDB %s DETECTED
    *****************************************************
    ** You have to upgrade your MySQLdb to at least    **
    ** version %s. You must fix this problem           **
    ** before continuing. Please see the INSTALL file  **
    ** for more details.                               **
    *****************************************************
    """ % (MySQLdb.__version__, cfg_min_mysqldb_version)
    )

try:
    import Stemmer
    try:
        from Stemmer import algorithms
    except ImportError, msg:
        error_messages.append(
        """
        *****************************************************
        ** ERROR: STEMMER MODULE PROBLEM %s
        *****************************************************
        ** Perhaps you are using an old Stemmer version?   **
        ** You must either remove your old Stemmer or else **
        ** upgrade to Snowball Stemmer                     **
        ** before continuing. Please see the INSTALL file  **
        ** for more details.
                                                           **
        *****************************************************
        """ % (msg)
        )
except ImportError:
    pass # no prob, Stemmer is optional

## 5) check for Python.h (needed for intbitset):
try:
    from distutils.sysconfig import get_python_inc
    path_to_python_h = get_python_inc() + os.sep + 'Python.h'
    if not os.path.exists(path_to_python_h):
        raise StandardError, "Cannot find %s" % path_to_python_h
except StandardError, msg:
    error_messages.append(
    """
    *****************************************************
    ** ERROR: PYTHON HEADER FILE ERROR %s
    *****************************************************
    ** You do not seem to have Python developer files  **
    ** installed (such as Python.h). Some operating    **
    ** systems provide these in a separate Python      **
    ** package called python-dev or python-devel.      **
    ** You must install such a package before          **
    ** continuing the installation process.            **
    *****************************************************
    """ % (msg)
    )

## 6) check if ffmpeg is installed and, if so, whether it has the
##    minimum configuration for bibencode:
try:
    try:
        process = subprocess.Popen('ffprobe', stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    except OSError:
        raise StandardError, "FFMPEG/FFPROBE does not seem to be installed!"
    returncode = process.wait()
    output = process.communicate()[1]
    RE_CONFIGURATION = re.compile("(--enable-[a-z0-9\-]*)")
    CONFIGURATION_REQUIRED = (
        '--enable-gpl',
        '--enable-version3',
        '--enable-nonfree',
        '--enable-libtheora',
        '--enable-libvorbis',
        '--enable-libvpx',
        '--enable-libopenjpeg'
    )
    options = RE_CONFIGURATION.findall(output)
    if sys.version_info < (2, 6):
        import sets
        s = sets.Set(CONFIGURATION_REQUIRED)
        if not s.issubset(options):
            raise StandardError, s.difference(options)
    else:
        if not set(CONFIGURATION_REQUIRED).issubset(options):
            raise StandardError, set(CONFIGURATION_REQUIRED).difference(options)
except StandardError, msg:
    warning_messages.append(
    """
    *****************************************************
    ** WARNING: FFMPEG CONFIGURATION MISSING %s
    *****************************************************
    ** You do not seem to have FFmpeg configured with  **
    ** the minimum video codecs to run the demo site.  **
    ** Please install the necessary libraries and      **
    ** re-install FFmpeg according to the Invenio      **
    ** installation manual (INSTALL).                  **
    *****************************************************
    """ % (msg)
    )

if warning_messages:
    print """
    ******************************************************
    ** WARNING MESSAGES                                 **
    ******************************************************
    """
    for warning in warning_messages:
        print warning

if error_messages:
    print """
    ******************************************************
    ** ERROR MESSAGES                                   **
    ******************************************************
    """
    for error in error_messages:
        print error

if warning_messages and error_messages:
    print """
 There were %(n_err)s error(s) found that you need to solve.
 Please see above, solve them, and re-run configure.
 Note that there are also %(n_wrn)s warnings you may want to
 look into. Aborting the installation.
 """ % {'n_wrn': len(warning_messages), 'n_err': len(error_messages)}
    sys.exit(1)
elif error_messages:
    print """
 There were %(n_err)s error(s) found that you need to solve.
 Please see above, solve them, and re-run configure.
 Aborting the installation.
 """ % {'n_err': len(error_messages)}
    sys.exit(1)
elif warning_messages:
    print """
 There were %(n_wrn)s warnings found that you may want to look into,
 solve, and re-run configure before you continue the installation.
However, you can also continue the installation now and solve these issues later, if you wish. """ % {'n_wrn': len(warning_messages)} diff --git a/modules/bibsort/lib/bibsort_washer.py b/modules/bibsort/lib/bibsort_washer.py index 73938e2f0..52fecdbe2 100644 --- a/modules/bibsort/lib/bibsort_washer.py +++ b/modules/bibsort/lib/bibsort_washer.py @@ -1,133 +1,142 @@ ## -*- mode: python; coding: utf-8; -*- ## ## This file is part of Invenio. ## Copyright (C) 2010, 2011, 2012 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """Applies a transformation function to a value""" -from time import strptime -from invenio.dateutils import strftime -from invenio.textutils import strip_accents +import re +from invenio.dateutils import strftime, strptime +from invenio.textutils import decode_to_unicode, translate_to_ascii LEADING_ARTICLES = ['the', 'a', 'an', 'at', 'on', 'of'] +_RE_NOSYMBOLS = re.compile("\w+") class InvenioBibSortWasherNotImplementedError(Exception): """Exception raised when a washer method defined in the bibsort config file is not implemented""" pass class BibSortWasher(object): """Implements all the washer methods""" def __init__(self, washer): self.washer = washer fnc_name = '_' + washer try: self.washer_fnc = self.__getattribute__(fnc_name) except AttributeError, err: raise InvenioBibSortWasherNotImplementedError(err) def get_washer(self): """Returns the washer name""" return self.washer def get_transformed_value(self, val): """Returns the value""" return self.washer_fnc(val) def _sort_alphanumerically_remove_leading_articles_strip_accents(self, val): """ Convert: 'The title' => 'title' 'A title' => 'title' 'Title' => 'title' """ if not val: return '' - val_tokens = str(val).split(" ", 1) #split in leading_word, phrase_without_leading_word - if len(val_tokens) == 2 and val_tokens[0].lower() in LEADING_ARTICLES: - return strip_accents(val_tokens[1].strip().lower()) - return strip_accents(val.lower()) + val = translate_to_ascii(val).pop().lower() + val_tokens = val.split(" ", 1) #split in leading_word, phrase_without_leading_word + if len(val_tokens) == 2 and val_tokens[0].strip() in LEADING_ARTICLES: + return val_tokens[1].strip() + return val.strip() def _sort_alphanumerically_remove_leading_articles(self, val): """ Convert: 'The title' => 'title' 'A title' => 'title' 'Title' => 'title' """ if not val: return '' - val_tokens = str(val).split(" ", 1) #split in leading_word, phrase_without_leading_word - if len(val_tokens) == 2 and val_tokens[0].lower() in LEADING_ARTICLES: - return val_tokens[1].strip().lower() - return val.lower() + val = decode_to_unicode(val).lower().encode('UTF-8') + val_tokens = val.split(" ", 1) #split in leading_word, phrase_without_leading_word + if len(val_tokens) == 2 and val_tokens[0].strip() in LEADING_ARTICLES: + return val_tokens[1].strip() + return val.strip() def _sort_case_insensitive_strip_accents(self, 
val): """Remove accents and convert to lower case""" if not val: return '' - return strip_accents(str(val).lower()) + return translate_to_ascii(val).pop().lower() + + def _sort_nosymbols_case_insensitive_strip_accents(self, val): + """Remove accents, remove symbols, and convert to lower case""" + if not val: + return '' + return ''.join(_RE_NOSYMBOLS.findall(translate_to_ascii(val).pop().lower())) def _sort_case_insensitive(self, val): """Conversion to lower case""" if not val: return '' - return str(val).lower() + return decode_to_unicode(val).lower().encode('UTF-8') def _sort_dates(self, val): """ Convert: '8 nov 2010' => '2010-11-08' 'nov 2010' => '2010-11-01' '2010' => '2010-01-01' """ datetext_format = "%Y-%m-%d" try: datestruct = strptime(val, datetext_format) except ValueError: try: datestruct = strptime(val, "%d %b %Y") except ValueError: try: datestruct = strptime(val, "%b %Y") except ValueError: try: datestruct = strptime(val, "%Y") except ValueError: return val return strftime(datetext_format, datestruct) def _sort_numerically(self, val): """ Convert: 1245 => float(1245) """ try: return float(val) except ValueError: return 0 def get_all_available_washers(): """ Returns all the available washer functions without the leading '_' """ method_list = dir(BibSortWasher) return [method[1:] for method in method_list if method.startswith('_') and method.find('__') < 0] diff --git a/modules/bibsort/lib/bibsort_washer_unit_tests.py b/modules/bibsort/lib/bibsort_washer_unit_tests.py index 8dd94152a..a51cf4f0a 100644 --- a/modules/bibsort/lib/bibsort_washer_unit_tests.py +++ b/modules/bibsort/lib/bibsort_washer_unit_tests.py @@ -1,64 +1,72 @@ ## -*- mode: python; coding: utf-8; -*- ## ## This file is part of Invenio. ## Copyright (C) 2010, 2011, 2012 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Testing module for BibSort Method Treatment""" from invenio.testutils import InvenioTestCase from invenio.bibsort_washer import BibSortWasher from invenio.testutils import make_test_suite, run_test_suite class TestBibSortWasherCreation(InvenioTestCase): """Test BibSortWasher Creation.""" def test_method_creation(self): """Tests the creation of a method""" method = 'sort_alphanumerically_remove_leading_articles' bsm = BibSortWasher(method) self.assertEqual(bsm.get_washer(), method) class TestBibSortWasherWashers(InvenioTestCase): """Test BibSortWasher Washers.""" def test_sort_alphanumerically_remove_leading_articles(self): """Test the sort_alphanumerically_remove_leading_articles method""" method = "sort_alphanumerically_remove_leading_articles" bsm = BibSortWasher(method) self.assertEqual('title of a record', bsm.get_transformed_value('The title of a record')) self.assertEqual('title of a record', bsm.get_transformed_value('a title of a record')) self.assertEqual('the', bsm.get_transformed_value('The')) def test_sort_dates(self): """Test the sort_dates method""" method = "sort_dates" bsm = BibSortWasher(method) self.assertEqual('2010-01-10', bsm.get_transformed_value('2010-01-10')) self.assertEqual('2010-11-10', bsm.get_transformed_value('10 nov 2010')) self.assertEqual('2010-11-01', bsm.get_transformed_value('nov 2010')) self.assertEqual('2010-01-01', bsm.get_transformed_value('2010')) self.assertEqual('2010-11-08', bsm.get_transformed_value('8 nov 2010')) + def test_sort_nosymbols_case_insensitive_strip_accents(self): + """Test the sort_nosymbols_case_insensitive_strip_accents method""" + method = "sort_nosymbols_case_insensitive_strip_accents" + bsm = BibSortWasher(method) + self.assertEqual("thooftgerardus", bsm.get_transformed_value("'t Hooft, Gerardus")) + self.assertEqual("ahearnmichaelf", bsm.get_transformed_value("A'Hearn, Michael F.")) + self.assertEqual("zvolskymilan", bsm.get_transformed_value("Zvolský, Milan")) + TEST_SUITE = make_test_suite(TestBibSortWasherWashers, TestBibSortWasherCreation) if __name__ == "__main__": run_test_suite(TEST_SUITE) diff --git a/modules/docextract/lib/refextract_tag.py b/modules/docextract/lib/refextract_tag.py index e93cff128..389700a8d 100644 --- a/modules/docextract/lib/refextract_tag.py +++ b/modules/docextract/lib/refextract_tag.py @@ -1,1415 +1,1410 @@ # -*- coding: utf-8 -*- ## ## This file is part of Invenio. ## Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
import re

-try:
-    from unidecode import unidecode
-    UNIDECODE_AVAILABLE = True
-except ImportError:
-    UNIDECODE_AVAILABLE = False
+from unidecode import unidecode

from invenio.refextract_config import \
    CFG_REFEXTRACT_MARKER_CLOSING_AUTHOR_ETAL, \
    CFG_REFEXTRACT_MARKER_CLOSING_AUTHOR_INCL, \
    CFG_REFEXTRACT_MARKER_CLOSING_AUTHOR_STND, \
    CFG_REFEXTRACT_MARKER_CLOSING_TITLE_IBID, \
    CFG_REFEXTRACT_MARKER_OPENING_TITLE_IBID, \
    CFG_REFEXTRACT_MARKER_OPENING_COLLABORATION, \
    CFG_REFEXTRACT_MARKER_CLOSING_COLLABORATION

from invenio.docextract_text import remove_and_record_multiple_spaces_in_line

from invenio.refextract_re import \
    re_ibid, \
    re_doi, \
    re_raw_url, \
    re_series_from_numeration, \
    re_punctuation, \
    re_correct_numeration_2nd_try_ptn1, \
    re_correct_numeration_2nd_try_ptn2, \
    re_correct_numeration_2nd_try_ptn3, \
    re_correct_numeration_2nd_try_ptn4, \
    re_numeration_nucphys_vol_page_yr, \
    re_numeration_vol_subvol_nucphys_yr_page, \
    re_numeration_nucphys_vol_yr_page, \
    re_multiple_hyphens, \
    re_numeration_vol_page_yr, \
    re_numeration_vol_yr_page, \
    re_numeration_vol_nucphys_series_yr_page, \
    re_numeration_vol_series_nucphys_page_yr, \
    re_numeration_vol_nucphys_series_page_yr, \
    re_html_tagged_url, \
    re_numeration_yr_vol_page, \
    re_numeration_vol_nucphys_page_yr, \
    re_wash_volume_tag, \
    re_numeration_vol_nucphys_yr_subvol_page, \
    re_quoted, \
    re_isbn, \
    re_arxiv, \
    re_arxiv_5digits, \
    re_new_arxiv, \
    re_new_arxiv_5digits, \
    re_pos, \
    re_pos_year_num, \
    re_series_from_numeration_after_volume, \
    RE_OLD_ARXIV, \
    RE_ARXIV_CATCHUP, \
    RE_ATLAS_CONF_PRE_2010, \
    RE_ATLAS_CONF_POST_2010

from invenio.authorextract_re import (get_author_regexps,
                                      etal_matches,
                                      re_ed_notation,
                                      re_etal)
from invenio.docextract_text import wash_line


def tag_reference_line(line, kbs, record_titles_count):
    # take a copy of the line as a first working line, clean it of bad
    # accents, and correct punctuation, etc:
    working_line1 = wash_line(line)

    # Identify volume for POS journal
    working_line1 = tag_pos_volume(working_line1)

    # Clean the line once more:
    working_line1 = wash_line(working_line1)

    # We identify quoted text
    # This is useful for books matching
    # This is also used by the author tagger to remove quoted
    # text which is a sign of a title and not an author
    working_line1 = tag_quoted_text(working_line1)

    # Identify ISBN (for books)
    working_line1 = tag_isbn(working_line1)

    # Identify arxiv reports
    working_line1 = tag_arxiv(working_line1)
    working_line1 = tag_arxiv_more(working_line1)

    # Identify volume for POS journal
    # needs special handling because the volume contains the year
    working_line1 = tag_pos_volume(working_line1)

    # Identify ATL-CONF and ATLAS-CONF report numbers
    # needs special handling because it has 2 formats depending on
    # the year and a 2-digit year format to convert
    working_line1 = tag_atlas_conf(working_line1)

    # Identify journals with regular expression
    # Some journals need to match exact regexps because they can
    # conflict with other elements
    # e.g.
# DAN is also a common first name
    standardised_titles = kbs['journals'][1]
    standardised_titles.update(kbs['journals_re'])
    journals_matches = identifiy_journals_re(working_line1, kbs['journals_re'])

    # Remove identified tags
    working_line2 = strip_tags(working_line1)
    # Transform the line to upper-case, now making a new working line:
    working_line2 = working_line2.upper()
    # Strip punctuation from the line:
    working_line2 = re_punctuation.sub(u' ', working_line2)
    # Remove multiple spaces from the line, recording
    # information about their coordinates:
    removed_spaces, working_line2 = \
        remove_and_record_multiple_spaces_in_line(working_line2)

    # Identify and record coordinates of institute preprint report numbers:
    found_pprint_repnum_matchlens, found_pprint_repnum_replstr, working_line2 = \
        identify_report_numbers(working_line2, kbs['report-numbers'])

    # Identify and record coordinates of non-standard journal titles:
    journals_matches_more, working_line2, line_titles_count = \
        identify_journals(working_line2, kbs['journals'])
    journals_matches.update(journals_matches_more)

    # Add the count of 'bad titles' found in this line to the total
    # for the reference section:
    record_titles_count = sum_2_dictionaries(record_titles_count,
                                             line_titles_count)

    # Attempt to identify, record and replace any IBIDs in the line:
    if working_line2.upper().find(u"IBID") != -1:
        # there is at least one IBID in the line - try to
        # identify its meaning:
        found_ibids_matchtext, working_line2 = identify_ibids(working_line2)
        # now update the dictionary of matched title lengths with the
        # matched IBID(s) lengths information:
        journals_matches.update(found_ibids_matchtext)

    publishers_matches = identify_publishers(working_line2, kbs['publishers'])

    tagged_line = process_reference_line(
        working_line=working_line1,
        journals_matches=journals_matches,
        pprint_repnum_len=found_pprint_repnum_matchlens,
        pprint_repnum_matchtext=found_pprint_repnum_replstr,
        publishers_matches=publishers_matches,
        removed_spaces=removed_spaces,
        standardised_titles=standardised_titles,
        kbs=kbs,
    )

    return tagged_line, record_titles_count


def process_reference_line(working_line, journals_matches, pprint_repnum_len,
                           pprint_repnum_matchtext, publishers_matches,
                           removed_spaces, standardised_titles, kbs):
    """After the phase of identifying and tagging citation instances in a
       reference line, this function is called to go through the line and
       the collected information about the recognised citations, and to
       transform the line into a string in which the recognised citations
       are wrapped in tags, depending upon their type.
       @param working_line: (string) - this is the line before the
        punctuation was stripped. At this stage, it has not been
        capitalised, and neither TITLEs nor REPORT-NUMBERs have been
        stripped from it. However, any recognised numeration and/or URLs
        have been tagged. The working_line could, for example, look
        something like this:
         [1] CDS http //invenio-software.org/.
       @param journals_matches: (dictionary) - the text of each periodical
        TITLE citation (including IBIDs) recognised in the line. Keyed by
        the index within the line of each match.
       @param pprint_repnum_len: (dictionary) - the lengths of the matched
        institutional preprint report number citations found within the
        line. Keyed by the index within the line of each match.
       @param pprint_repnum_matchtext: (dictionary) - the matched text for
        each matched institutional report number. Keyed by the index within
        the line of each match.
       @param publishers_matches: (dictionary) - the matched publisher
        names. Keyed by the index within the line of each match.
       @param removed_spaces: (dictionary) - the number of spaces removed
        from the various positions in the line. Keyed by the index of the
        position within the line at which the spaces were removed.
       @param standardised_titles: (dictionary) - the standardised journal
        titles, keyed by the non-standard version of those titles.
       @param kbs: (dictionary) - the loaded knowledge bases.
       @return: (string) - the rebuilt reference line, in which the
        recognised citations have been tagged.
    """
    if len(journals_matches) + len(pprint_repnum_len) + len(publishers_matches) == 0:
        # no TITLE or REPORT-NUMBER citations were found within this line,
        # use the raw line: (This 'raw' line could still be tagged with
        # recognised URLs or numeration.)
        tagged_line = working_line
    else:
        # TITLE and/or REPORT-NUMBER citations were found in this line,
        # build a new version of the working-line in which the standard
        # versions of the REPORT-NUMBERs and TITLEs are tagged:
        startpos = 0          # First cell of the reference line...
        previous_match = {}   # previously matched TITLE within line (used
                              # for replacement of IBIDs)
        replacement_types = {}
        journals_keys = journals_matches.keys()
        journals_keys.sort()
        reports_keys = pprint_repnum_matchtext.keys()
        reports_keys.sort()
        publishers_keys = publishers_matches.keys()
        publishers_keys.sort()
        spaces_keys = removed_spaces.keys()
        spaces_keys.sort()
        replacement_types = get_replacement_types(journals_keys,
                                                  reports_keys,
                                                  publishers_keys)
        replacement_locations = replacement_types.keys()
        replacement_locations.sort()

        tagged_line = u""  # This is to be the new 'working-line'. It will
                           # contain the tagged TITLEs and REPORT-NUMBERs,
                           # as well as any previously tagged URLs and
                           # numeration components.
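        # As an illustrative sketch only (the values below are hypothetical
        # and depend entirely on the loaded knowledge bases), a working line
        # such as:
        #   [1] J. Smith, Nucl. Phys. B75 (1974) 461
        # would typically be rebuilt along the lines of:
        #   [1] J. Smith, <cds.JOURNAL>Nucl. Phys.</cds.JOURNAL> <cds.VOL>B75</cds.VOL> <cds.YR>(1974)</cds.YR> <cds.PG>461</cds.PG>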
        # begin:
        for replacement_index in replacement_locations:
            # first, factor in any stripped spaces before this 'replacement'
            true_replacement_index, extras = \
                account_for_stripped_whitespace(spaces_keys,
                                                removed_spaces,
                                                replacement_types,
                                                pprint_repnum_len,
                                                journals_matches,
                                                replacement_index)

            if replacement_types[replacement_index] == u"journal":
                # Add a tagged periodical TITLE into the line:
                rebuilt_chunk, startpos, previous_match = \
                    add_tagged_journal(
                        reading_line=working_line,
                        journal_info=journals_matches[replacement_index],
                        previous_match=previous_match,
                        startpos=startpos,
                        true_replacement_index=true_replacement_index,
                        extras=extras,
                        standardised_titles=standardised_titles)
                tagged_line += rebuilt_chunk

            elif replacement_types[replacement_index] == u"reportnumber":
                # Add a tagged institutional preprint REPORT-NUMBER
                # into the line:
                rebuilt_chunk, startpos = \
                    add_tagged_report_number(
                        reading_line=working_line,
                        len_reportnum=pprint_repnum_len[replacement_index],
                        reportnum=pprint_repnum_matchtext[replacement_index],
                        startpos=startpos,
                        true_replacement_index=true_replacement_index,
                        extras=extras)
                tagged_line += rebuilt_chunk

            elif replacement_types[replacement_index] == u"publisher":
                rebuilt_chunk, startpos = \
                    add_tagged_publisher(
                        reading_line=working_line,
                        matched_publisher=publishers_matches[replacement_index],
                        startpos=startpos,
                        true_replacement_index=true_replacement_index,
                        extras=extras,
                        kb_publishers=kbs['publishers'])
                tagged_line += rebuilt_chunk

        # add the remainder of the original working-line into the rebuilt line:
        tagged_line += working_line[startpos:]

        # we have all the numeration
        # we can make sure there's no space between the volume
        # letter and the volume number
        # e.g. B 20 -> B20
        tagged_line = wash_volume_tag(tagged_line)

    # Try to find any authors in the line
    tagged_line = identify_and_tag_authors(tagged_line, kbs['authors'])
    # Try to find any collaboration in the line
    tagged_line = identify_and_tag_collaborations(tagged_line,
                                                  kbs['collaborations'])

    return tagged_line.replace('\n', '')


def wash_volume_tag(line):
    return re_wash_volume_tag[0].sub(re_wash_volume_tag[1], line)


def tag_isbn(line):
    """Tag books ISBN"""
    return re_isbn.sub(ur'<cds.ISBN>\g<code></cds.ISBN>', line)


def tag_quoted_text(line):
    """Tag quoted titles

    We use titles for pretty display of references that we could not
    associate with a record. We also use titles for recognising books.
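    Illustrative example (hypothetical input; it assumes re_quoted captures
    the quoted text in a group named 'title', as used in the substitution
    below):
        'See "Gravitation", Freeman'
        -> 'See <cds.QUOTED>Gravitation</cds.QUOTED>, Freeman'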
""" return re_quoted.sub(ur'\g</cds.QUOTED>', line) def tag_arxiv(line): """Tag arxiv report numbers We handle arXiv in 2 ways: * starting with arXiv:1022.1111 * this format exactly 9999.9999 We also format the output to the standard arxiv notation: * arXiv:2007.12.1111 * arXiv:2007.12.1111v2 """ def tagger(match): groups = match.groupdict() if match.group('suffix'): groups['suffix'] = ' ' + groups['suffix'] else: groups['suffix'] = '' return u'<cds.REPORTNUMBER>arXiv:%(year)s'\ u'%(month)s.%(num)s%(suffix)s' \ u'</cds.REPORTNUMBER>' % groups line = re_arxiv_5digits.sub(tagger, line) line = re_arxiv.sub(tagger, line) line = re_new_arxiv_5digits.sub(tagger, line) line = re_new_arxiv.sub(tagger, line) return line def tag_arxiv_more(line): """Tag old arxiv report numbers Either formats: * hep-th/1234567 * arXiv:1022111 [hep-ph] which transforms to hep-ph/1022111 """ line = RE_ARXIV_CATCHUP.sub(ur"\g<suffix>/\g<year>\g<month>\g<num>", line) for report_re, report_repl in RE_OLD_ARXIV: report_number = report_repl + ur"/\g<num>" line = report_re.sub(u'<cds.REPORTNUMBER>' + report_number + u'</cds.REPORTNUMBER>', line) return line def tag_pos_volume(line): """Tag POS volume number POS is journal that has special volume numbers e.g. PoS LAT2007 (2007) 369 """ def tagger(match): groups = match.groupdict() try: year = match.group('year') except IndexError: # Extract year from volume name # which should always include the year g = re.search(re_pos_year_num, match.group('volume_num'), re.UNICODE) year = g.group(0) if year: groups['year'] = ' <cds.YR>(%s)</cds.YR>' % year.strip().strip('()') else: groups['year'] = '' return '<cds.JOURNAL>PoS</cds.JOURNAL>' \ ' <cds.VOL>%(volume_name)s%(volume_num)s</cds.VOL>' \ '%(year)s' \ ' <cds.PG>%(page)s</cds.PG>' % groups for p in re_pos: line = p.sub(tagger, line) return line def tag_atlas_conf(line): line = RE_ATLAS_CONF_PRE_2010.sub( ur'<cds.REPORTNUMBER>ATL-CONF-\g<code></cds.REPORTNUMBER>', line) line = RE_ATLAS_CONF_POST_2010.sub( ur'<cds.REPORTNUMBER>ATLAS-CONF-\g<code></cds.REPORTNUMBER>', line) return line def identifiy_journals_re(line, kb_journals): matches = {} for pattern, dummy_journal in kb_journals: match = re.search(pattern, line) if match: matches[match.start()] = match.group(0) return matches def find_numeration_more(line): """Look for other numeration in line.""" # First, attempt to use marked-up titles patterns = ( re_correct_numeration_2nd_try_ptn1, re_correct_numeration_2nd_try_ptn2, re_correct_numeration_2nd_try_ptn3, re_correct_numeration_2nd_try_ptn4, ) for pattern in patterns: match = pattern.search(line) if match: info = match.groupdict() series = extract_series_from_volume(info['vol']) if not info['vol_num']: info['vol_num'] = info['vol_num_alt'] if not info['vol_num']: info['vol_num'] = info['vol_num_alt2'] return {'year': info.get('year', None), 'series': series, 'volume': info['vol_num'], 'page': info['page'], 'page_end': info['page_end'], 'len': len(info['aftertitle'])} return None def add_tagged_report_number(reading_line, len_reportnum, reportnum, startpos, true_replacement_index, extras): """In rebuilding the line, add an identified institutional REPORT-NUMBER (standardised and tagged) into the line. @param reading_line: (string) The reference line before capitalization was performed, and before REPORT-NUMBERs and TITLEs were stipped out. @param len_reportnum: (integer) the length of the matched REPORT-NUMBER. @param reportnum: (string) the replacement text for the matched REPORT-NUMBER. 
@param startpos: (integer) the pointer to the next position in the reading-line from which to start rebuilding. @param true_replacement_index: (integer) the replacement index of the matched REPORT-NUMBER in the reading-line, with stripped punctuation and whitespace accounted for. @param extras: (integer) extras to be added into the replacement index. @return: (tuple) containing a string (the rebuilt line segment) and an integer (the next 'startpos' in the reading-line). """ rebuilt_line = u"" # The segment of the line that's being rebuilt to # include the tagged & standardised REPORT-NUMBER # Fill rebuilt_line with the contents of the reading_line up to the point # of the institutional REPORT-NUMBER. However, stop 1 character before the # replacement index of this REPORT-NUMBER to allow for removal of braces, # if necessary: if (true_replacement_index - startpos - 1) >= 0: rebuilt_line += reading_line[startpos:true_replacement_index - 1] else: rebuilt_line += reading_line[startpos:true_replacement_index] # Add the tagged REPORT-NUMBER into the rebuilt-line segment: rebuilt_line += u"<cds.REPORTNUMBER>%(reportnum)s</cds.REPORTNUMBER>" \ % {'reportnum' : reportnum} # Move the pointer in the reading-line past the current match: startpos = true_replacement_index + len_reportnum + extras # Move past closing brace for report number (if there was one): try: if reading_line[startpos] in (u"]", u")"): startpos += 1 except IndexError: # moved past end of line - ignore pass # return the rebuilt-line segment and the pointer to the next position in # the reading-line from which to start rebuilding up to the next match: return rebuilt_line, startpos def add_tagged_journal_in_place_of_IBID(previous_match): """In rebuilding the line, if the matched TITLE was actually an IBID, this function will replace it with the previously matched TITLE, and add it into the line, tagged. It will even handle the series letter, if it differs. For example, if the previous match is "Nucl. Phys. B", and the ibid is "IBID A", the title inserted into the line will be "Nucl. Phys. A". Otherwise, if the IBID had no series letter, it will simply be replaced by "Nucl. Phys. B" (i.e. the previous match.) @param previous_match: (string) - the previously matched TITLE. @param ibid_series: (string) - the series of the IBID (if any). @return: (tuple) containing a string (the rebuilt line segment) and an other string (the newly updated previous-match). """ return " %s%s%s" % (CFG_REFEXTRACT_MARKER_OPENING_TITLE_IBID, previous_match['title'], CFG_REFEXTRACT_MARKER_CLOSING_TITLE_IBID) def extract_series_from_volume(volume): patterns = (re_series_from_numeration, re_series_from_numeration_after_volume) for p in patterns: match = p.search(volume) if match: return match.group(1) return None def create_numeration_tag(info): if info['series']: series_and_volume = info['series'] + info['volume'] else: series_and_volume = info['volume'] numeration_tags = u' <cds.VOL>%s</cds.VOL>' % series_and_volume if info.get('year', False): numeration_tags += u' <cds.YR>(%(year)s)</cds.YR>' % info if info.get('page_end', False): numeration_tags += u' <cds.PG>%(page)s-%(page_end)s</cds.PG>' % info else: numeration_tags += u' <cds.PG>%(page)s</cds.PG>' % info return numeration_tags def add_tagged_journal(reading_line, journal_info, previous_match, startpos, true_replacement_index, extras, standardised_titles): """In rebuilding the line, add an identified periodical TITLE (standardised and tagged) into the line. 
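       For example (an illustrative sketch with hypothetical values): if the
       matched non-standard title were 'PHYS REV D' and the knowledge base
       standardised it to 'Phys. Rev. D', the segment written back into the
       rebuilt line would contain '<cds.JOURNAL>Phys. Rev. D</cds.JOURNAL>',
       followed by any numeration tags recognised after the title.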
@param reading_line: (string) The reference line before capitalization was performed, and before REPORT-NUMBERs and TITLEs were stripped out. @param len_title: (integer) the length of the matched TITLE. @param matched_title: (string) the matched TITLE text. @param previous_match: (dict) the previous periodical TITLE citation to have been matched in the current reference line. It is used when replacing an IBID instance in the line. @param startpos: (integer) the pointer to the next position in the reading-line from which to start rebuilding. @param true_replacement_index: (integer) the replacement index of the matched TITLE in the reading-line, with stripped punctuation and whitespace accounted for. @param extras: (integer) extras to be added into the replacement index. @param standardised_titles: (dictionary) the standardised versions of periodical titles, keyed by their various non-standard versions. @return: (tuple) containing a string (the rebuilt line segment), an integer (the next 'startpos' in the reading-line), and an other string (the newly updated previous-match). """ old_startpos = startpos old_previous_match = previous_match skip_numeration = False series = None def skip_ponctuation(line, pos): # Skip past any punctuation at the end of the replacement that was # just made: try: while line[pos] in (".", ":", "-", ")"): pos += 1 except IndexError: # The match was at the very end of the line pass return pos # Fill 'rebuilt_line' (the segment of the line that is being rebuilt to # include the tagged and standardised periodical TITLE) with the contents # of the reading-line, up to the point of the matched TITLE: rebuilt_line = reading_line[startpos:true_replacement_index] # Test to see whether a title or an "IBID" was matched: if journal_info.upper().find("IBID") != -1: # This is an IBID # Try to replace the IBID with a title: if previous_match: # Replace this IBID with the previous title match, if possible: rebuilt_line += add_tagged_journal_in_place_of_IBID(previous_match) series = previous_match['series'] # Update start position for next segment of original line: startpos = true_replacement_index + len(journal_info) + extras startpos = skip_ponctuation(reading_line, startpos) else: rebuilt_line = "" skip_numeration = True else: if ';' in standardised_titles[journal_info]: title, series = \ standardised_titles[journal_info].rsplit(';', 1) series = series.strip() previous_match = {'title': title, 'series': series} else: title = standardised_titles[journal_info] previous_match = {'title': title, 'series': None} # This is a normal title, not an IBID rebuilt_line += "<cds.JOURNAL>%s</cds.JOURNAL>" % title startpos = true_replacement_index + len(journal_info) + extras startpos = skip_ponctuation(reading_line, startpos) if not skip_numeration: # Check for numeration numeration_line = reading_line[startpos:] # First look for standard numeration numerotation_info = find_numeration(numeration_line) if not numerotation_info: numeration_line = rebuilt_line + " " + numeration_line # Now look for more funky numeration # With possibly some elements before the journal title numerotation_info = find_numeration_more(numeration_line) if not numerotation_info: startpos = old_startpos previous_match = old_previous_match rebuilt_line = "" else: if series and not numerotation_info['series']: numerotation_info['series'] = series startpos += numerotation_info['len'] rebuilt_line += create_numeration_tag(numerotation_info) previous_match['series'] = numerotation_info['series'] # return the rebuilt 
# line-segment, the position (of the reading line) from
    # which the next part of the rebuilt line should be started, and the
    # newly updated previous match.
    return rebuilt_line, startpos, previous_match


def add_tagged_publisher(reading_line,
                         matched_publisher,
                         startpos,
                         true_replacement_index,
                         extras,
                         kb_publishers):
    """In rebuilding the line, add an identified publisher name
       (standardised and tagged) into the line.
       @param reading_line: (string) The reference line before capitalization
        was performed, and before REPORT-NUMBERs and TITLEs were stripped
        out.
       @param matched_publisher: (string) the matched publisher text.
       @param startpos: (integer) the pointer to the next position in the
        reading-line from which to start rebuilding.
       @param true_replacement_index: (integer) the replacement index of the
        matched publisher in the reading-line, with stripped punctuation and
        whitespace accounted for.
       @param extras: (integer) extras to be added into the replacement
        index.
       @param kb_publishers: (dictionary) the publishers knowledge base,
        holding the standardised replacement for each recognised publisher.
       @return: (tuple) containing a string (the rebuilt line segment) and
        an integer (the next 'startpos' in the reading-line).
    """
    # Fill 'rebuilt_line' (the segment of the line that is being rebuilt to
    # include the tagged and standardised publisher name) with the contents
    # of the reading-line, up to the point of the matched publisher:
    rebuilt_line = reading_line[startpos:true_replacement_index]

    # Add the tagged, standardised publisher name into the rebuilt line:
    rebuilt_line += "<cds.PUBLISHER>%(title)s</cds.PUBLISHER>" \
        % {'title': kb_publishers[matched_publisher]['repl']}

    # Compute new start pos
    startpos = true_replacement_index + len(matched_publisher) + extras

    # return the rebuilt line-segment and the position (of the reading line)
    # from which the next part of the rebuilt line should be started.
    return rebuilt_line, startpos


def get_replacement_types(titles, reportnumbers, publishers):
    """Given the indices of the titles, reportnumbers and publishers that
       have been recognised within a reference line, create a dictionary
       keyed by the replacement position in the line, where the value for
       each key is a string describing the type of item replaced at that
       position in the line.
       The description strings are:
           'journal'      - indicating that the replacement is a periodical
                            title;
           'reportnumber' - indicating that the replacement is a preprint
                            report number;
           'publisher'    - indicating that the replacement is a publisher
                            name.
       @param titles: (list) of locations in the string at which periodical
        titles were found.
       @param reportnumbers: (list) of locations in the string at which
        reportnumbers were found.
       @param publishers: (list) of locations in the string at which
        publisher names were found.
       @return: (dictionary) of replacement types at various locations
        within the string.
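       For example (illustrative values only):
           get_replacement_types([6], [20], []) would return
           {6: "journal", 20: "reportnumber"}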
""" rep_types = {} for item_idx in titles: rep_types[item_idx] = "journal" for item_idx in reportnumbers: rep_types[item_idx] = "reportnumber" for item_idx in publishers: rep_types[item_idx] = "publisher" return rep_types def account_for_stripped_whitespace(spaces_keys, removed_spaces, replacement_types, len_reportnums, journals_matches, replacement_index): """To build a processed (MARC XML) reference line in which the recognised citations such as standardised periodical TITLEs and REPORT-NUMBERs have been marked up, it is necessary to read from the reference line BEFORE all punctuation was stripped and it was made into upper-case. The indices of the cited items in this 'original line', however, will be different to those in the 'working-line', in which punctuation and multiple-spaces were stripped out. For example, the following reading-line: [26] E. Witten and S.-T. Yau, hep-th/9910245. ...becomes (after punctuation and multiple white-space stripping): [26] E WITTEN AND S T YAU HEP TH/9910245 It can be seen that the report-number citation (hep-th/9910245) is at a different index in the two strings. When refextract searches for this citation, it uses the 2nd string (i.e. that which is capitalised and has no punctuation). When it builds the MARC XML representation of the reference line, however, it needs to read from the first string. It must therefore consider the whitespace, punctuation, etc that has been removed, in order to get the correct index for the cited item. This function accounts for the stripped characters before a given TITLE or REPORT-NUMBER index. @param spaces_keys: (list) - the indices at which spaces were removed from the reference line. @param removed_spaces: (dictionary) - keyed by the indices at which spaces were removed from the line, the values are the number of spaces actually removed from that position. So, for example, "3 spaces were removed from position 25 in the line." @param replacement_types: (dictionary) - at each 'replacement_index' in the line, the of replacement to make (title or reportnumber). @param len_reportnums: (dictionary) - the lengths of the REPORT- NUMBERs matched at the various indices in the line. @param len_titles: (dictionary) - the lengths of the various TITLEs matched at the various indices in the line. @param replacement_index: (integer) - the index in the working line of the identified TITLE or REPORT-NUMBER citation. @return: (tuple) containing 2 elements: + the true replacement index of a replacement in the reading line; + any extras to add into the replacement index; """ extras = 0 true_replacement_index = replacement_index spare_replacement_index = replacement_index for space in spaces_keys: if space < true_replacement_index: # There were spaces stripped before the current replacement # Add the number of spaces removed from this location to the # current replacement index: true_replacement_index += removed_spaces[space] spare_replacement_index += removed_spaces[space] elif space >= spare_replacement_index and \ replacement_types[replacement_index] == u"journal" and \ space < (spare_replacement_index + len(journals_matches[replacement_index])): # A periodical title is being replaced. 
Account for multi-spaces # that may have been stripped from the title before its # recognition: spare_replacement_index += removed_spaces[space] extras += removed_spaces[space] elif space >= spare_replacement_index and \ replacement_types[replacement_index] == u"reportnumber" and \ space < (spare_replacement_index + len_reportnums[replacement_index]): # An institutional preprint report-number is being replaced. # Account for multi-spaces that may have been stripped from it # before its recognition: spare_replacement_index += removed_spaces[space] extras += removed_spaces[space] # return the new values for replacement indices with stripped # whitespace accounted for: return true_replacement_index, extras def strip_tags(line): # Firstly, go through and change ALL TAGS and their contents to underscores # author content can be checked for underscores later on # Note that we don't have embedded tags this is why # we can do this re_tag = re.compile(ur'<cds\.[A-Z]+>[^<]*</cds\.[A-Z]+>|<cds\.[A-Z]+ />', re.UNICODE) for m in re_tag.finditer(line): chars_count = m.end() - m.start() line = re_tag.sub('_'*chars_count, line, count=1) return line def identify_and_tag_collaborations(line, collaborations_kb): """Given a line where Authors have been tagged, and all other tags and content has been replaced with underscores, go through and try to identify extra items of data which should be placed into 'h' subfields. Later on, these tagged pieces of information will be merged into the content of the most recently found author. This is separated from the author tagging procedure since separate tags can be used, which won't influence the reference splitting heuristics (used when looking at mulitple <AUTH> tags in a line). """ for dummy_collab, re_collab in collaborations_kb.iteritems(): matches = re_collab.finditer(strip_tags(line)) for match in reversed(list(matches)): line = line[:match.start()] \ + CFG_REFEXTRACT_MARKER_OPENING_COLLABORATION \ + match.group(1).strip(".,:;- [](){}") \ + CFG_REFEXTRACT_MARKER_CLOSING_COLLABORATION \ + line[match.end():] return line def identify_and_tag_authors(line, authors_kb): """Given a reference, look for a group of author names, place tags around the author group, return the newly tagged line. 
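    A rough illustrative sketch (hypothetical input; the exact behaviour
    depends on the compiled author regexps and the authors knowledge base):
        'J. Smith and A. Jones, Phys. Lett. B123 (1983) 45'
        -> '<cds.AUTHstnd>J. Smith and A. Jones</cds.AUTHstnd>, Phys. Lett. ...'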
""" re_auth, re_auth_near_miss = get_author_regexps() # Replace authors which do not convert well from utf-8 for pattern, repl in authors_kb: line = line.replace(pattern, repl) output_line = line # We matched authors here line = strip_tags(output_line) matched_authors = list(re_auth.finditer(line)) # We try to have better results by unidecoding - if UNIDECODE_AVAILABLE: - unidecoded_line = strip_tags(unidecode(output_line)) - matched_authors_unidecode = list(re_auth.finditer(unidecoded_line)) + unidecoded_line = strip_tags(unidecode(output_line)) + matched_authors_unidecode = list(re_auth.finditer(unidecoded_line)) - if len(matched_authors_unidecode) > len(matched_authors): - output_line = unidecode(output_line) - matched_authors = matched_authors_unidecode + if len(matched_authors_unidecode) > len(matched_authors): + output_line = unidecode(output_line) + matched_authors = matched_authors_unidecode # If there is at least one matched author group if matched_authors: matched_positions = [] preceeding_text_string = line preceeding_text_start = 0 for auth_no, match in enumerate(matched_authors): # Only if there are no underscores or closing arrows found in the matched author group # This must be checked for here, as it cannot be applied to the re without clashing with # other Unicode characters if line[match.start():match.end()].find("_") == -1: # Has the group with name 'et' (for 'et al') been found in the pattern? # Has the group with name 'es' (for ed. before the author) been found in the pattern? # Has the group with name 'ee' (for ed. after the author) been found in the pattern? matched_positions.append({ 'start' : match.start(), 'end' : match.end(), 'etal' : match.group('et') or match.group('et2'), 'ed_start' : match.group('es'), 'ed_end' : match.group('ee'), 'multi_auth' : match.group('multi_auth'), 'multi_surs' : match.group('multi_surs'), 'text_before' : preceeding_text_string[preceeding_text_start:match.start()], 'auth_no' : auth_no, 'author_names': match.group('author_names') }) # Save the end of the match, from where to snip the misc text found before an author match preceeding_text_start = match.end() # Work backwards to avoid index problems when adding AUTH tags matched_positions.reverse() for m in matched_positions: dump_in_misc = False start = m['start'] end = m['end'] # Check the text before the current match to see if it has a bad 'et al' lower_text_before = m['text_before'].strip().lower() for e in etal_matches: if lower_text_before.endswith(e): ## If so, this author match is likely to be a bad match on a missed title dump_in_misc = True break # An AND found here likely indicates a missed author before this text # Thus, triggers weaker author searching, within the previous misc text # (Check the text before the current match to see if it has a bad 'and') # A bad 'and' will only be denoted as such if there exists only one author after it # and the author group is legit (not to be dumped in misc) if not dump_in_misc and not (m['multi_auth'] or m['multi_surs']) \ and (lower_text_before.endswith(' and')): # Search using a weaker author pattern to try and find the missed author(s) (cut away the end 'and') weaker_match = re_auth_near_miss.match(m['text_before']) if weaker_match and not (weaker_match.group('es') or weaker_match.group('ee')): # Change the start of the author group to include this new author group start = start - (len(m['text_before']) - weaker_match.start()) # Still no match, do not add tags for this author match.. 
# dump it into misc
                else:
                    dump_in_misc = True

            add_to_misc = ""
            # If a semi-colon was found at the end of this author group,
            # keep it in misc so that it can be looked at for splitting
            # heuristics
            if len(output_line) > m['end']:
                if output_line[m['end']].strip(" ,.") == ';':
                    add_to_misc = ';'

            # Standardize eds. notation
            tmp_output_line = re_ed_notation.sub('(ed.)',
                                                 output_line[start:end])
            # Standardize et al. notation
            tmp_output_line = re_etal.sub('et al.', tmp_output_line)
            # Strip
            tmp_output_line = tmp_output_line.lstrip('.').strip(",:;- [](")
            if not tmp_output_line.endswith('(ed.)'):
                tmp_output_line = tmp_output_line.strip(')')

            # ONLY wrap author data with tags IF there is no evidence that
            # it is an ed. author. (i.e. The author is not referred to as
            # an editor)

            # Does this author group string have 'et al.'?
            if m['etal'] and not (m['ed_start'] or m['ed_end'] or dump_in_misc):
                output_line = output_line[:start] \
                    + "<cds.AUTHetal>" \
                    + tmp_output_line \
                    + CFG_REFEXTRACT_MARKER_CLOSING_AUTHOR_ETAL \
                    + add_to_misc \
                    + output_line[end:]
            elif not (m['ed_start'] or m['ed_end'] or dump_in_misc):
                # Insert the std (standard) tag
                output_line = output_line[:start] \
                    + "<cds.AUTHstnd>" \
                    + tmp_output_line \
                    + CFG_REFEXTRACT_MARKER_CLOSING_AUTHOR_STND \
                    + add_to_misc \
                    + output_line[end:]
            # Apply the 'include in $h' method to author groups marked as
            # editors
            elif m['ed_start'] or m['ed_end']:
                ed_notation = " (eds.)"
                # Standardize et al. notation
                tmp_output_line = re_etal.sub('et al.', m['author_names'])
                # remove any characters which denote this author group
                # to be editors, just take the
                # author names, and append '(ed.)'
                output_line = output_line[:start] \
                    + "<cds.AUTHincl>" \
                    + tmp_output_line.strip(",:;- [](") \
                    + ed_notation \
                    + CFG_REFEXTRACT_MARKER_CLOSING_AUTHOR_INCL \
                    + add_to_misc \
                    + output_line[end:]

    return output_line


def sum_2_dictionaries(dicta, dictb):
    """Given two dictionaries of totals, where each total refers to a key
       in the dictionary, add the totals.
       E.g.:  dicta = { 'a' : 3, 'b' : 1 }
              dictb = { 'a' : 1, 'c' : 5 }
              dicta + dictb = { 'a' : 4, 'b' : 1, 'c' : 5 }
       @param dicta: (dictionary)
       @param dictb: (dictionary)
       @return: (dictionary) - the sum of the 2 dictionaries
    """
    dict_out = dicta.copy()
    for key in dictb.keys():
        if key in dict_out:
            # Add the sum for key in dictb to that of dict_out:
            dict_out[key] += dictb[key]
        else:
            # the key is not in the first dictionary - add it directly:
            dict_out[key] = dictb[key]
    return dict_out


def identify_ibids(line):
    """Find IBIDs within the line, record their position and length,
       and replace them with underscores.
       @param line: (string) the working reference line
       @return: (tuple) containing a dictionary and a string:
         Dictionary: matched IBID text: (Key: position of IBID in
                     line; Value: matched IBID text)
         String:     working line with matched IBIDs removed
    """
    ibid_match_txt = {}
    # Record details of each matched ibid:
    for m_ibid in re_ibid.finditer(line):
        ibid_match_txt[m_ibid.start()] = m_ibid.group(0)
        # Replace matched text in line with underscores:
        line = line[0:m_ibid.start()] + \
            "_" * len(m_ibid.group(0)) + \
            line[m_ibid.end():]

    return ibid_match_txt, line


def find_all(string, sub):
    listindex = []
    offset = 0
    i = string.find(sub, offset)
    while i >= 0:
        listindex.append(i)
        i = string.find(sub, i + 1)
    return listindex


def find_numeration(line):
    """Given a reference line, attempt to locate instances of citation
       'numeration' in the line.
       @param line: (string) the reference line.
@return: (string) the reference line after numeration has been checked and possibly recognized/marked-up. """ patterns = ( # vol,page,year re_numeration_vol_page_yr, re_numeration_vol_nucphys_page_yr, re_numeration_nucphys_vol_page_yr, # With sub volume re_numeration_vol_subvol_nucphys_yr_page, re_numeration_vol_nucphys_yr_subvol_page, # vol,year,page re_numeration_vol_yr_page, re_numeration_nucphys_vol_yr_page, re_numeration_vol_nucphys_series_yr_page, # vol,page,year re_numeration_vol_series_nucphys_page_yr, re_numeration_vol_nucphys_series_page_yr, # year,vol,page re_numeration_yr_vol_page, ) for pattern in patterns: match = pattern.match(line) if match: info = match.groupdict() series = info.get('series', None) if not series: series = extract_series_from_volume(info['vol']) if not info['vol_num']: info['vol_num'] = info['vol_num_alt'] if not info['vol_num']: info['vol_num'] = info['vol_num_alt2'] return {'year': info.get('year', None), 'series': series, 'volume': info['vol_num'], 'page': info['page'], 'page_end': info['page_end'], 'len': match.end()} return None def identify_journals(line, kb_journals): """Attempt to identify all periodical titles in a reference line. Titles will be identified, their information (location in line, length in line, and non-standardised version) will be recorded, and they will be replaced in the working line by underscores. @param line: (string) - the working reference line. @param periodical_title_search_kb: (dictionary) - contains the regexp patterns used to search for a non-standard TITLE in the working reference line. Keyed by the TITLE string itself. @param periodical_title_search_keys: (list) - contains the non- standard periodical TITLEs to be searched for in the line. This list of titles has already been ordered and is used to force the order of searching. @return: (tuple) containing 4 elements: + (dictionary) - the lengths of all titles matched at each given index within the line. + (dictionary) - the text actually matched for each title at each given index within the line. + (string) - the working line, with the titles removed from it and replaced by underscores. + (dictionary) - the totals for each bad-title found in the line. """ periodical_title_search_kb = kb_journals[0] periodical_title_search_keys = kb_journals[2] title_matches = {} # the text matched at the given line # location (i.e. the title itself) titles_count = {} # sum totals of each 'bad title found in # line. # Begin searching: for title in periodical_title_search_keys: # search for all instances of the current periodical title # in the line: # for each matched periodical title: for title_match in periodical_title_search_kb[title].finditer(line): if title not in titles_count: # Add this title into the titles_count dictionary: titles_count[title] = 1 else: # Add 1 to the count for the given title: titles_count[title] += 1 # record the details of this title match: # record the match length: title_matches[title_match.start()] = title len_to_replace = len(title) # replace the matched title text in the line it n * '_', # where n is the length of the matched title: line = u"".join((line[:title_match.start()], u"_" * len_to_replace, line[title_match.start() + len_to_replace:])) # return recorded information about matched periodical titles, # along with the newly changed working line: return title_matches, line, titles_count def identify_report_numbers(line, kb_reports): """Attempt to identify all preprint report numbers in a reference line. 
Report numbers will be identified, their information (location in line, length in line, and standardised replacement version) will be recorded, and they will be replaced in the working-line by underscores. @param line: (string) - the working reference line. @param preprint_repnum_search_kb: (dictionary) - contains the regexp patterns used to identify preprint report numbers. @param preprint_repnum_standardised_categs: (dictionary) - contains the standardised 'category' of a given preprint report number. @return: (tuple) - 3 elements: * a dictionary containing the lengths in the line of the matched preprint report numbers, keyed by the index at which each match was found in the line. * a dictionary containing the replacement strings (standardised versions) of preprint report numbers that were matched in the line. * a string, that is the new version of the working reference line, in which any matched preprint report numbers have been replaced by underscores. Returned tuple is therefore in the following order: (matched-reportnum-lengths, matched-reportnum-replacements, working-line) """ def _by_len(a, b): """Comparison function used to sort a list by the length of the strings in each element of the list. """ if len(a[1]) < len(b[1]): return 1 elif len(a[1]) == len(b[1]): return 0 else: return -1 repnum_matches_matchlen = {} # info about lengths of report numbers # matched at given locations in line repnum_matches_repl_str = {} # standardised report numbers matched # at given locations in line repnum_search_kb, repnum_standardised_categs = kb_reports repnum_categs = repnum_standardised_categs.keys() repnum_categs.sort(_by_len) # Handle CERN/LHCC/98-013 line = line.replace('/', ' ') # try to match preprint report numbers in the line: for categ in repnum_categs: # search for all instances of the current report # numbering style in the line: repnum_matches_iter = repnum_search_kb[categ].finditer(line) # for each matched report number of this style: for repnum_match in repnum_matches_iter: # Get the matched text for the numeration part of the # preprint report number: numeration_match = repnum_match.group('numn') # clean/standardise this numeration text: numeration_match = numeration_match.replace(" ", "-") numeration_match = re_multiple_hyphens.sub("-", numeration_match) numeration_match = numeration_match.replace("/-", "/") numeration_match = numeration_match.replace("-/", "/") numeration_match = numeration_match.replace("-/-", "/") # replace the found preprint report number in the # string with underscores # (this will replace chars in the lower-cased line): line = line[0:repnum_match.start(1)] \ + "_"*len(repnum_match.group(1)) + line[repnum_match.end(1):] # record the information about the matched preprint report number: # total length in the line of the matched preprint report number: repnum_matches_matchlen[repnum_match.start(1)] = \ len(repnum_match.group(1)) # standardised replacement for the matched preprint report number: repnum_matches_repl_str[repnum_match.start(1)] = \ repnum_standardised_categs[categ] \ + numeration_match # return recorded information about matched report numbers, along with # the newly changed working line: return repnum_matches_matchlen, repnum_matches_repl_str, line def identify_publishers(line, kb_publishers): matches_repl = {} # standardised report numbers matched # at given locations in line for abbrev, info in kb_publishers.iteritems(): for match in info['pattern'].finditer(line): # record the matched non-standard version of the publisher: 
matches_repl[match.start(0)] = abbrev return matches_repl def identify_and_tag_URLs(line): """Given a reference line, identify URLs in the line, record the information about them, and replace them with a "<cds.URL />" tag. URLs are identified in 2 forms: + Raw: http://invenio-software.org/ + HTML marked-up: <a href="http://invenio-software.org/">CERN Document Server Software Consortium</a> These URLs are considered to have 2 components: The URL itself (url string); and the URL description. The description is effectively the text used for the created Hyperlink when the URL is marked-up in HTML. When an HTML marked-up URL has been recognised, the text between the anchor tags is therefore taken as the URL description. In the case of a raw URL recognition, however, the URL itself will also be used as the URL description. For example, in the following reference line: [1] See <a href="http://invenio-software.org/">CERN Document Server Software Consortium</a>. ...the URL string will be "http://invenio-software.org/" and the URL description will be "CERN Document Server Software Consortium". The line returned from this function will be: [1] See <cds.URL /> In the following line, however: [1] See http //invenio-software.org/ for more details. ...the URL string will be "http://invenio-software.org/" and the URL description will also be "http://invenio-software.org/". The line returned will be: [1] See <cds.URL /> for more details. @param line: (string) the reference line in which to search for URLs. @return: (tuple) - containing 2 items: + the line after URLs have been recognised and removed; + a list of 2-item tuples where each tuple represents a recognised URL and its description: [(url, url-description), (url, url-description), ... ] @Exceptions raised: + an IndexError if there is a problem with the number of URLs recognised (this should not happen.) """ # Take a copy of the line: line_pre_url_check = line # Dictionaries to record details of matched URLs: found_url_full_matchlen = {} found_url_urlstring = {} found_url_urldescr = {} # List to contain details of all matched URLs: identified_urls = [] # Attempt to identify and tag all HTML-MARKED-UP URLs in the line: m_tagged_url_iter = re_html_tagged_url.finditer(line) for m_tagged_url in m_tagged_url_iter: startposn = m_tagged_url.start() # start position of matched URL endposn = m_tagged_url.end() # end position of matched URL matchlen = len(m_tagged_url.group(0)) # total length of URL match found_url_full_matchlen[startposn] = matchlen found_url_urlstring[startposn] = m_tagged_url.group('url') found_url_urldescr[startposn] = m_tagged_url.group('desc') # temporarily replace the URL match with underscores so that # it won't be re-found line = line[0:startposn] + u"_"*matchlen + line[endposn:] # Attempt to identify and tag all RAW (i.e. 
not # HTML-marked-up) URLs in the line: m_raw_url_iter = re_raw_url.finditer(line) for m_raw_url in m_raw_url_iter: startposn = m_raw_url.start() # start position of matched URL endposn = m_raw_url.end() # end position of matched URL matchlen = len(m_raw_url.group(0)) # total length of URL match matched_url = m_raw_url.group('url') if len(matched_url) > 0 and matched_url[-1] in (".", ","): # Strip the full-stop or comma from the end of the url: matched_url = matched_url[:-1] found_url_full_matchlen[startposn] = matchlen found_url_urlstring[startposn] = matched_url found_url_urldescr[startposn] = matched_url # temporarily replace the URL match with underscores # so that it won't be re-found line = line[0:startposn] + u"_"*matchlen + line[endposn:] # Now that all URLs have been identified, insert them # back into the line, tagged: found_url_positions = found_url_urlstring.keys() found_url_positions.sort() found_url_positions.reverse() for url_position in found_url_positions: line = line[0:url_position] + "<cds.URL />" \ + line[url_position + found_url_full_matchlen[url_position]:] # The line has been rebuilt. Now record the information about the # matched URLs: found_url_positions = found_url_urlstring.keys() found_url_positions.sort() for url_position in found_url_positions: identified_urls.append((found_url_urlstring[url_position], found_url_urldescr[url_position])) # Somehow the number of URLs found doesn't match the number of # URLs recorded in "identified_urls". Raise an IndexError. msg = """Error: The number of URLs found in the reference line """ \ """does not match the number of URLs recorded in the """ \ """list of identified URLs!\nLine pre-URL checking: %s\n""" \ """Line post-URL checking: %s\n""" \ % (line_pre_url_check, line) assert len(identified_urls) == len(found_url_positions), msg # return the line containing the tagged URLs: return line, identified_urls def identify_and_tag_DOI(line): """takes a single citation line and attempts to locate any DOI references. DOI references are recognised in both http (url) format and also the standard DOI notation (DOI: ...) @param line: (string) the reference line in which to search for DOI's. @return: the tagged line and a list of DOI strings (if any) """ # Used to hold the DOI strings in the citation line doi_strings = [] # Run the DOI pattern on the line, returning the re.match objects matched_doi = re_doi.finditer(line) # For each match found in the line for match in reversed(list(matched_doi)): # Store the start and end position start = match.start() end = match.end() # Get the actual DOI string (remove the url part of the doi string) doi_phrase = match.group(6) # Replace the entire matched doi with a tag line = line[0:start] + "<cds.DOI />" + line[end:] # Add the single DOI string to the list of DOI strings doi_strings.append(doi_phrase) doi_strings.reverse() return line, doi_strings diff --git a/modules/miscutil/lib/textutils.py b/modules/miscutil/lib/textutils.py index 792251862..fa0156e27 100644 --- a/modules/miscutil/lib/textutils.py +++ b/modules/miscutil/lib/textutils.py @@ -1,775 +1,766 @@ # -*- coding: utf-8 -*- ## This file is part of Invenio. ## Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Functions useful for text wrapping (in a box) and indenting. """ __revision__ = "$Id$" import sys import re import textwrap import htmlentitydefs import invenio.template from invenio.config import CFG_ETCDIR try: import chardet CHARDET_AVAILABLE = True except ImportError: CHARDET_AVAILABLE = False -try: - from unidecode import unidecode - UNIDECODE_AVAILABLE = True -except ImportError: - UNIDECODE_AVAILABLE = False +from unidecode import unidecode CFG_LATEX_UNICODE_TRANSLATION_CONST = {} CFG_WRAP_TEXT_IN_A_BOX_STYLES = { '__DEFAULT' : { 'horiz_sep' : '*', 'max_col' : 72, 'min_col' : 40, 'tab_str' : ' ', 'tab_num' : 0, 'border' : ('**', '*', '**', '** ', ' **', '**', '*', '**'), 'prefix' : '\n', 'suffix' : '\n', 'break_long' : False, 'force_horiz' : False, }, 'squared' : { 'horiz_sep' : '-', 'border' : ('+', '-', '+', '| ', ' |', '+', '-', '+') }, 'double_sharp' : { 'horiz_sep' : '#', 'border' : ('##', '#', '##', '## ', ' ##', '##', '#', '##') }, 'single_sharp' : { 'horiz_sep' : '#', 'border' : ('#', '#', '#', '# ', ' #', '#', '#', '#') }, 'single_star' : { 'border' : ('*', '*', '*', '* ', ' *', '*', '*', '*',) }, 'double_star' : { }, 'no_border' : { 'horiz_sep' : '', 'border' : ('', '', '', '', '', '', '', ''), 'prefix' : '', 'suffix' : '' }, 'conclusion' : { 'border' : ('', '', '', '', '', '', '', ''), 'prefix' : '', 'horiz_sep' : '-', 'force_horiz' : True, }, 'important' : { 'tab_num' : 1, }, 'ascii' : { 'horiz_sep' : (u'├', u'─', u'┤'), 'border' : (u'┌', u'─', u'┐', u'│ ', u' │', u'└', u'─', u'┘'), }, 'ascii_double' : { 'horiz_sep' : (u'╠', u'═', u'╣'), 'border' : (u'╔', u'═', u'╗', u'║ ', u' ║', u'╚', u'═', u'╝'), } } re_unicode_lowercase_a = re.compile(unicode(r"(?u)[áàäâãå]", "utf-8")) re_unicode_lowercase_ae = re.compile(unicode(r"(?u)[æ]", "utf-8")) re_unicode_lowercase_oe = re.compile(unicode(r"(?u)[œ]", "utf-8")) re_unicode_lowercase_e = re.compile(unicode(r"(?u)[éèëê]", "utf-8")) re_unicode_lowercase_i = re.compile(unicode(r"(?u)[íìïî]", "utf-8")) re_unicode_lowercase_o = re.compile(unicode(r"(?u)[óòöôõø]", "utf-8")) re_unicode_lowercase_u = re.compile(unicode(r"(?u)[úùüû]", "utf-8")) re_unicode_lowercase_y = re.compile(unicode(r"(?u)[ýÿ]", "utf-8")) re_unicode_lowercase_c = re.compile(unicode(r"(?u)[çć]", "utf-8")) re_unicode_lowercase_n = re.compile(unicode(r"(?u)[ñ]", "utf-8")) re_unicode_lowercase_ss = re.compile(unicode(r"(?u)[ß]", "utf-8")) re_unicode_uppercase_a = re.compile(unicode(r"(?u)[ÁÀÄÂÃÅ]", "utf-8")) re_unicode_uppercase_ae = re.compile(unicode(r"(?u)[Æ]", "utf-8")) re_unicode_uppercase_oe = re.compile(unicode(r"(?u)[Œ]", "utf-8")) re_unicode_uppercase_e = re.compile(unicode(r"(?u)[ÉÈËÊ]", "utf-8")) re_unicode_uppercase_i = re.compile(unicode(r"(?u)[ÍÌÏÎ]", "utf-8")) re_unicode_uppercase_o = re.compile(unicode(r"(?u)[ÓÒÖÔÕØ]", "utf-8")) re_unicode_uppercase_u = re.compile(unicode(r"(?u)[ÚÙÜÛ]", "utf-8")) re_unicode_uppercase_y = re.compile(unicode(r"(?u)[Ý]", "utf-8")) re_unicode_uppercase_c = re.compile(unicode(r"(?u)[ÇĆ]", "utf-8")) re_unicode_uppercase_n = re.compile(unicode(r"(?u)[Ñ]", "utf-8")) 
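# A minimal usage sketch for the accent-folding expressions above (the LaTeX
# patterns below play the analogous role for LaTeX-escaped input); the sample
# inputs are hypothetical:
#   re_unicode_lowercase_e.sub(u"e", u"résumé")    ->  u"resume"
#   re_unicode_lowercase_ss.sub(u"ss", u"straße")  ->  u"strasse"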
re_latex_lowercase_a = re.compile("\\\\[\"H'`~^vu=k]\{?a\}?") re_latex_lowercase_ae = re.compile("\\\\ae\\{\\}?") re_latex_lowercase_oe = re.compile("\\\\oe\\{\\}?") re_latex_lowercase_e = re.compile("\\\\[\"H'`~^vu=k]\\{?e\\}?") re_latex_lowercase_i = re.compile("\\\\[\"H'`~^vu=k]\\{?i\\}?") re_latex_lowercase_o = re.compile("\\\\[\"H'`~^vu=k]\\{?o\\}?") re_latex_lowercase_u = re.compile("\\\\[\"H'`~^vu=k]\\{?u\\}?") re_latex_lowercase_y = re.compile("\\\\[\"']\\{?y\\}?") re_latex_lowercase_c = re.compile("\\\\['uc]\\{?c\\}?") re_latex_lowercase_n = re.compile("\\\\[c'~^vu]\\{?n\\}?") re_latex_uppercase_a = re.compile("\\\\[\"H'`~^vu=k]\\{?A\\}?") re_latex_uppercase_ae = re.compile("\\\\AE\\{?\\}?") re_latex_uppercase_oe = re.compile("\\\\OE\\{?\\}?") re_latex_uppercase_e = re.compile("\\\\[\"H'`~^vu=k]\\{?E\\}?") re_latex_uppercase_i = re.compile("\\\\[\"H'`~^vu=k]\\{?I\\}?") re_latex_uppercase_o = re.compile("\\\\[\"H'`~^vu=k]\\{?O\\}?") re_latex_uppercase_u = re.compile("\\\\[\"H'`~^vu=k]\\{?U\\}?") re_latex_uppercase_y = re.compile("\\\\[\"']\\{?Y\\}?") re_latex_uppercase_c = re.compile("\\\\['uc]\\{?C\\}?") re_latex_uppercase_n = re.compile("\\\\[c'~^vu]\\{?N\\}?") def indent_text(text, nb_tabs=0, tab_str=" ", linebreak_input="\n", linebreak_output="\n", wrap=False): """ add tabs to each line of text @param text: the text to indent @param nb_tabs: number of tabs to add @param tab_str: type of tab (could be, for example "\t", default: 2 spaces @param linebreak_input: linebreak on input @param linebreak_output: linebreak on output @param wrap: wethever to apply smart text wrapping. (by means of wrap_text_in_a_box) @return: indented text as string """ if not wrap: lines = text.split(linebreak_input) tabs = nb_tabs*tab_str output = "" for line in lines: output += tabs + line + linebreak_output return output else: return wrap_text_in_a_box(body=text, style='no_border', tab_str=tab_str, tab_num=nb_tabs) _RE_BEGINNING_SPACES = re.compile(r'^\s*') _RE_NEWLINES_CLEANER = re.compile(r'\n+') _RE_LONELY_NEWLINES = re.compile(r'\b\n\b') def wrap_text_in_a_box(body='', title='', style='double_star', **args): """Return a nicely formatted text box: e.g. ****************** ** title ** **--------------** ** body ** ****************** Indentation and newline are respected. @param body: the main text @param title: an optional title @param style: the name of one of the style in CFG_WRAP_STYLES. By default the double_star style is used. 
You can further tune the desired style by setting various optional parameters: @param horiz_sep: a string that is repeated in order to produce a separator row between the title and the body (if needed) or a tuple of three characters in the form (l, c, r) @param max_col: the maximum number of coulmns used by the box (including indentation) @param min_col: the symmetrical minimum number of columns @param tab_str: a string to represent indentation @param tab_num: the number of leveles of indentations @param border: a tuple of 8 element in the form (tl, t, tr, l, r, bl, b, br) of strings that represent the different corners and sides of the box @param prefix: a prefix string added before the box @param suffix: a suffix string added after the box @param break_long: wethever to break long words in order to respect max_col @param force_horiz: True in order to print the horizontal line even when there is no title e.g.: print wrap_text_in_a_box(title='prova', body=' 123 prova.\n Vediamo come si indenta', horiz_sep='-', style='no_border', max_col=20, tab_num=1) prova ---------------- 123 prova. Vediamo come si indenta """ def _wrap_row(row, max_col, break_long): """Wrap a single row""" spaces = _RE_BEGINNING_SPACES.match(row).group() row = row[len(spaces):] spaces = spaces.expandtabs() return textwrap.wrap(row, initial_indent=spaces, subsequent_indent=spaces, width=max_col, break_long_words=break_long) def _clean_newlines(text): text = _RE_LONELY_NEWLINES.sub(' \n', text) return _RE_NEWLINES_CLEANER.sub(lambda x: x.group()[:-1], text) body = unicode(body, 'utf-8') title = unicode(title, 'utf-8') astyle = dict(CFG_WRAP_TEXT_IN_A_BOX_STYLES['__DEFAULT']) if CFG_WRAP_TEXT_IN_A_BOX_STYLES.has_key(style): astyle.update(CFG_WRAP_TEXT_IN_A_BOX_STYLES[style]) astyle.update(args) horiz_sep = astyle['horiz_sep'] border = astyle['border'] tab_str = astyle['tab_str'] * astyle['tab_num'] max_col = max(astyle['max_col'] \ - len(border[3]) - len(border[4]) - len(tab_str), 1) min_col = astyle['min_col'] prefix = astyle['prefix'] suffix = astyle['suffix'] force_horiz = astyle['force_horiz'] break_long = astyle['break_long'] body = _clean_newlines(body) tmp_rows = [_wrap_row(row, max_col, break_long) for row in body.split('\n')] body_rows = [] for rows in tmp_rows: if rows: body_rows += rows else: body_rows.append('') if not ''.join(body_rows).strip(): # Concrete empty body body_rows = [] title = _clean_newlines(title) tmp_rows = [_wrap_row(row, max_col, break_long) for row in title.split('\n')] title_rows = [] for rows in tmp_rows: if rows: title_rows += rows else: title_rows.append('') if not ''.join(title_rows).strip(): # Concrete empty title title_rows = [] max_col = max([len(row) for row in body_rows + title_rows] + [min_col]) mid_top_border_len = max_col \ + len(border[3]) + len(border[4]) - len(border[0]) - len(border[2]) mid_bottom_border_len = max_col \ + len(border[3]) + len(border[4]) - len(border[5]) - len(border[7]) top_border = border[0] \ + (border[1] * mid_top_border_len)[:mid_top_border_len] + border[2] bottom_border = border[5] \ + (border[6] * mid_bottom_border_len)[:mid_bottom_border_len] \ + border[7] if type(horiz_sep) is tuple and len(horiz_sep) == 3: horiz_line = horiz_sep[0] + (horiz_sep[1] * (max_col + 2))[:(max_col + 2)] + horiz_sep[2] else: horiz_line = border[3] + (horiz_sep * max_col)[:max_col] + border[4] title_rows = [tab_str + border[3] + row + ' ' * (max_col - len(row)) + border[4] for row in title_rows] body_rows = [tab_str + border[3] + row + ' ' * (max_col - len(row)) + border[4] 
    title_rows = [tab_str + border[3] + row + ' ' * (max_col - len(row)) +
                  border[4] for row in title_rows]
    body_rows = [tab_str + border[3] + row + ' ' * (max_col - len(row)) +
                 border[4] for row in body_rows]

    ret = []
    if top_border:
        ret += [tab_str + top_border]
    ret += title_rows
    if title_rows or force_horiz:
        ret += [tab_str + horiz_line]
    ret += body_rows
    if bottom_border:
        ret += [tab_str + bottom_border]
    return (prefix + '\n'.join(ret) + suffix).encode('utf-8')

def wait_for_user(msg=""):
    """
    Print MSG and a confirmation prompt, waiting for the user's
    confirmation, unless the silent '--yes-i-know' command line option was
    used, in which case the function returns immediately without printing
    anything.
    """
    if '--yes-i-know' in sys.argv:
        return
    print msg
    try:
        answer = raw_input("Please confirm by typing 'Yes, I know!': ")
    except KeyboardInterrupt:
        print
        answer = ''
    if answer != 'Yes, I know!':
        sys.stderr.write("ERROR: Aborted.\n")
        sys.exit(1)
    return

def guess_minimum_encoding(text, charsets=('ascii', 'latin1', 'utf8')):
    """Try to guess the minimum charset that is able to represent the given
    text using the provided charsets. text is supposed to be encoded in
    utf8. Returns (encoded_text, charset) where charset is the first charset
    in the sequence being able to encode text. Returns (text_in_utf8,
    'utf8') in case no charset is able to encode text.

    @note: If the input text is not in strict UTF-8, then replace any
        non-UTF-8 chars inside it.
    """
    text_in_unicode = text.decode('utf8', 'replace')
    for charset in charsets:
        try:
            return (text_in_unicode.encode(charset), charset)
        except (UnicodeEncodeError, UnicodeDecodeError):
            pass
    return (text_in_unicode.encode('utf8'), 'utf8')

def encode_for_xml(text, wash=False, xml_version='1.0', quote=False):
    """Encode special characters in a text so that it is XML-compliant.

    @param text: text to encode
    @return: an encoded text"""
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    if quote:
        text = text.replace('"', '&quot;')
    if wash:
        text = wash_for_xml(text, xml_version=xml_version)
    return text

try:
    unichr(0x100000)
    RE_ALLOWED_XML_1_0_CHARS = re.compile(u'[^\U00000009\U0000000A\U0000000D\U00000020-\U0000D7FF\U0000E000-\U0000FFFD\U00010000-\U0010FFFF]')
    RE_ALLOWED_XML_1_1_CHARS = re.compile(u'[^\U00000001-\U0000D7FF\U0000E000-\U0000FFFD\U00010000-\U0010FFFF]')
except ValueError:
    # oops, we are running on a narrow UTF/UCS Python build,
    # so we have to limit the UTF/UCS char range:
    RE_ALLOWED_XML_1_0_CHARS = re.compile(u'[^\U00000009\U0000000A\U0000000D\U00000020-\U0000D7FF\U0000E000-\U0000FFFD]')
    RE_ALLOWED_XML_1_1_CHARS = re.compile(u'[^\U00000001-\U0000D7FF\U0000E000-\U0000FFFD]')

def wash_for_xml(text, xml_version='1.0'):
    """
    Removes any character which is not in the range of allowed characters
    for XML. The allowed characters depend on the version of XML.

        - XML 1.0: <http://www.w3.org/TR/REC-xml/#charsets>
        - XML 1.1: <http://www.w3.org/TR/xml11/#charsets>

    @param text: input string to wash.
    @param xml_version: version of the XML for which we wash the input.
        Value for this parameter can be '1.0' or '1.1'
    """
    if xml_version == '1.0':
        return RE_ALLOWED_XML_1_0_CHARS.sub('',
                                            unicode(text, 'utf-8')).encode('utf-8')
    else:
        return RE_ALLOWED_XML_1_1_CHARS.sub('',
                                            unicode(text, 'utf-8')).encode('utf-8')

def wash_for_utf8(text, correct=True):
    """Return UTF-8 encoded binary string with incorrect characters washed
    away.
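
    For example (an illustrative sketch; assumes the default correct=True
    behavior of silently dropping bytes that are not valid UTF-8):

        >>> wash_for_utf8('p\xc3\xa9tanque')   # valid UTF-8 passes through
        'p\xc3\xa9tanque'
        >>> wash_for_utf8('\xfftanque')        # the stray invalid byte is dropped
        'tanque'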

    @param text: input string to wash (can be either a binary string or a
        Unicode string)
    @param correct: whether to correct bad characters or raise an exception
    """
    if isinstance(text, unicode):
        return text.encode('utf-8')

    errors = "ignore" if correct else "strict"
    return text.decode("utf-8", errors).encode("utf-8", errors)

def nice_size(size):
    """
    @param size: the size in bytes.
    @type size: int
    @return: a nicely printed size.
    @rtype: string
    """
    websearch_templates = invenio.template.load('websearch')
    unit = 'B'
    if size > 1024:
        size /= 1024.0
        unit = 'KB'
    if size > 1024:
        size /= 1024.0
        unit = 'MB'
    if size > 1024:
        size /= 1024.0
        unit = 'GB'
    return '%s %s' % (websearch_templates.tmpl_nice_number(size, max_ndigits_after_dot=2), unit)

def remove_line_breaks(text):
    """
    Remove line breaks from input, including the Unicode 'line separator'
    (U+2028), 'paragraph separator' (U+2029), and 'next line' (U+0085)
    characters.
    """
    return unicode(text, 'utf-8').replace('\f', '').replace('\n', '').replace('\r', '').replace(u'\u2028', '').replace(u'\u2029', '').replace(u'\x85', '').encode('utf-8')

def decode_to_unicode(text, default_encoding='utf-8'):
    """
    Decode input text into Unicode representation by first using the default
    encoding utf-8. If the operation fails, it detects the type of encoding
    used in the given text. For optimal results, it is recommended that the
    'chardet' module be installed.

    NOTE: Beware that this might be slow for *very* large strings.

    If chardet detection fails, it will try to decode the string using the
    basic detection function guess_minimum_encoding().

    Also, bear in mind that it is impossible to detect the correct encoding
    at all times, other than by making educated guesses. With that said,
    this function will always return some decoded Unicode string, however
    the data returned may not be the same as the original data in some
    cases.

    @param text: the text to decode
    @type text: string

    @param default_encoding: the character encoding to use. Optional.
    @type default_encoding: string

    @return: input text as Unicode
    @rtype: string
    """
    if not text:
        return ""
    try:
        return text.decode(default_encoding)
    except (UnicodeError, LookupError):
        pass
    detected_encoding = None
    if CHARDET_AVAILABLE:
        # We can use chardet to perform detection
        res = chardet.detect(text)
        if res['confidence'] >= 0.8:
            detected_encoding = res['encoding']
    if detected_encoding is None:
        # No chardet detection, try to make a basic guess
        dummy, detected_encoding = guess_minimum_encoding(text)
    return text.decode(detected_encoding)

def translate_latex2unicode(text, kb_file="%s/bibconvert/KB/latex-to-unicode.kb" % \
                            (CFG_ETCDIR,)):
    """
    This function takes the given text, presumably containing LaTeX symbols,
    and attempts to translate it to Unicode using the given or default KB
    translation table located under CFG_ETCDIR/bibconvert/KB/latex-to-unicode.kb.
    The translated Unicode string will then be returned.

    If the translation table and compiled regular expression object have not
    yet been generated in the current session, they will be.
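
    For example (illustrative; the exact mappings depend on the contents of
    the latex-to-unicode KB file shipped with the installation):

        >>> translate_latex2unicode("\\'a \\'i \\'U")
        u'\xe1 \xed \xda'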

    @param text: a text presumably containing LaTeX symbols.
    @type text: string

    @param kb_file: full path to file containing latex2unicode translations.
        Defaults to CFG_ETCDIR/bibconvert/KB/latex-to-unicode.kb
    @type kb_file: string

    @return: Unicode representation of translated text
    @rtype: unicode
    """
    # First decode input text to Unicode
    try:
        text = decode_to_unicode(text)
    except UnicodeDecodeError:
        text = unicode(wash_for_utf8(text))
    # Load translation table, if required
    if CFG_LATEX_UNICODE_TRANSLATION_CONST == {}:
        _load_latex2unicode_constants(kb_file)
    # Find all matches and replace text
    for match in CFG_LATEX_UNICODE_TRANSLATION_CONST['regexp_obj'].finditer(text):
        # If LaTeX style markers {, } and $ are before or after the
        # matching text, replace those as well
        text = re.sub("[\{\$]?%s[\}\$]?" % (re.escape(match.group()),), \
                      CFG_LATEX_UNICODE_TRANSLATION_CONST['table'][match.group()], \
                      text)
    # Return Unicode representation of translated text
    return text

def _load_latex2unicode_constants(kb_file="%s/bibconvert/KB/latex-to-unicode.kb" % \
                                  (CFG_ETCDIR,)):
    """
    Load LaTeX2Unicode translation table dictionary and regular expression
    object from KB to a global dictionary.

    @param kb_file: full path to file containing latex2unicode translations.
        Defaults to CFG_ETCDIR/bibconvert/KB/latex-to-unicode.kb
    @type kb_file: string

    @return: dict of type: {'regexp_obj': regexp match object,
                            'table': dict of LaTeX -> Unicode mappings}
    @rtype: dict
    """
    try:
        data = open(kb_file)
    except IOError:
        # File not found or similar
        sys.stderr.write("\nCould not open LaTeX to Unicode KB file. Aborting translation.\n")
        return CFG_LATEX_UNICODE_TRANSLATION_CONST
    latex_symbols = []
    translation_table = {}
    for line in data:
        # Each line of the file has the form latex|--|utf-8. First decode to Unicode.
        line = line.decode('utf-8')
        mapping = line.split('|--|')
        translation_table[mapping[0].rstrip('\n')] = mapping[1].rstrip('\n')
        latex_symbols.append(re.escape(mapping[0].rstrip('\n')))
    data.close()
    CFG_LATEX_UNICODE_TRANSLATION_CONST['regexp_obj'] = re.compile("|".join(latex_symbols))
    CFG_LATEX_UNICODE_TRANSLATION_CONST['table'] = translation_table

def translate_to_ascii(values):
    """
    Transliterate the string contents of the given sequence into ascii
    representation, using the 'unidecode' module. Returns a sequence with
    the modified values.

    For example: H\xc3\xb6hne becomes Hohne.

    Note: Passed strings are returned as a list.
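
    For instance (illustrative; the mapping of some characters, e.g. 'ö',
    differs between unidecode versions):

        >>> translate_to_ascii("résumé")
        ['resume']
        >>> translate_to_ascii(["Åge", "normal"])
        ['Age', 'normal']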

    @param values: sequence of strings to transform
    @type values: sequence

    @return: sequence with values transformed to ascii
    @rtype: sequence
    """
    if not values and not type(values) == str:
        return values

    if type(values) == str:
        values = [values]

    for index, value in enumerate(values):
        if not value:
            continue
-        if not UNIDECODE_AVAILABLE:
-            ascii_text = strip_accents(value)
-        else:
-            encoded_text, encoding = guess_minimum_encoding(value)
-            unicode_text = unicode(encoded_text.decode(encoding))
-            decoded_text = ""
+        unicode_text = decode_to_unicode(value)
+        if u"[?]" in unicode_text:
+            decoded_text = []
            for unicode_char in unicode_text:
                decoded_char = unidecode(unicode_char)
                # Skip unrecognized characters
                if decoded_char != "[?]":
-                    decoded_text += decoded_char
-            ascii_text = decoded_text.encode('ascii')
+                    decoded_text.append(decoded_char)
+            ascii_text = ''.join(decoded_text).encode('ascii')
+        else:
+            ascii_text = unidecode(unicode_text).replace(u"[?]", u"").encode('ascii')
        values[index] = ascii_text

    return values

def xml_entities_to_utf8(text, skip=('lt', 'gt', 'amp')):
    """
    Removes HTML or XML character references and entities from a text
    string and replaces them with their UTF-8 representation, if possible.

    @param text: The HTML (or XML) source text.
    @type text: string

    @param skip: list of entity names to skip when transforming.
    @type skip: iterable

    @return: the text with all transformable entities replaced by their
        UTF-8 representation.

    @author: Based on http://effbot.org/zone/re-sub.htm#unescape-html
    """
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16)).encode("utf-8")
                else:
                    return unichr(int(text[2:-1])).encode("utf-8")
            except ValueError:
                pass
        else:
            # named entity
            if text[1:-1] not in skip:
                try:
                    text = unichr(htmlentitydefs.name2codepoint[text[1:-1]]).encode("utf-8")
                except KeyError:
                    pass
        return text  # leave as is
    return re.sub("&#?\w+;", fixup, text)

def strip_accents(x):
    """
    Strip accents in the input phrase X (assumed in UTF-8) by replacing
    accented characters with their unaccented cousins (e.g. é by e).

    @param x: the input phrase to strip.
    @type x: string

    @return: Return such a stripped X.
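
    For example:

        >>> strip_accents('mémêmëmè')
        'memememe'
        >>> strip_accents('Œuvre')
        'OEuvre'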
""" x = re_latex_lowercase_a.sub("a", x) x = re_latex_lowercase_ae.sub("ae", x) x = re_latex_lowercase_oe.sub("oe", x) x = re_latex_lowercase_e.sub("e", x) x = re_latex_lowercase_i.sub("i", x) x = re_latex_lowercase_o.sub("o", x) x = re_latex_lowercase_u.sub("u", x) x = re_latex_lowercase_y.sub("x", x) x = re_latex_lowercase_c.sub("c", x) x = re_latex_lowercase_n.sub("n", x) x = re_latex_uppercase_a.sub("A", x) x = re_latex_uppercase_ae.sub("AE", x) x = re_latex_uppercase_oe.sub("OE", x) x = re_latex_uppercase_e.sub("E", x) x = re_latex_uppercase_i.sub("I", x) x = re_latex_uppercase_o.sub("O", x) x = re_latex_uppercase_u.sub("U", x) x = re_latex_uppercase_y.sub("Y", x) x = re_latex_uppercase_c.sub("C", x) x = re_latex_uppercase_n.sub("N", x) # convert input into Unicode string: try: y = unicode(x, "utf-8") except: return x # something went wrong, probably the input wasn't UTF-8 # asciify Latin-1 lowercase characters: y = re_unicode_lowercase_a.sub("a", y) y = re_unicode_lowercase_ae.sub("ae", y) y = re_unicode_lowercase_oe.sub("oe", y) y = re_unicode_lowercase_e.sub("e", y) y = re_unicode_lowercase_i.sub("i", y) y = re_unicode_lowercase_o.sub("o", y) y = re_unicode_lowercase_u.sub("u", y) y = re_unicode_lowercase_y.sub("y", y) y = re_unicode_lowercase_c.sub("c", y) y = re_unicode_lowercase_n.sub("n", y) y = re_unicode_lowercase_ss.sub("ss", y) # asciify Latin-1 uppercase characters: y = re_unicode_uppercase_a.sub("A", y) y = re_unicode_uppercase_ae.sub("AE", y) y = re_unicode_uppercase_oe.sub("OE", y) y = re_unicode_uppercase_e.sub("E", y) y = re_unicode_uppercase_i.sub("I", y) y = re_unicode_uppercase_o.sub("O", y) y = re_unicode_uppercase_u.sub("U", y) y = re_unicode_uppercase_y.sub("Y", y) y = re_unicode_uppercase_c.sub("C", y) y = re_unicode_uppercase_n.sub("N", y) # return UTF-8 representation of the Unicode string: return y.encode("utf-8") def show_diff(original, modified, prefix='', suffix='', prefix_unchanged=' ', suffix_unchanged='', prefix_removed='-', suffix_removed='', prefix_added='+', suffix_added=''): """ Returns the diff view between original and modified strings. Function checks both arguments line by line and returns a string with a: - prefix_unchanged when line is common to both sequences - prefix_removed when line is unique to sequence 1 - prefix_added when line is unique to sequence 2 and a corresponding suffix in each line @param original: base string @param modified: changed string @param prefix: prefix of the output string @param suffix: suffix of the output string @param prefix_unchanged: prefix of the unchanged line @param suffix_unchanged: suffix of the unchanged line @param prefix_removed: prefix of the removed line @param suffix_removed: suffix of the removed line @param prefix_added: prefix of the added line @param suffix_added: suffix of the added line @return: string with the comparison of the records @rtype: string """ import difflib differ = difflib.Differ() result = [prefix] for line in differ.compare(modified.splitlines(), original.splitlines()): if line[0] == ' ': # Mark as unchanged result.append(prefix_unchanged + line[2:].strip() + suffix_unchanged) elif line[0] == '-': # Mark as removed result.append(prefix_removed + line[2:].strip() + suffix_removed) elif line[0] == '+': # Mark as added/modified result.append(prefix_added + line[2:].strip() + suffix_added) result.append(suffix) return '\n'.join(result) def transliterate_ala_lc(value): """ Transliterate a string. 
    Compatibility with the ALA-LC romanization standard:
    http://www.loc.gov/catdir/cpso/roman.html

    Maps from one system of writing into another, letter by letter.

    Uses the 'unidecode' module.

    @param value: string to transform
    @type value: string

    @return: the transliterated string
    @rtype: string
    """
    if not value:
        return value

-    if UNIDECODE_AVAILABLE:
-        text = unidecode(value)
-    else:
-        text = translate_to_ascii(value)
-        text = text.pop()
+    text = unidecode(value)
    return text

def escape_latex(text):
    """
    This function takes the given text and escapes characters that have a
    special meaning in LaTeX: # $ % ^ & _ { } ~ \
    """
    text = unicode(text.decode('utf-8'))
    CHARS = {
        '&': r'\&',
        '%': r'\%',
        '$': r'\$',
        '#': r'\#',
        '_': r'\_',
        '{': r'\{',
        '}': r'\}',
        '~': r'\~{}',
        '^': r'\^{}',
        '\\': r'\textbackslash{}',
    }
    escaped = "".join([CHARS.get(char, char) for char in text])
    return escaped.encode('utf-8')

diff --git a/modules/miscutil/lib/textutils_unit_tests.py b/modules/miscutil/lib/textutils_unit_tests.py
index 66bdf981b..9a7a93145 100644
--- a/modules/miscutil/lib/textutils_unit_tests.py
+++ b/modules/miscutil/lib/textutils_unit_tests.py
@@ -1,589 +1,581 @@
# -*- coding: utf-8 -*-
##
## This file is part of Invenio.
## Copyright (C) 2008, 2009, 2010, 2011, 2013 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Unit tests for the textutils library.""" __revision__ = "$Id$" from invenio.testutils import InvenioTestCase try: import chardet CHARDET_AVAILABLE = True except ImportError: CHARDET_AVAILABLE = False -try: - from unidecode import unidecode - UNIDECODE_AVAILABLE = True -except ImportError: - UNIDECODE_AVAILABLE = False +from unidecode import unidecode from invenio.textutils import \ wrap_text_in_a_box, \ guess_minimum_encoding, \ wash_for_xml, \ wash_for_utf8, \ decode_to_unicode, \ translate_latex2unicode, \ translate_to_ascii, \ strip_accents, \ transliterate_ala_lc, \ escape_latex, \ show_diff from invenio.testutils import make_test_suite, run_test_suite class GuessMinimumEncodingTest(InvenioTestCase): """Test functions related to guess_minimum_encoding function.""" def test_guess_minimum_encoding(self): """textutils - guess_minimum_encoding.""" self.assertEqual(guess_minimum_encoding('patata'), ('patata', 'ascii')) self.assertEqual(guess_minimum_encoding('àèéìòù'), ('\xe0\xe8\xe9\xec\xf2\xf9', 'latin1')) self.assertEqual(guess_minimum_encoding('Ιθάκη'), ('Ιθάκη', 'utf8')) class WashForXMLTest(InvenioTestCase): """Test functions related to wash_for_xml function.""" def test_latin_characters_washing_1_0(self): """textutils - washing latin characters for XML 1.0.""" self.assertEqual(wash_for_xml('àèéìòùÀ'), 'àèéìòùÀ') def test_latin_characters_washing_1_1(self): """textutils - washing latin characters for XML 1.1.""" self.assertEqual(wash_for_xml('àèéìòùÀ', xml_version='1.1'), 'àèéìòùÀ') def test_chinese_characters_washing_1_0(self): """textutils - washing chinese characters for XML 1.0.""" self.assertEqual(wash_for_xml(''' 春眠暁を覚えず 処処に啼鳥と聞く 夜来風雨の声 花落つること 知んぬ多少ぞ'''), ''' 春眠暁を覚えず 処処に啼鳥と聞く 夜来風雨の声 花落つること 知んぬ多少ぞ''') def test_chinese_characters_washing_1_1(self): """textutils - washing chinese characters for XML 1.1.""" self.assertEqual(wash_for_xml(''' 春眠暁を覚えず 処処に啼鳥と聞く 夜来風雨の声 花落つること 知んぬ多少ぞ''', xml_version='1.1'), ''' 春眠暁を覚えず 処処に啼鳥と聞く 夜来風雨の声 花落つること 知んぬ多少ぞ''') def test_greek_characters_washing_1_0(self): """textutils - washing greek characters for XML 1.0.""" self.assertEqual(wash_for_xml(''' ἄνδρα μοι ἔννεπε, μου̂σα, πολύτροπον, ὃς μάλα πολλὰ πλάγχθη, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν: πολλω̂ν δ' ἀνθρώπων ἴδεν ἄστεα καὶ νόον ἔγνω, πολλὰ δ' ὅ γ' ἐν πόντῳ πάθεν ἄλγεα ὃν κατὰ θυμόν, ἀρνύμενος ἥν τε ψυχὴν καὶ νόστον ἑταίρων. ἀλλ' οὐδ' ὣς ἑτάρους ἐρρύσατο, ἱέμενός περ: αὐτω̂ν γὰρ σφετέρῃσιν ἀτασθαλίῃσιν ὄλοντο, νήπιοι, οἳ κατὰ βου̂ς ̔Υπερίονος ̓Ηελίοιο ἤσθιον: αὐτὰρ ὁ τοι̂σιν ἀφείλετο νόστιμον ἠ̂μαρ. τω̂ν ἁμόθεν γε, θεά, θύγατερ Διός, εἰπὲ καὶ ἡμι̂ν.'''), ''' ἄνδρα μοι ἔννεπε, μου̂σα, πολύτροπον, ὃς μάλα πολλὰ πλάγχθη, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν: πολλω̂ν δ' ἀνθρώπων ἴδεν ἄστεα καὶ νόον ἔγνω, πολλὰ δ' ὅ γ' ἐν πόντῳ πάθεν ἄλγεα ὃν κατὰ θυμόν, ἀρνύμενος ἥν τε ψυχὴν καὶ νόστον ἑταίρων. ἀλλ' οὐδ' ὣς ἑτάρους ἐρρύσατο, ἱέμενός περ: αὐτω̂ν γὰρ σφετέρῃσιν ἀτασθαλίῃσιν ὄλοντο, νήπιοι, οἳ κατὰ βου̂ς ̔Υπερίονος ̓Ηελίοιο ἤσθιον: αὐτὰρ ὁ τοι̂σιν ἀφείλετο νόστιμον ἠ̂μαρ. τω̂ν ἁμόθεν γε, θεά, θύγατερ Διός, εἰπὲ καὶ ἡμι̂ν.''') def test_greek_characters_washing_1_1(self): """textutils - washing greek characters for XML 1.1.""" self.assertEqual(wash_for_xml(''' ἄνδρα μοι ἔννεπε, μου̂σα, πολύτροπον, ὃς μάλα πολλὰ πλάγχθη, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν: πολλω̂ν δ' ἀνθρώπων ἴδεν ἄστεα καὶ νόον ἔγνω, πολλὰ δ' ὅ γ' ἐν πόντῳ πάθεν ἄλγεα ὃν κατὰ θυμόν, ἀρνύμενος ἥν τε ψυχὴν καὶ νόστον ἑταίρων. 
ἀλλ' οὐδ' ὣς ἑτάρους ἐρρύσατο, ἱέμενός περ:
αὐτω̂ν γὰρ σφετέρῃσιν ἀτασθαλίῃσιν ὄλοντο,
νήπιοι, οἳ κατὰ βου̂ς ̔Υπερίονος ̓Ηελίοιο
ἤσθιον: αὐτὰρ ὁ τοι̂σιν ἀφείλετο νόστιμον ἠ̂μαρ.
τω̂ν ἁμόθεν γε, θεά, θύγατερ Διός, εἰπὲ καὶ ἡμι̂ν.''', xml_version='1.1'), '''
ἄνδρα μοι ἔννεπε, μου̂σα, πολύτροπον, ὃς μάλα πολλὰ
πλάγχθη, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν:
πολλω̂ν δ' ἀνθρώπων ἴδεν ἄστεα καὶ νόον ἔγνω,
πολλὰ δ' ὅ γ' ἐν πόντῳ πάθεν ἄλγεα ὃν κατὰ θυμόν,
ἀρνύμενος ἥν τε ψυχὴν καὶ νόστον ἑταίρων.
ἀλλ' οὐδ' ὣς ἑτάρους ἐρρύσατο, ἱέμενός περ:
αὐτω̂ν γὰρ σφετέρῃσιν ἀτασθαλίῃσιν ὄλοντο,
νήπιοι, οἳ κατὰ βου̂ς ̔Υπερίονος ̓Ηελίοιο
ἤσθιον: αὐτὰρ ὁ τοι̂σιν ἀφείλετο νόστιμον ἠ̂μαρ.
τω̂ν ἁμόθεν γε, θεά, θύγατερ Διός, εἰπὲ καὶ ἡμι̂ν.''')

    def test_russian_characters_washing_1_0(self):
        """textutils - washing russian characters for XML 1.0."""
        self.assertEqual(wash_for_xml('''
В тени дерев, над чистыми водами
Дерновый холм вы видите ль, друзья?
Чуть слышно там плескает в брег струя;
Чуть ветерок там дышит меж листами;
На ветвях лира и венец...
Увы! друзья, сей холм - могила;
Здесь прах певца земля сокрыла;
Бедный певец!'''), '''
В тени дерев, над чистыми водами
Дерновый холм вы видите ль, друзья?
Чуть слышно там плескает в брег струя;
Чуть ветерок там дышит меж листами;
На ветвях лира и венец...
Увы! друзья, сей холм - могила;
Здесь прах певца земля сокрыла;
Бедный певец!''')

    def test_russian_characters_washing_1_1(self):
        """textutils - washing russian characters for XML 1.1."""
        self.assertEqual(wash_for_xml('''
В тени дерев, над чистыми водами
Дерновый холм вы видите ль, друзья?
Чуть слышно там плескает в брег струя;
Чуть ветерок там дышит меж листами;
На ветвях лира и венец...
Увы! друзья, сей холм - могила;
Здесь прах певца земля сокрыла;
Бедный певец!''', xml_version='1.1'), '''
В тени дерев, над чистыми водами
Дерновый холм вы видите ль, друзья?
Чуть слышно там плескает в брег струя;
Чуть ветерок там дышит меж листами;
На ветвях лира и венец...
Увы! друзья, сей холм - могила;
Здесь прах певца земля сокрыла;
Бедный певец!''')

    def test_illegal_characters_washing_1_0(self):
        """textutils - washing illegal characters for XML 1.0."""
        self.assertEqual(wash_for_xml(chr(8) + chr(9) + 'some chars'),
                         '\tsome chars')
        self.assertEqual(wash_for_xml('$b\bar{b}$'), '$bar{b}$')

    def test_illegal_characters_washing_1_1(self):
        """textutils - washing illegal characters for XML 1.1."""
        self.assertEqual(wash_for_xml(chr(8) + chr(9) + 'some chars',
                                      xml_version='1.1'),
                         '\x08\tsome chars')
        self.assertEqual(wash_for_xml('$b\bar{b}$', xml_version='1.1'),
                         '$b\x08ar{b}$')


class WashForUTF8Test(InvenioTestCase):
    """Test functions related to wash_for_utf8 function."""

    def test_normal_legal_string_washing(self):
        """textutils - testing UTF-8 washing on a perfectly normal string"""
        some_str = "This is an example string"
        self.assertEqual(some_str, wash_for_utf8(some_str))

    def test_chinese_string_washing(self):
        """textutils - testing washing functions on chinese script"""
        some_str = """春眠暁を覚えず
処処に啼鳥と聞く
夜来風雨の声
花落つること
知んぬ多少ぞ"""
        self.assertEqual(some_str, wash_for_utf8(some_str))

    def test_russian_characters_washing(self):
        """textutils - washing Russian characters for UTF-8"""
        self.assertEqual(wash_for_utf8('''
В тени дерев, над чистыми водами
Дерновый холм вы видите ль, друзья?
Чуть слышно там плескает в брег струя; Чуть ветерок там дышит меж листами; На ветвях лира и венец... Увы! друзья, сей холм - могила; Здесь прах певца земля сокрыла; Бедный певец!''') def test_remove_incorrect_unicode_characters(self): """textutils - washing out the incorrect characters""" self.assertEqual(wash_for_utf8("Ź\206dź\204bło żół\203wia \202"), "Źdźbło żółwia ") def test_empty_string_wash(self): """textutils - washing an empty string""" self.assertEqual(wash_for_utf8(""), "") def test_only_incorrect_unicode_wash(self): """textutils - washing an empty string""" self.assertEqual(wash_for_utf8("\202\203\204\205"), "") def test_raising_exception_on_incorrect(self): """textutils - assuring an exception on incorrect input""" self.assertRaises(UnicodeDecodeError, wash_for_utf8, "\202\203\204\205", correct=False) def test_already_utf8_input(self): """textutils - washing a Unicode string into UTF-8 binary string""" self.assertEqual('Göppert', wash_for_utf8(u'G\xf6ppert', True)) class WrapTextInABoxTest(InvenioTestCase): """Test functions related to wrap_text_in_a_box function.""" def test_plain_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box plain.""" result = """ ********************************************** ** foo bar ** ********************************************** """ self.assertEqual(wrap_text_in_a_box('foo bar'), result) def test_empty_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box empty.""" result = """ ********************************************** ********************************************** """ self.assertEqual(wrap_text_in_a_box(), result) def test_with_title_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box with title.""" result = """ ********************************************** ** a Title! ** ** **************************************** ** ** foo bar ** ********************************************** """ self.assertEqual(wrap_text_in_a_box('foo bar', title='a Title!'), result) def test_multiline_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box multiline.""" result = """ ********************************************** ** foo bar ** ********************************************** """ self.assertEqual(wrap_text_in_a_box('foo\n bar'), result) def test_real_multiline_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box real multiline.""" result = """ ********************************************** ** foo ** ** bar ** ********************************************** """ self.assertEqual(wrap_text_in_a_box('foo\n\nbar'), result) def test_real_no_width_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box no width.""" result = """ ************ ** foobar ** ************ """ self.assertEqual(wrap_text_in_a_box('foobar', min_col=0), result) def test_real_nothing_at_all_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box nothing at all.""" result = """ ****** ****** """ self.assertEqual(wrap_text_in_a_box(min_col=0), result) def test_real_squared_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box squared style.""" result = """ +--------+ | foobar | +--------+ """ self.assertEqual(wrap_text_in_a_box('foobar', style='squared', min_col=0), result) def test_indented_text_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box indented text.""" text = """ def test_real_squared_wrap_text_in_a_box(self):\n \"""wrap_text_in_a_box - squared style.\"""\n result = \"""\n +--------+\n | foobar |\n +--------+ \""" """ result = """ ****************************** ** def test_real_square ** ** d_wrap_text_in_a_box ** ** 
(self): ** ** \"""wrap_text_in_ ** ** a_box - squared ** ** style.\""" ** ** result = \""" ** ** +--------+ ** ** | foobar | ** ** +--------+\""" ** ****************************** """ self.assertEqual(wrap_text_in_a_box(text, min_col=0, max_col=30, break_long=True), result) def test_single_new_line_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box single new line.""" result = """ ********************************************** ** ciao come và? ** ********************************************** """ self.assertEqual(wrap_text_in_a_box("ciao\ncome và?"), result) def test_indented_box_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box indented box.""" result = """ ********************************************** ** foobar ** ********************************************** """ self.assertEqual(wrap_text_in_a_box('foobar', tab_num=1), result) def test_real_conclusion_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box conclusion.""" result = """---------------------------------------- foobar \n""" self.assertEqual(wrap_text_in_a_box('foobar', style='conclusion'), result) def test_real_longtext_wrap_text_in_a_box(self): """textutils - wrap_text_in_a_box long text.""" text = """Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.""" result = """ ************************************************************************ ** Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do ** ** eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut ** ** enim ad minim veniam, quis nostrud exercitation ullamco laboris ** ** nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in ** ** reprehenderit in voluptate velit esse cillum dolore eu fugiat ** ** nulla pariatur. Excepteur sint occaecat cupidatat non proident, ** ** sunt in culpa qui officia deserunt mollit anim id est laborum. ** ** At vero eos et accusamus et iusto odio dignissimos ducimus qui ** ** blanditiis praesentium voluptatum deleniti atque corrupti quos ** ** dolores et quas molestias excepturi sint occaecati cupiditate non ** ** provident, similique sunt in culpa qui officia deserunt mollitia ** ** animi, id est laborum et dolorum fuga. Et harum quidem rerum ** ** facilis est et expedita distinctio. 
Nam libero tempore, cum soluta ** ** nobis est eligendi optio cumque nihil impedit quo minus id quod ** ** maxime placeat facere possimus, omnis voluptas assumenda est, ** ** omnis dolor repellendus. Temporibus autem quibusdam et aut ** ** officiis debitis aut rerum necessitatibus saepe eveniet ut et ** ** voluptates repudiandae sint et molestiae non recusandae. Itaque ** ** earum rerum hic tenetur a sapiente delectus, ut aut reiciendis ** ** voluptatibus maiores alias consequatur aut perferendis doloribus ** ** asperiores repellat. ** ************************************************************************ """ self.assertEqual(wrap_text_in_a_box(text), result) class DecodeToUnicodeTest(InvenioTestCase): """Test functions related to decode_to_unicode function.""" if CHARDET_AVAILABLE: def test_decode_to_unicode(self): """textutils - decode_to_unicode.""" self.assertEqual(decode_to_unicode('\202\203\204\205', default_encoding='latin1'), u'\x82\x83\x84\x85') self.assertEqual(decode_to_unicode('àèéìòù'), u'\xe0\xe8\xe9\xec\xf2\xf9') self.assertEqual(decode_to_unicode('Ιθάκη'), u'\u0399\u03b8\u03ac\u03ba\u03b7') else: pass class Latex2UnicodeTest(InvenioTestCase): """Test functions related to translating LaTeX symbols to Unicode.""" def test_latex_to_unicode(self): """textutils - latex_to_unicode""" self.assertEqual(translate_latex2unicode("\\'a \\'i \\'U").encode('utf-8'), "á í Ú") self.assertEqual(translate_latex2unicode("\\'N \\k{i}"), u'\u0143 \u012f') self.assertEqual(translate_latex2unicode("\\AAkeson"), u'\u212bkeson') self.assertEqual(translate_latex2unicode("$\\mathsl{\\Zeta}$"), u'\U0001d6e7') class TestStripping(InvenioTestCase): """Test for stripping functions like accents and control characters.""" - if UNIDECODE_AVAILABLE: - def test_text_to_ascii(self): - """textutils - transliterate to ascii using unidecode""" - self.assert_(translate_to_ascii( - ["á í Ú", "H\xc3\xb6hne", "Åge Øst Vær", "normal"]) in - (["a i U", "Hohne", "Age Ost Vaer", "normal"], ## unidecode < 0.04.13 - ['a i U', 'Hoehne', 'Age Ost Vaer', 'normal']) ## unidecode >= 0.04.13 - ) - self.assertEqual(translate_to_ascii("àèéìòù"), ["aeeiou"]) - self.assertEqual(translate_to_ascii("ß"), ["ss"]) - self.assertEqual(translate_to_ascii(None), None) - self.assertEqual(translate_to_ascii([]), []) - self.assertEqual(translate_to_ascii([None]), [None]) - self.assertEqual(translate_to_ascii("√"), [""]) - else: - pass + def test_text_to_ascii(self): + """textutils - transliterate to ascii using unidecode""" + self.assert_(translate_to_ascii( + ["á í Ú", "H\xc3\xb6hne", "Åge Øst Vær", "normal"]) in + (["a i U", "Hohne", "Age Ost Vaer", "normal"], ## unidecode < 0.04.13 + ['a i U', 'Hoehne', 'Age Ost Vaer', 'normal']) ## unidecode >= 0.04.13 + ) + self.assertEqual(translate_to_ascii("àèéìòù"), ["aeeiou"]) + self.assertEqual(translate_to_ascii("ß"), ["ss"]) + self.assertEqual(translate_to_ascii(None), None) + self.assertEqual(translate_to_ascii([]), []) + self.assertEqual(translate_to_ascii([None]), [None]) + self.assertEqual(translate_to_ascii("√"), [""]) def test_strip_accents(self): """textutils - transliterate to ascii (basic)""" self.assertEqual("memememe", strip_accents('mémêmëmè')) self.assertEqual("MEMEMEME", strip_accents('MÉMÊMËMÈ')) self.assertEqual("oe", strip_accents('œ')) self.assertEqual("OE", strip_accents('Œ')) class TestDiffering(InvenioTestCase): """Test for differing two strings.""" string1 = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec fringilla tellus eget fringilla sagittis. 
Pellentesque posuere lacus id erat tristique pulvinar. Morbi volutpat, diam eget interdum lobortis, lacus mi cursus leo, sit amet porttitor neque est vitae lectus. Donec tempor metus vel tincidunt fringilla. Nam iaculis lacinia nisl, enim sollicitudin convallis. Morbi ut mauris velit. Proin suscipit dolor id risus placerat sodales nec id elit. Morbi vel lacinia lectus, eget laoreet dui. Nunc commodo neque porttitor eros placerat, sed ultricies purus accumsan. In velit nisi, accumsan molestie gravida a, rutrum in augue. Nulla pharetra purus nec dolor ornare, ut aliquam odio placerat. Aenean ultrices condimentum quam vitae pharetra.""" string2 = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec fringilla tellus eget fringilla sagittis. Pellentesque posuere lacus id erat. eget interdum lobortis, lacus mi cursus leo, sit amet porttitor neque est vitae lectus. Donec tempor metus vel tincidunt fringilla. Nam iaculis lacinia nisl, consectetur viverra enim sollicitudin convallis. Morbi ut mauris velit. Proin suscipit dolor id risus placerat sodales nec id elit. Morbi vel lacinia lectus, eget laoreet placerat sodales nec id elit. Morbi vel lacinia lectus, eget laoreet dui. Nunc commodo neque porttitor eros placerat, sed ultricies purus accumsan. In velit nisi, lorem ipsum lorem gravida a, rutrum in augue. Nulla pharetra purus nec dolor ornare, ut aliquam odio placerat. Aenean ultrices condimentum quam vitae pharetra.""" def test_show_diff_plain_text(self): """textutils - show_diff() with plain text""" expected_result = """ Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec fringilla tellus eget fringilla sagittis. Pellentesque -posuere lacus id erat. +posuere lacus id erat tristique pulvinar. Morbi volutpat, diam eget interdum lobortis, lacus mi cursus leo, sit amet porttitor neque est vitae lectus. Donec tempor metus vel tincidunt fringilla. -Nam iaculis lacinia nisl, consectetur viverra enim sollicitudin +Nam iaculis lacinia nisl, enim sollicitudin convallis. Morbi ut mauris velit. Proin suscipit dolor id risus placerat sodales nec id elit. Morbi vel lacinia lectus, eget laoreet -placerat sodales nec id elit. Morbi vel lacinia lectus, eget laoreet dui. Nunc commodo neque porttitor eros placerat, sed ultricies purus -accumsan. In velit nisi, lorem ipsum lorem gravida a, rutrum in augue. +accumsan. In velit nisi, accumsan molestie gravida a, rutrum in augue. Nulla pharetra purus nec dolor ornare, ut aliquam odio placerat. Aenean ultrices condimentum quam vitae pharetra. """ self.assertEqual(show_diff(self.string1, self.string2), expected_result) def test_show_diff_html(self): """textutils - show_diff() with plain text""" expected_result = """<pre> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec fringilla tellus eget fringilla sagittis. Pellentesque <strong class="diff_field_deleted">posuere lacus id erat.</strong> <strong class="diff_field_added">posuere lacus id erat tristique pulvinar. Morbi volutpat, diam</strong> eget interdum lobortis, lacus mi cursus leo, sit amet porttitor neque est vitae lectus. Donec tempor metus vel tincidunt fringilla. <strong class="diff_field_deleted">Nam iaculis lacinia nisl, consectetur viverra enim sollicitudin</strong> <strong class="diff_field_added">Nam iaculis lacinia nisl, enim sollicitudin</strong> convallis. Morbi ut mauris velit. Proin suscipit dolor id risus placerat sodales nec id elit. Morbi vel lacinia lectus, eget laoreet <strong class="diff_field_deleted">placerat sodales nec id elit. 
Morbi vel lacinia lectus, eget laoreet</strong> dui. Nunc commodo neque porttitor eros placerat, sed ultricies purus <strong class="diff_field_deleted">accumsan. In velit nisi, lorem ipsum lorem gravida a, rutrum in augue.</strong> <strong class="diff_field_added">accumsan. In velit nisi, accumsan molestie gravida a, rutrum in augue.</strong> Nulla pharetra purus nec dolor ornare, ut aliquam odio placerat. Aenean ultrices condimentum quam vitae pharetra. </pre>""" self.assertEqual(show_diff(self.string1, self.string2, prefix="<pre>", suffix="</pre>", prefix_unchanged='', suffix_unchanged='', prefix_removed='<strong class="diff_field_deleted">', suffix_removed='</strong>', prefix_added='<strong class="diff_field_added">', suffix_added='</strong>'), expected_result) class TestALALC(InvenioTestCase): """Test for handling ALA-LC transliteration.""" - if UNIDECODE_AVAILABLE: - def test_alalc(self): - msg = "眾鳥高飛盡" - encoded_text, encoding = guess_minimum_encoding(msg) - unicode_text = unicode(encoded_text.decode(encoding)) - self.assertEqual("Zhong Niao Gao Fei Jin ", - transliterate_ala_lc(unicode_text)) + def test_alalc(self): + msg = "眾鳥高飛盡" + encoded_text, encoding = guess_minimum_encoding(msg) + unicode_text = unicode(encoded_text.decode(encoding)) + self.assertEqual("Zhong Niao Gao Fei Jin ", + transliterate_ala_lc(unicode_text)) class LatexEscape(InvenioTestCase): """Test for escape latex function""" def test_escape_latex(self): unescaped = "this is unescaped latex & % $ # _ { } ~ \ ^ and some multi-byte chars: żółw mémêmëmè" escaped = escape_latex(unescaped) self.assertEqual(escaped, "this is unescaped latex \\& \\% \\$ \\# \\_ \\{ \\} \\~{} \\textbackslash{} \\^{} and some multi-byte chars: \xc5\xbc\xc3\xb3\xc5\x82w m\xc3\xa9m\xc3\xaam\xc3\xabm\xc3\xa8") TEST_SUITE = make_test_suite(WrapTextInABoxTest, GuessMinimumEncodingTest, WashForXMLTest, WashForUTF8Test, DecodeToUnicodeTest, Latex2UnicodeTest, TestStripping, TestALALC, TestDiffering) if __name__ == "__main__": run_test_suite(TEST_SUITE)