diff --git a/modules/webhelp/web/hacking/coding-style.webdoc b/modules/webhelp/web/hacking/coding-style.webdoc
index fa39b089e..496692bbe 100644
--- a/modules/webhelp/web/hacking/coding-style.webdoc
+++ b/modules/webhelp/web/hacking/coding-style.webdoc
@@ -1,213 +1,212 @@
 ## -*- mode: html; coding: utf-8; -*-
 ## $Id$
 
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 <!-- WebDoc-Page-Title: Coding Style -->
-<!-- WebDoc-Page-Navtrailo: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a> -->
-<!-- WebDoc-Page-Navbar-Select: hacking-coding-style -->
+<!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a> -->
 <!-- WebDoc-Page-Revision: $Id$-->
 
 <pre>
 A brief description of things we strive for, more or less unsuccessfully.
 
 1. Packaging
 
    We use the classical GNU Autoconf/Automake approach; for a tutorial,
    see e.g. <a href="http://www.amath.washington.edu/~lf/tutorials/autoconf/tutorial_toc.html">Learning the GNU development tools</a> or the <a href="http://sources.redhat.com/autobook/autobook/autobook_toc.html">AutoBook</a>.
 
 2. Modules
 
    CDS Invenio started as a set of pretty independent modules developed by
    independent people with independent styles.  This was even more
    pronounced by the original use of many different languages
    (e.g. Python, PHP, Perl).  Now the CDS Invenio code base is striving to
    use Python everywhere, except in speed-critical parts when a
    compiled language such as Common Lisp may come to the rescue in the
    near future.
 
    When modifying an existing module, we propose to strictly continue
    using whatever coding style the module was originally written in.
    When writing new modules, we propose to stick to the
    below-mentioned standards.
 
    The code integration across modules is happening, but is slow.
    Therefore, don't be surprised to see that there is a lot of room to
    refactor.
 
 3. WML/ePerl/etc
 
    This is not so important, because not many lines-of-code were
    written in WML/ePerl.  We prefer to loosely follow the GNU way, as
    always.
 
 4. Python
 
    We aim at following recommendations from <a
    href="http://www.python.org/peps/pep-0008.html">PEP 8</a>, although
    the existing code surely does not fulfil them here and there.
    The code indentation is done via spaces only, please do not use
    tabs.  One tab counts as four spaces.  Emacs users can look into
    our <a href="cdsware.el">cdsware.el</a> for inspiration.
 
    All the Python code should be extensively documented via
    docstrings, so you can always run pydoc file.py to peruse the
    file's documentation in one simple go.
 
    Do not forget to run pylint on your code to check for errors like
    uninitialized variables and to improve its quality and conformance
    to the coding standard.  If you develop in Emacs, run M-x pylint
    RET on your buffers frequently.  Read and implement pylint
    suggestions.  (Note that using lambda and friends may lead to false
    pylint warnings.  You can switch them off by putting block comments
    of the form ``# pylint: disable-msg=C0301''.)
 
    Do not forget to run pychecker on your code either.  It is another
    source code checker that catches some situations better and some
    situations worse than pylint.  If you develop in Emacs, run C-c C-w
    (M-x py-pychecker-run RET) on your buffers frequently.  (Note that
    using psyco on classes may lead to false pychecker warnings.)
 
    You can check the kwalitee of your code by running ``python
    modules/miscutil/lib/kwalitee.py *.py'' on your files.  You can
    also check the code kwalitee across all the modules by running
    ``make kwalitee-check'' in the main source directory.
 
    Do not hardcode magic constants in your code.  Every magic string or
    number should be put into the accompanying file_config.py, with a
    symbol name beginning with cfg_modulename_*.
 
    Clearly separate interfaces from implementation.  Document your
    interfaces.  Do not expose to other modules anything that does not
    have to be exposed.  Apply the principle of least information.
 
    Create as few new library files as possible.  Do not create many
    nested files in nested modules; rather put all the lib files in one
    dir with bibindex_foo and bibindex_bar names.
 
    Use imperative/functional paradigm rather than OO.  If you do use
    OO, then stick to as simple a class hierarchy as possible.  Recall
    that method calls and exception handling in Python are quite
    expensive.
 
    Prefer the good old foo_bar naming convention for symbols (both
    variables and function names) over the fooBar CaMelCaSe
    convention.  (Except for Class names where UppercaseSymbolNames are
    to be used.)
 
    Take special care to name your symbols descriptively.  Your
    code is going to be read and worked with by others, and its symbols
    should be self-understandable without any comments and without
    studying other parts of the code.  For example, use proper English
    words, not abbreviations that can be misspelled in many a way; use
    words that go in pair (e.g. create/destroy, start/stop; never
    create/stop); use self-understandable symbol names
    (e.g. list_of_file_extensions rather than list2); never misname
    symbols (e.g. score_list should hold the list of scores and nothing
    else - if in the course of development you change the semantics of
    what the symbol holds then change the symbol name too).  Do not be
    afraid to use long descriptive names; good editors such as Emacs
    can tab-complete symbols for you.
 
    When hacking module A, take care to follow the existing
    coding convention in A, even if it is legacy-weird and even if we
    use a different technique elsewhere.  (Unless the whole module A is
    going to be refactored, of course.)
 
    Speed-critical parts should be profiled with pyprof.  Do not forget
    to use tricks like psyco.
 
    The code should be well tested before being committed.  Testing is
    an integral part of the development process.  Test as you
    program.  The testing process should be automated via our unit
    test and regression test suite infrastructures.  Please read the
    <a href="test-suite">test suite strategy</a> to know more.
 
    Python promotes writing clear, readable, easily maintainable code.
    Write it as such.  Recall Albert Einstein's ``Everything should be
    made as simple as possible, but not simpler''.  Things should be
    neither overengineered nor oversimplified.
 
    Recall the principles Unix is built upon, as summarized in Eric
    S. Raymond's <a href="http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2877537">TAOUP</a>:
 
       Rule of Modularity: Write simple parts connected by clean interfaces.
       Rule of Clarity: Clarity is better than cleverness.
       Rule of Composition: Design programs to be connected with other programs.
       Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
       Rule of Simplicity: Design for simplicity; add complexity only where you must.
       Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
       Rule of Transparency: Design for visibility to make inspection and debugging easier.
       Rule of Robustness: Robustness is the child of transparency and simplicity.
       Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
       Rule of Least Surprise: In interface design, always do the least surprising thing.
       Rule of Silence: When a program has nothing surprising to say, it should say nothing.
       Rule of Repair: Repair what you can -- but when you must fail, fail noisily and as soon as possible.
       Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
       Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
       Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
       Rule of Diversity: Distrust all claims for one true way.
       Rule of Extensibility: Design for the future, because it will be here sooner than you think.
 
    or the golden rule that says it all: ``keep it simple''.
 
    For more hints, thoughts, and other ruminations on programming,
    see my <a href="https://twiki.cern.ch/twiki/bin/view/CDS/Invenio">CDS Invenio Wiki</a>.
 
 5. PHP
 
    We are slowly moving away from PHP, so there may be several
    practices in place in the PHP code present in CDS Invenio.  Usually
    this is consistent within modules but inconsistent across modules.
    For example, some old code used Emacs' perl-mode, following
    traditional K&R C style, while some other old code tried to stick
    to <a href="http://pear.php.net/manual/en/standards.php">PEAR recommendations</a>.
 
 6. MySQL
 
    Table naming policy is, roughly and briefly:
 
       - "foo": table names in lowercase, without prefix, used by me
          for WebSearch
 
       - "foo_bar": underscores represent M:N relationship between
         "foo" and "bar", to tie the two tables together
 
       - "bib*": many tables to hold the metadata and relationships
          between them
 
       - "idx*": idx is the table name prefix used by BibIndex
 
       - "rnk*": rnk is the table name prefix used by BibRank
 
       - "flx*": flx is the table name prefix used by FlexElink (also known as
          BibFormat)
 
       - "sbm*": sbm is the table name prefix used by WebSubmit
 
       - "sch*": sch is the table name prefix used by BibSched
 
       - "collection*": many tables to describe collections and search
         interface pages
 
       - "user*" : many tables to describe personal features (baskets,
         alerts)
 
 - end of file -
 
 </pre>
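Reviewer note on coding-style.webdoc, section 4 (Python): the rules on
configuration constants, foo_bar naming, descriptive symbol names and
docstrings may be easier to picture with a small sketch.  Everything named
below is hypothetical (there is no "bibfoo" module); in a real module the
cfg_* symbols would live in the accompanying bibfoo_config.py rather than in
the same file.

```python
"""Sketch of the section 4 naming and configuration rules.

All names are illustrative assumptions, not taken from CDS Invenio.
"""

# Hypothetical constants, named cfg_<modulename>_* as the guide asks.
# In a real module they would sit in bibfoo_config.py.
cfg_bibfoo_default_file_extension = "txt"
cfg_bibfoo_max_file_name_length = 255


def get_file_extension(file_name):
    """Return the lowercased extension of FILE_NAME, or the configured
    default extension when the name contains no dot."""
    if "." not in file_name:
        return cfg_bibfoo_default_file_extension
    return file_name.rsplit(".", 1)[1].lower()


def count_file_extensions(list_of_file_names):
    """Return a dictionary mapping each file extension to the number of
    names in LIST_OF_FILE_NAMES that carry it.  Names longer than
    cfg_bibfoo_max_file_name_length are skipped."""
    extension_counts = {}
    for file_name in list_of_file_names:
        if len(file_name) > cfg_bibfoo_max_file_name_length:
            continue
        file_extension = get_file_extension(file_name)
        extension_counts[file_extension] = \
            extension_counts.get(file_extension, 0) + 1
    return extension_counts
```

Running pydoc on such a file would show both docstrings, and no magic string
or number appears inside the function bodies.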
diff --git a/modules/webhelp/web/hacking/common-concepts.webdoc b/modules/webhelp/web/hacking/common-concepts.webdoc
index 65dcbfbe0..fc11bfddc 100644
--- a/modules/webhelp/web/hacking/common-concepts.webdoc
+++ b/modules/webhelp/web/hacking/common-concepts.webdoc
@@ -1,118 +1,116 @@
 ## $Id$
 
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 <!-- WebDoc-Page-Title: Common Concepts -->
-<!-- WebDoc-Page-Navbar-Name: hacking-common-concepts -->
 <!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a> -->
-<!-- WebDoc-Page-Navbar-Select: hacking-common-concepts> -->
 <!-- WebDoc-Page-Revision: $Id$-->
 
 <pre>
 The description of concepts you will encounter here and there in the
 CDS Invenio.  Our interpretation may differ from the practice found in
 other products, so please read this carefully.
 
 1. sysno - (ALEPH|old) system number
 
    Stands for (ALEPH|old) system number only.  That is, for CDS
    Invenio installations outside CERN, it stands for an 'old system
    number', whatever that is, if they want to publicise it instead of our
    internal auto-incremented CDS Invenio record identifiers.
 
 2. recID - CDS Invenio record identifier
 
    Each record has got an auto-incremented ID in the "bibrec" table
    (formerly called "bibitem").  This is the basic "record identifier"
    concept in CDS Invenio.
 
 3. docID - optional fulltext document identifier
 
    Each fulltext file may have a docID.  This will permit us to
    interconnect records (recID) with fulltext files (docID), if we
    want to.  At the moment there is only one-way connection from recID
    to docID via HTTP field 856.  This is ugly.  I think we may
    probably profit by introducing recID-docID relationship in several
    ways: file protection, reference extraction, fulltext
    indexing... (?!)
 
 4. field - logical field concept such as "reportnumber"
 
    A bibliographic record is composed of 'fields' such as title or
    author.  Note that we consider 'field' to be a logical concept,
    that is compound and may consist of several physical MARC fields.
    For example, "report number" field consists of several MARC fields
    such as 088 $a, 037 $a, 909C0 $r.  Another example: "first report
    number" consists of only one MARC field, 037 $a.
 
 5. tag - physical field concept such as "088 $a".
 
    Having defined the concept of 'logical field', let's now turn to
    the 'physical field' that denotes basically the concept of 'MARC
    field' as defined in MARC-21 standard.  In addition to tag, a field
    may contain two identifiers to describe the data content, and
    subfield codes to denote various parts of the content.  See our
    HOWTO MARC guide on this.
 
    Thus said, in the implementation of our bibliographic tables
    (bibXXx) we have sort of generalized the term 'tag' to stand for:
 
        tag = tag code + identifier1 + identifier2 + subfield code
 
    This convention, while taking some freedom from the MARC-21
    standard, enables us to write things like "field: base number, tag:
    909C0b, value: 11".  If this interpretation is indeed too free with
    respect to the standard usage of terms, we may change them in the
    future.
 
 6. collection - here we distinguish (i) primary collection concept
                 and (ii) specific collection concept.
 
    The (i) primary collections are the basic organizational structure of
    how the records are grouped together in collections.  The primary
    collections are used in the navigable search interface under the
    ``Narrow search'' box.  The (ii) specific collections present an
    orthogonal view on the data organization, that is useful to group
    together some records from different primary collections, if they
    present a common pattern.  The specific collections are used in the
    search interface under the ``Focus on'' box.
 
    The primary collections are defined mainly by the collection
    identifier ("980 $a,b"); and the specific collections are as
    defined by any query that is possible for a search engine to
    execute (see also "dbquery" column in the "collection" table).
 
    In the past we used to use the term "catalogue", that is now
    deprecated, and that can be interchanged with "collection".
 
 7. doctype - stands for web document type concept, used in WebSubmit
 
    The "document type" is used solely for submission purposes, and
    fulltext access purposes ("setlink"-like).  For example, a document
    type "photo" may be used in many collections such as "Foo Photos",
    "Bar PhotoLab", etc.  Similarly, one collection can cover several
    doctypes.  (M:N relationship)
 
 8. baskets, alerts, settings - covering personal features
 
    Denote personal features, for which we previously used the terms
    "shelf" and "profile" that are now deprecated.
 
 - end of file -
 
 </pre>
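Reviewer note on common-concepts.webdoc, points 4 and 5: the relation between
a logical field and its generalized tags (tag code, two identifiers, subfield
code) can be sketched as below.  This is only an illustration under stated
assumptions: the helper names are hypothetical, and writing a blank identifier
as an underscore is an assumption of this sketch, not something the page
states.

```python
# Hypothetical helpers illustrating the generalized 'tag' from point 5.

# Assumption: a blank identifier is written as an underscore so that
# every generalized tag has the same layout.
cfg_demo_blank_identifier = "_"


def compose_tag(tag_code, identifier1="", identifier2="", subfield_code=""):
    """Return tag code + identifier1 + identifier2 + subfield code."""
    return (tag_code
            + (identifier1 or cfg_demo_blank_identifier)
            + (identifier2 or cfg_demo_blank_identifier)
            + subfield_code)


def split_tag(tag):
    """Split a generalized tag such as '909C0b' into its four parts."""
    return {
        "tag_code": tag[0:3],
        "identifier1": tag[3:4],
        "identifier2": tag[4:5],
        "subfield_code": tag[5:6],
    }


# A logical field may cover several physical tags; this mirrors the
# "report number" example (088 $a, 037 $a, 909C0 $r) from point 4.
logical_field_to_tags = {
    "reportnumber": [
        compose_tag("088", subfield_code="a"),
        compose_tag("037", subfield_code="a"),
        compose_tag("909", "C", "0", "r"),
    ],
}
```

With this layout the example "field: base number, tag: 909C0b, value: 11"
splits into tag code 909, identifiers C and 0, and subfield code b.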
diff --git a/modules/webhelp/web/hacking/directory-organization.webdoc b/modules/webhelp/web/hacking/directory-organization.webdoc
index ccf238415..89d06b5e1 100644
--- a/modules/webhelp/web/hacking/directory-organization.webdoc
+++ b/modules/webhelp/web/hacking/directory-organization.webdoc
@@ -1,201 +1,199 @@
 ## $Id$
 
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 <!-- WebDoc-Page-Title: Directory Organization -->
-<!-- WebDoc-Page-Navbar-Name: hacking-directory-organization -->
 <!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a> -->
-<!-- WebDoc-Page-Navbar-Select: hacking-directory-organization -->
 <!-- WebDoc-Page-Revision: $Id$-->
 
 <pre>
 Please find some notes below on how the source (as well as the target)
 directory structure is organized, where the sources get installed to,
 and how the visible URLs are organized.
 
 1. CDS Invenio will generally install into the directory given by the
    --with-prefix configuration variable.  That directory and its var/www
    subdirectory are discussed in points 2 and 3 below, respectively.
 
 2. The first directory (--with-prefix) specifies the general CDS Invenio
    install directory, where we'll put CLI binaries, Python and PHP
    libraries, manpages, log and cache directories for the running
    installation, and any other dirs as needed.  They will all live
    under one common hood.
 
    For example, configure --with-prefix=/opt/cds-invenio,
    and you'll obtain the following principal directories:
 
       /opt/cds-invenio/
       /opt/cds-invenio/bin
       /opt/cds-invenio/lib
       /opt/cds-invenio/lib/php
       /opt/cds-invenio/lib/python
       /opt/cds-invenio/lib/wml
       /opt/cds-invenio/var
       /opt/cds-invenio/var/cache
       /opt/cds-invenio/var/log
 
    with the obvious meaning:
 
       - bin : for command-line executable binaries and scripts
 
       - lib/php : for our own PHP libraries, see below
 
       - lib/python : for our own Python libraries, see below
 
       - lib/wml : for our own WML libraries, see below
 
       - var : for installation-specific runtime stuff
 
       - var/log : for all sorts of runtime logging, e.g. search.log
 
       - var/cache : for all sorts of runtime caching, e.g. OAI
                     retention harvesting, collection cache, etc
 
    This scheme copies to some extent the usual Unix filesystem
    convention, so it may be easily expanded later according to our
    future needs.
 
 3. The second directory (prefix/var/www) contains Web scripts (PHP,
    mod_python), HTML documents and images, and so on.  This is where
    webuser-seen files are located.  Basically, the files there contain
    only the interface to the functionality that is provided by the
    libraries stored under the library directory.
 
    The prefix/var/www directory is further structured according to
    whom it provides services.  We distinguish user-level, admin-level
    and hacker-level access to the site, as reflected by the visible
    URL structure.
 
      a) The user-level access point is provided by the main WEBURL
         address and its subdirs.  All the user-level documentation is
         available under WEBURL/help/.  The module-specific user-level
         documentation is available under WEBURL/help/&lt;module&gt;/.
 
      b) The admin-level access is provided by WEBURL/admin/ entry
         point.  The admin-level documentation is accessible from the
         same place.  The admin-level module-specific functionality and
         help is available under WEBURL/admin/&lt;module&gt;/.  (If
         it's written in mod_python, it usually points to
         WEBURL/&lt;module&gt;admin.py/ since we configure the server
         to have all mod_python scripts under the prefix/var/www root
         directory.)
 
      c) The hacker-level documentation is provided by WEBURL/hacking/
         entry point.  There is no hacker-level functionality possible
         via Web, of course, so that unlike admin-level entry point,
         the hacker-level entry point provides only a common access to
          available hacking documentation, etc.  The module-specific
         information is available under WEBURL/hacking/&lt;module&gt;/.
 
 4. Let's now look a bit more closely at the role of the Python and PHP
    library directories outside of the Apache tree:
 
       /opt/cds-invenio/lib/php
       /opt/cds-invenio/lib/python
 
    Here we put not only (a) libraries that may be reused across CDS
    Invenio modules, but also (b) all the "core" functionality of CDS
    Invenio that is not directly callable by the end users.  The
    "callable" functionality is put under "prefix/var/www" in case of
    web scripts and documents, and under "bindir" in case of CLI
    executables.
 
    As for (a), for example in the PHP CDS Invenio library you'll find
    currently the common PHP error handling code that is shared between
    BibFormat and WebSubmit; in the Python CDS Invenio library (in fact,
    CDS Invenio Pythonic 'module', but we are reserving the word 'module'
    to denote 'CDS Invenio module' in this text) you'll find config.py
    containing WML-supplied site parameters, dbquery.py containing DB
    persistent query module, or webpage.py with templates and functions
    to produce mod_python web pages with common look and feel.  These
    could and should be reused across all our modules.  Note that I
    created only a small number of "broad" libraries at the moment.  In
    case we want to reuse more code parts, we'd refactor the code more,
    as needed.
 
    As for (b), for example the existing search engine was split into
    search.py that only contains three "callable" functions, which goes
    into prefix/var/www, while the search engine itself is composed of
    search_engine.py and search_engine_config.py living under LIBDIR.
    In this way we can easily create "real" CLI search, that will
    depend only on the search libraries in LIBDIR, and that will get
    installed into BINDIR.
 
    To recap:
 
       - For each CDS Invenio module, I'm differentiating between
         "callable" and "core" parts.  The former go into
         prefix/var/www or BINDIR, the latter into LIBDIR.
 
        - Our PHP/Pythonic libraries contain several sorts of things:
 
           - the implementation of the "callable" functions
 
           - non-callable internal "core" or "library" code parts, as
             stated above.  Not shared across CDS Invenio modules.
 
           - utility code meant for reuse across CDS Invenio modules, such
             as dbquery.py
 
           - Pythonic config files out of user-supplied WML (non-MySQL)
             configuration parameters (see
             e.g. search_engine_config.py)
 
 5. The same strategy is reflected in the organization of source
    directories inside CDS Invenio CVS.  Each CDS Invenio module lives in a
    separate directory located under "modules" directory of the
    sources.  Further on, each module contains usually several
    subdirectories that reflect the above-mentioned packaging choice.
    For example, in case of WebSearch you'll find:
 
       ./modules/websearch
       ./modules/websearch/bin
       ./modules/websearch/doc
       ./modules/websearch/doc/hacking
       ./modules/websearch/doc/admin
       ./modules/websearch/lib
       ./modules/websearch/web
       ./modules/websearch/web/admin
 
    with the following straightforward meaning:
 
       - bin : for callable CLI binaries and scripts
 
       - doc : for documentation.  The user-level documentation is
               located in this directory.  The admin-level
               documentation is located in the "admin" subdir.  The
               programmer-level documentation is located in the
               "hacking" subdir.
 
       - lib : for uncallable "core" functionality, see the comments
               above
 
       - web : for callable web scripts and pages.  The user- and
               admin- level is separated similarly as in the "doc"
               directory (see above).
 
    The structure is respected throughout all the CDS Invenio modules, a
    notable exception being the MiscUtil module that contains subdirs
    like "sql" (for the table creating/dropping SQL commands, etc) or
    "demo" (for creation of Atlantis Institute of Science, our demo
    site.)
 
 - end of file -
 </pre>
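Reviewer note on directory-organization.webdoc, points 2 and 3: the install
layout derived from --with-prefix can be sketched as a small helper.  The
function and variable names are hypothetical and exist only for this
illustration; the subdirectory list is the one given in point 2, and var/www
is the web directory from point 3.

```python
import posixpath

# Sub-directories of the install prefix, as listed in point 2.
# The empty string stands for the prefix directory itself.
principal_subdirectories = [
    "", "bin", "lib", "lib/php", "lib/python", "lib/wml",
    "var", "var/cache", "var/log",
]


def get_principal_directories(prefix):
    """Return the principal install directories below PREFIX."""
    list_of_directories = []
    for subdirectory in principal_subdirectories:
        list_of_directories.append(posixpath.join(prefix, subdirectory))
    return list_of_directories


def get_web_directory(prefix):
    """Return the directory holding the web-visible scripts and pages."""
    return posixpath.join(prefix, "var", "www")
```

For the prefix /opt/cds-invenio this reproduces the directory list shown in
point 2, with the web scripts living under /opt/cds-invenio/var/www.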
diff --git a/modules/webhelp/web/hacking/modules-overview.webdoc b/modules/webhelp/web/hacking/modules-overview.webdoc
index 0b160ff90..64f9fd671 100644
--- a/modules/webhelp/web/hacking/modules-overview.webdoc
+++ b/modules/webhelp/web/hacking/modules-overview.webdoc
@@ -1,274 +1,273 @@
 ## $Id$
 
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 <!-- WebDoc-Page-Title: Modules Overview -->
-<!-- WebDoc-Page-Navbar-Name: hacking-modules -->
 <!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a> -->
-<!-- WebDoc-Page-Navbar-Select: hacking-modules -->
+<!-- WebDoc-Page-Revision: $Id$-->
 
 
 <p>CDS Invenio consists of several more or less independent modules with
 precisely defined functionality.  The general criterion for module
 names is to use the ``Bib'' prefix to denote modules that work more
 with the bibliographic data, and the ``Web'' prefix to denote modules
 that work more with the Web interface.  (The difference is of course
 blurred in some cases, as in the case of search engine that has got a
 web interface but searches bibliographic data.)
 </p>
 
 <p>A brief description of what each module does follows.  After the
 descriptions, the module relationship diagram is presented.
 </p>
 
 <ul>
 
  <li><strong>BibCheck</strong> permits administrators and library
  cataloguers to automate various kinds of tests on the metadata to see
  whether the metadata comply with quality standards.  For example,
  that certain metadata fields are of a certain length, that they have
  numeric content, that they must not be present when other field
  exists, that their content is governed by an authority base depending
  on values of other fields, etc.  The module can report its findings
  or can even automatically correct some kinds of errors.  (FIXME: not
  included in CVS yet.)
  </li>
 
  <li><strong>BibClassify</strong> allows automatic extraction of
  keywords from fulltext documents, based on the frequency of specific
  terms, taken from a controlled vocabulary. Controlled vocabularies
  can be expressed as simple text thesauri or as structured, RDF-compliant,
  taxonomies, to allow a semantic classification.
  </li>
 
  <li><strong>BibConvert</strong> allows metadata conversion from any
  structured or semi-structured proprietary format into any other
  format, typically the <a
  href="http://www.loc.gov/standards/marcxml/">MARC XML</a> that is
  natively used in CDS Invenio. Nevertheless the input and output formats
  are fully configurable and have been tested on data importations from
  more than one hundred data sources. The power of this utility lies in
  the fact that no structural attributes of data source are presumed,
  but they are defined in an extensive data source
  configuration. Inevitably, this leads to a high complexity of the
  BibConvert configuration language. Most frequent configurations are
  provided with the CDS Invenio distribution, such as a sample
  configuration from Qualified Dublin Core into the MARCXML.
 
  <br />In general the BibConvert configuration consists of the source
  data descriptions and target data descriptions. The processor then
  analyzes and parses the input data and creates the resulting data
  structure, much as an XSLT processor would do. Typically
  BibConvert is aimed at input data that do not have an
  XML representation. The source data is required to be structured or
  semi-structured (i.e. not expressed in natural language that is a
  subject of information extraction task), and its processing involves
  several steps including record separation and field extraction up to
  transformation of source field values and their formatting.
 
  <li><strong>BibEdit</strong> permits one to edit the metadata via a
  Web interface.
  </li>
 
  <li><strong>BibFormat</strong> is in charge of formatting the
  bibliographic metadata in numerous ways.  This truly enables the
  separation of data content administration and formatting layout.
  BibFormat can act in the background and format the records when
  needed, or can preformat records for some often used outputs, such as the
  brief format used when displaying search results.<br/>
  The BibFormat settings can be administered either through a
  user-friendly web interface, or directly by editing human-readable
  configuration files.
  </li>
 
  <li><strong>BibHarvest</strong> represents the <a
  href="http://www.openarchives.org/">OAI-PMH</a> compatible harvester
  allowing the repository to gather metadata from fellow OAI-compliant
  repositories and the OAI-PMH repository management. The repository is
  built directly on top of the database and has an OAI
  repository manager that allows one to perform the administrative tasks on
  the repository aside from the principal generic data administration
  module. The database can be partially or completely open for
  harvesting in the scope of the OAI-PMH protocol. In this case, all
  data is provided in raw form, where the semantics of individual tags
  is indicated uniquely by the MARC21 naming convention. This is
  particularly interesting for institutes that are specialized in
  cross-archive and cross-disciplinary services provision, as for
  example the ARC service provider.
  </li>
 
  <li><strong>BibIndex</strong> module takes care of the indexing of
  metadata, references and full text files.  Two kinds of indexes --
  word and phrase index -- are being maintained.  The user can define
  several logical indexes (e.g. author index, title index, etc.) and
  the correspondence of which physical MARC21 metadata tag goes into
  which logical field index.  An index consists of two parts: (i) a
  forward index listing various words (or phrases) found in the given
  field, with the set of record identifiers where the given word can be
  found; and (ii) a reverse index listing record identifiers, with the
  set of words of the given record that go to the forward index.  Such
  a two-part indexing technique allows one to rapidly update only those
  words that have changed in the input metadata record.  The indexes
  were designed with the aim to provide fast user-response search times
  and are faster than native MySQL (full text) indexes.
  </li>
 
  <li><strong>BibMatch</strong> permits filtering input XML files
  against the database content, attempting to match records via certain
  criteria, for example to avoid duplicate records.
  </li>
 
  <li><strong>BibRank</strong> permits one to set up various ranking
  criteria that will be used later by the search engine.  For example,
  ranking by the word frequency, or by some metadata tag value such as
  journal name by means of the journal impact factor knowledge base, or
  even by the number of downloads of a particular paper.  Note that
  BibRank is independent of BibIndex.
  </li>
 
  <li><strong>BibSched</strong>, the bibliographic task scheduler, is the
  central unit of the system that allows all other modules to access
  the bibliographic database in a controlled manner, preventing sharing
  violations and assuring the coherent execution of the database
  update tasks. The module comes with an administrative interface that
  allows one to monitor the task queue and to intervene manually,
  for example to re-schedule queued tasks, change
  the task order, etc.
  </li>
 
  <li><strong>BibUpload</strong> allows one to load new bibliographic
  data into the database. To carry out this task the data must be a
  well-formed XML file that complies with the current metadata tag
  selection schema. Usually, the properly structured input files of
  BibUpload come from the BibConvert utility.
  </li>
 
  <li><strong>ElmSubmit</strong> is an email submission gateway that
  permits automatic document uploads from trusted sources via
  email.  (Usually web submission or harvesting is preferred.)
  </li>
 
  <li><strong>MiscUtil</strong> is a collection of miscellaneous
  utilities that other modules are using, like the international
  messages, etc.
  </li>
 
  <li><strong>WebAccess</strong> module is responsible for granting
  access to users for performing various actions within the system.  A
  Role-Based Access Control (RBAC) technique is used, where users
  belong to several groups according to their role in the system.  Each
  user group can be granted permission to perform certain actions, possibly
  depending on one or more action arguments.  WebAccess is presently used
  mainly for the administrative interface.  There are basically two
  kinds of actions: (i) configuration of administrative modules and
  (ii) running administrative tasks.
  </li>
 
  <li><strong>WebAlert</strong> module allows the end user to be
  alerted whenever a new document matching her personal criteria is
  inserted into the database.  The criteria correspond to a typical
  user query as if it would be done via the search interface.  For
  example, a user may want to get notified whenever a new document
  containing certain words, or of a certain subject, is inserted.  A
  user may create several alerts with a daily, weekly, or a monthly
  frequency.  The results of alert searches are either sent back to the
  user by email or can also be stored into her baskets.
  </li>
 
  <li><strong>WebBasket</strong> module enables the end user of the
  system to store the documents she is interested in in a personal
  basket or a personal shelf.  The concept is similar to popular
  shopping carts.  One user may own several baskets.  A basket can be
  either private or public, allowing a simple document sharing
  mechanism within a group.
  </li>
 
  <li><strong>WebComment</strong> provides a community-oriented tool to
  rank documents by the readers or to share comments on the documents
  by the readers.  Integrated with the group-aware WebBasket, WebGroup,
  WebMessage tools, WebComment is at the heart of the social network
  features of the CDS Invenio software.
  </li>
 
  <li><strong>WebHelp</strong> presents some global user-level,
  admin-level, and hacker-level documentation on CDS Invenio.  The
  module-specific documentation is included within each particular
  module.
  </li>
 
  <li><strong>WebMessage</strong> permits the communication between
  (possibly anonymous) end users via web message boards, to invite
  readers to join the groups, etc.
  </li>
 
  <li><strong>WebSearch</strong> module handles user requests to search
  for certain words or phrases in the database.  Two types of
  searching can be performed: a word search or a phrase search.  The
  system allows for complex boolean queries, regular expression
  searching, or a combined metadata, references and full text file
  searching in one go.  Users can browse the existing
  index terms.  If no direct match is found for the
  user-typed query pattern, the system proposes alternative matches as
  a search guidance.  The search indexes were designed to provide fast
  response times for middle-sized data collections of up to 10<sup>6</sup>
  records.
 
  <br />The metadata corpus is organized into metadata collections that
  are directly accessible through the browse function, similarly to the
  popular concept of Web Directories.  Orthogonal views on the document
  corpus are enabled in the search interface via a concept of virtual
  collections: for example, a document may be classified both according
  to its type (e.g. preprint, book) and according to its Dewey decimal
  classification number.  Such a flexible organization of views allows for
  the creation of easy navigation schemata for the end users.
  </li>
 
  <li><strong>WebSession</strong> is a session and user management
  module that permits one to differentiate between users.  It is useful for
  personalization of the interface and services like personal baskets
  and alerts.
  </li>
 
  <li><strong>WebStat</strong> is a configurable system that permits one to
  gather statistics about the health of the server, the usage of the
  system, as well as about some particular system features.
  </li>
 
  <li><strong>WebStyle</strong> is a library of design-related modules
  that defines the look and feel of CDS Invenio pages.
  </li>
 
  <li><strong>WebSubmit</strong> is a comprehensive submission system
  allowing authorized individuals (authors, secretaries and repository
  maintenance staff) to submit individual documents into the
  system. The submission system has a flow-control mechanism
  that ensures data approval by authorized units. Several
  different submission schemas are available,
  including an automated full text document conversion from various
  textual and image formats. This module also offers information
  extraction functionality, focusing on bibliographic entities such as
  references, authors, keywords or other implicit metadata.
  </li>
 
 </ul>
 
 <p>Relationship between the modules: <br />
 <img src="<WEBURL>/img/hacking/modules-overview-diagram.jpeg" border="0" alt="Modules overview diagram" />
 </p>
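 
 <p>The two-part BibIndex technique described above (a forward index
 from words to record identifiers plus a reverse index from record
 identifiers to words) can be illustrated by a minimal Python sketch.
 It is not the actual BibIndex code: the function name
 <code>index_record</code> and the in-memory dictionaries are
 assumptions made for illustration only.
 </p>

```python
def index_record(forward, reverse, recid, new_words):
    """Update both index parts for one record, touching only changed words.

    FORWARD maps word -> set of record identifiers (hypothetical layout).
    REVERSE maps record identifier -> set of words indexed for that record.
    """
    new_words = set(new_words)
    old_words = reverse.get(recid, set())
    # Words that disappeared from the record leave the forward index.
    for word in old_words - new_words:
        forward[word].discard(recid)
        if not forward[word]:
            del forward[word]
    # Words that newly appeared in the record enter the forward index.
    for word in new_words - old_words:
        forward.setdefault(word, set()).add(recid)
    # The reverse index remembers what is currently indexed for the record.
    reverse[recid] = new_words


forward_index = {}
reverse_index = {}

index_record(forward_index, reverse_index, 1, ["ellis", "muon"])
index_record(forward_index, reverse_index, 2, ["ellis", "higgs"])
# Record 1 is revised: "muon" disappears, "quark" appears, "ellis" stays.
index_record(forward_index, reverse_index, 1, ["ellis", "quark"])
```

 <p>Thanks to the reverse index, the revision of record 1 only touches
 the forward entries for the words that actually changed; the entry
 for the unchanged word is left alone.
 </p>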
 
 
diff --git a/modules/webhelp/web/hacking/release-numbering.webdoc b/modules/webhelp/web/hacking/release-numbering.webdoc
index ff43ca3a4..880c29d54 100644
--- a/modules/webhelp/web/hacking/release-numbering.webdoc
+++ b/modules/webhelp/web/hacking/release-numbering.webdoc
@@ -1,102 +1,100 @@
 ## $Id$
 
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 <!-- WebDoc-Page-Title: Release Numbering Scheme -->
-<!-- WebDoc-Page-Navbar-Name: hacking-release-version-numbering-scheme -->
 <!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a> -->
-<!-- WebDoc-Page-Navbar-Select: hacking-release-version-numbering-scheme -->
 <!-- WebDoc-Page-Revision: $Id$-->
 
 <pre>
 CDS Invenio uses the classical major.minor.patchlevel release version
 numbering scheme that is commonly used in the GNU/Linux world and
 elsewhere.  Each release is labelled by
 
      major.minor.patchlevel
 
 release version number.  For example, a release version 4.0.1 means:
 
        4 - 4th major version, i.e. the whole system has already been
            rewritten four times, either fully or at least in its very
            essential components.  The upgrade from one major version
            to another may be rather hard, may require new prerequisite
            technologies, full data dump, reload and reindexing, as
            well as other major configuration adaptations, possibly
            with an important manual intervention.
 
        0 - 0th minor version, i.e. the first minor release of the 4th
            major rewrite.  (Increments go 4.1, 4.2, ... 4.9, 4.10,
            4.11, 4.12, ... until some important rewrite is done,
            e.g. the database philosophy dramatically changes, leading
            to a non-trivial upgrade, and we have 5.0.)  The upgrade
            from one minor version to another may be laborious but is
            relatively painless, in that some table changes and data
            manipulations may be necessary but they are somewhat
            smaller in nature, easier to grasp, and possibly done by an
            automated script.
 
        1 - 1st patch level to 4.0, fixing bugs in 4.0.0 but not adding
            any substantially new functionality.  That is, the only new
            functionality that is added is that of a `bug fix' nature.
            The upgrade from one patch level to another is usually
            straightforward.
 
            (Packages often seem to break this last rule, e.g. Linux
            kernel adopting new important functionality (such as
            ReiserFS) within the stable 2.4.x branch.  It can be easily
            seen that it is somewhat subjective to judge what is
            qualitatively more like a minor new functionality and what
            is more like a patch to the existing behaviour.  We have
            tried to quantify these notions with respect to whether
            table structure and/or technology change require small or
            large upgrade jobs and eventual manual efforts.)
 
 So, if we have a version 4.3, a bug fix would mean to release 4.3.1,
 some minor new functionality and upgrade would mean to release 4.4,
 some important database structure rewrite or an imaginary exchange of
 Python for Common Lisp would mean to release 5.0, etc.
 
 In addition, the two-branch release policy is adopted:
 
   a) stable branch - releases in the stable branch are numbered with
      even minor version number, like 0.2, 0.4, etc.  These releases
      are usually well tested.  The configuration files and features
      usually don't change often from one release to another.  The
      release frequency is low.
 
   b) development branch - releases in the development branch are
      numbered with an odd minor version number, like 0.1, 0.3, etc.
      These releases are more experimental and may be less tested than
      the stable ones.  The configuration files and features change
      more rapidly from one release to another.  The release frequency
      is higher.
 
 It can be seen that the above scheme is somewhat similar to the Linux
 kernel version numbering scheme.
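 
 As an illustration only, the numbering rules above can be expressed
 in a few lines of Python.  The helper names parse_version and
 branch_of are hypothetical and are not part of CDS Invenio.  The
 sketch shows that version components compare as numbers (so 4.9
 comes before 4.10) and that the parity of the minor number selects
 the branch:

```python
def parse_version(version):
    """Split 'major.minor.patchlevel' into a tuple of three integers."""
    parts = [int(part) for part in version.split(".")]
    # A missing patchlevel (e.g. "4.3") is treated as patchlevel 0.
    parts = parts + [0] * (3 - len(parts))
    return tuple(parts)


def branch_of(version):
    """Return 'stable' for an even minor number, 'development' for odd."""
    minor = parse_version(version)[1]
    if minor % 2 == 0:
        return "stable"
    return "development"


releases = ["4.10", "4.9", "4.3.1", "4.3"]
# Sorting on the parsed tuples orders releases numerically, not as text.
ordered = sorted(releases, key=parse_version)
```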
 
 Currently, CDS Invenio 0.0.9 represents the stable branch release and
 0.1.0 the development branch release.  We are going to frequently
 update it to provide 0.1.1, 0.1.2, etc as the currently missing admin
 functionality is being added into the development branch, until later
 on, when some release, say 0.1.8, will achieve a status of
 satisfaction, at which point we release it as the next stable version
 (0.2 or 1.0), and start a new development branch (0.3 or 1.1).
 
 - end of file -
 </pre>
diff --git a/modules/webhelp/web/hacking/test-suite.webdoc b/modules/webhelp/web/hacking/test-suite.webdoc
index 4d9d7abae..73e0d7b3b 100644
--- a/modules/webhelp/web/hacking/test-suite.webdoc
+++ b/modules/webhelp/web/hacking/test-suite.webdoc
@@ -1,521 +1,519 @@
 ## $Id$
 
 ## This file is part of CDS Invenio.
 ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
 ##
 ## CDS Invenio is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## CDS Invenio is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 <!-- WebDoc-Page-Title: Test Suite Strategy  -->
-<!-- WebDoc-Page-Navbar-Name: hacking-test-suite -->
 <!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<WEBURL>/help/hacking">Hacking CDS Invenio</a>  -->
-<!-- WebDoc-Page-Navbar-Select: hacking-test-suite-strategy  -->
 <!-- WebDoc-Page-Revision: $Id$-->
 
 <h2>Contents</h2>
 <strong>1. <a href="#1">General considerations</a></strong><br />
 <strong>2. <a href="#2">Unit testing</a></strong><br />
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.1 <a href="#2.1">Unit testing philosophy</a><br />
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.2 <a href="#2.2">Writing unit tests</a><br />
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.3 <a href="#2.3">Running unit tests</a><br />
 <strong>3. <a href="#3">Regression testing</a></strong><br />
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.1 <a href="#3.1">Regression testing philosophy</a><br />
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.2 <a href="#3.2">Writing regression tests</a><br />
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.3 <a href="#3.3">Running regression tests</a><br />
 <strong>4. <a href="#4">Conclusions</a></strong><br />
 <strong>5. <a href="#5">Additional information</a></strong><br />
 
 <a name="1"></a><h2>1. General considerations</h2>
 
 <p>This document presents guidelines for homogenising unit testing and
 regression testing throughout all CDS Invenio modules.
 
 <p>Testing is an important coding activity.  Most authors believe that
 writing test cases should take between 10% and 30% of the project
 time.  But, even with such a large fraction, don't put too much faith
 in testing alone.  It cannot find bugs that aren't tested for.  So,
 while testing is an important activity inherent to safe software
 development practices, it cannot become a substitute for pro-active
 bug hunting, source code inspection, and bugfree-driven development
 approach from the start.
 
 <p>Testing should happen alongside with coding.  If you write a
 function, immediately load it into your toplevel, evaluate its
 definition, and call it for a couple of arguments to make sure the
 function works as expected.  If not, then change the function
 definition, re-evaluate it, re-call it, etc.  Dynamic languages with
 an interactive toplevel such as Common Lisp or Python make this easy for
 you.  Dynamic redefinition capabilities (full in Common Lisp, partial
 in Python) are very programmer-friendly in this respect.  If your test
 cases are worth keeping, then keep them in a test file.
 (It is almost always a good idea to store them in the test file,
 since you cannot predict whether you will want to change something in
 the future.)  We'll see below how to store your tests in a test file.
 
 <p>When testing, it is nice to know some rules of thumb, like: check
 your edge cases (e.g. null array), check atypical input values
 (e.g. laaarge array instead of typically 5-6 elements only), check
 your termination conditions, ask whether your arguments have already
 been safe-proofed or whether it is in your mandate to check them,
 write a test case for each `if-else' branch of the code to explore all
 the possibilities, etc.  Another interesting rule of thumb is the bug
 frequency distribution.  Experience has shown that the bugs tend to
 cluster.  If you discover a bug, there are chances that other bugs are
 in the neighborhood.  The famous 80/20 rule of thumb applies here too:
 about 80% of bugs are located in about 20% of the code.  Another rule
 of thumb: if you find a bug caused by some coding practice pattern
 that may be used elsewhere too, look for and fix the other instances of the pattern.
 </p>
 
 <p>In a nutshell, the best advice to write bug-free code is: <em>think
 ahead</em>.  Try to prepare in advance for unusual usage scenarios, to
 foresee problems before they happen.  Don't rely on typical input and
 typical usage scenarios.  Things have a tendency to become atypical
 one day.  Recall that testing is necessary, but not sufficient, to
 write good code.  Therefore, think ahead!
 </p>
 
 <a name="2"></a><h2>2. Unit testing</h2>
 
 <a name="2.1"></a><h3>2.1 Unit testing philosophy</h3>
 
 <p>Core functionality, such as the hit set intersection for the search
 engine, or the text input manipulating functions of the BibConvert
 language, should come with a couple of test cases to assure proper
 behaviour of the core functionality.  The test cases should cover
 typical input (e.g. hit set corresponding to the query for ``ellis''),
 as well as the edge cases (e.g. empty/full hit set) and other unusual
 situations (e.g. non-UTF-8 accented input for BibConvert functions to
 test a situation of different number of bytes per char).
 </p>
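 
 <p>As a minimal sketch of such a set of test cases, the following
 example covers a typical input and the empty/full edge cases for a
 hit set intersection.  The function <code>intersect_hitsets</code> is a
 hypothetical stand-in written for this example; it is not the search
 engine's real implementation.
 </p>

```python
import unittest


def intersect_hitsets(hitset1, hitset2):
    """Hypothetical stand-in for a core function: intersect two hit sets."""
    return set(hitset1).intersection(hitset2)


class TestHitSetIntersection(unittest.TestCase):
    """Typical input plus edge cases for the hit set intersection."""

    def test_typical_input(self):
        """hit set intersection of two overlapping sets"""
        self.assertEqual({2, 3}, intersect_hitsets({1, 2, 3}, {2, 3, 4}))

    def test_empty_hitset(self):
        """hit set intersection with an empty set"""
        self.assertEqual(set(), intersect_hitsets(set(), {1, 2}))

    def test_full_hitset(self):
        """hit set intersection with a set holding every record"""
        full = set(range(1, 11))
        self.assertEqual({3, 7}, intersect_hitsets(full, {3, 7}))


if __name__ == "__main__":
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestHitSetIntersection)
    unittest.TextTestRunner(verbosity=2).run(suite)
```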
 
 <p>The test cases should be written for the most important core
 functionality.  Not every function or class in the code is to be
 thoroughly tested.  Common sense will tell.
 </p>
 
 <p>Unit test cases are free of side-effects.  Users should be able to
 run them on production database without any harm to their data.  This
 is because the tests test ``units'' of the code, not the application
 as such.  If the behaviour of the function you would like to test
 depends on the status of the database, or some other parameters that
 cannot be passed to the function itself, the unit testing framework is
 not suitable for this kind of situation and you should use the
 regression testing framework instead (see below).
 </p>
 
 <p>For more information on Pythonic unit testing, see the
 documentation to the unittest module at <a
 href="http://docs.python.org/lib/module-unittest.html">http://docs.python.org/lib/module-unittest.html</a>.
 For a tutorial, see for example <a
 href="http://diveintopython.org/unit_testing/">http://diveintopython.org/unit_testing/</a>.
 </p>
 
 <a name="2.2"></a><h3>2.2 Writing unit tests</h3>
 
 <p>Each core file that is located in the lib directory (such as
 <code>webbasketlib.py</code>) should come with a
 testing file where the test cases are stored.  The test file is to be
 named identically as the lib file it tests, but with the suffix
 <code>_tests</code> (in our example,
 <code>webbasketlib_tests.py</code>).
 </p>
 
 <p>The test cases are written using the <code>TestCase</code> class of the
 Python unittest module.  An example testing the search engine query
 parameter washing function:
 <blockquote>
 <pre>
 $ cat /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
 [...]
 import search_engine
 import unittest
 
 class TestWashQueryParameters(unittest.TestCase):
     """Test for washing of search query parameters."""
 
     def test_wash_url_argument(self):
         """search engine washing of URL arguments"""
         self.assertEqual(1, search_engine.wash_url_argument(['1'],'int'))
         self.assertEqual("1", search_engine.wash_url_argument(['1'],'str'))
         self.assertEqual(['1'], search_engine.wash_url_argument(['1'],'list'))
         self.assertEqual(0, search_engine.wash_url_argument('ellis','int'))
         self.assertEqual("ellis", search_engine.wash_url_argument('ellis','str'))
         self.assertEqual(["ellis"], search_engine.wash_url_argument('ellis','list'))
         self.assertEqual(0, search_engine.wash_url_argument(['ellis'],'int'))
         self.assertEqual("ellis", search_engine.wash_url_argument(['ellis'],'str'))
         self.assertEqual(["ellis"], search_engine.wash_url_argument(['ellis'],'list'))
 [...]
 </pre>
 </blockquote>
 </p>
 
 <p>In addition, each test file is supposed to define a
 <code>create_test_suite()</code> function that returns a test suite
 with all the tests available in this file:
 
 <blockquote>
 <pre>
 $ cat /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
 [...]
 def create_test_suite():
     """Return test suite for the search engine."""
     return unittest.TestSuite((unittest.makeSuite(TestWashQueryParameters,'test'),
                                unittest.makeSuite(TestStripAccents,'test')))
 [...]
 </pre>
 </blockquote>
 </p>
 
 <p>This will enable us to later include this file into the
 <code>testsuite</code> executable:
 
 <blockquote>
 <pre>
 $ cat ~/src/cds-invenio/modules/miscutil/bin/testsuite.in
 [...]
 from invenio import search_engine_tests
 from invenio import bibindex_engine_tests
 
 def create_all_test_suites():
     """Return all tests suites for all CDS Invenio modules."""
     return unittest.TestSuite((search_engine_tests.create_test_suite(),
                                bibindex_engine_tests.create_test_suite()))
 [...]
 </pre>
 </blockquote>
 </p>
 
 <p>In this way, all the test cases defined in the file
 <code>search_engine_tests.py</code> will be executed when the global
 <code>testsuite</code> executable is called.
 </p>
 
 <p>Note that it may be time-consuming to run all the tests in one go.
 If you are interested in running tests only on a certain file (say
 <code>search_engine_tests.py</code>), then launch:
 
 <blockquote>
 <pre>
 $ python /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
 </pre>
 </blockquote>
 </p>
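 
 <p>For a test file to be runnable on its own in this way, it needs a
 small entry point at the bottom.  The following self-contained sketch
 shows one possible layout; the function <code>wash_to_int</code> and the
 class <code>TestWashToInt</code> are hypothetical examples, and
 <code>loadTestsFromTestCase</code> is used here instead of
 <code>unittest.makeSuite</code> because the latter has been removed
 from recent Python releases.
 </p>

```python
import unittest


def wash_to_int(value):
    """Hypothetical function under test: coerce VALUE to int, 0 on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


class TestWashToInt(unittest.TestCase):
    """Test for washing of values into integers."""

    def test_digits(self):
        """washing of a digit string"""
        self.assertEqual(1, wash_to_int('1'))

    def test_non_digits(self):
        """washing of a non-numeric string"""
        self.assertEqual(0, wash_to_int('ellis'))


def create_test_suite():
    """Return test suite with all the tests available in this file."""
    loader = unittest.defaultTestLoader
    return unittest.TestSuite((loader.loadTestsFromTestCase(TestWashToInt),))


if __name__ == "__main__":
    # Executed only when the file is launched directly with python.
    unittest.TextTestRunner(verbosity=2).run(create_test_suite())
```

 <p>With this layout the same <code>create_test_suite()</code> function
 serves both the global executable and the direct invocation of the
 file.
 </p>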
 
 <p>For full-scale examples, you may follow
 <code>search_engine_tests.py</code> and other <code>_tests.py</code>
 files in the source distribution.
 </p>
 
 <a name="2.3"></a><h3>2.3 Running unit tests</h3>
 
 <p>The CDS Invenio test suite can be run in the source directory:
 
 <blockquote>
 <pre>
 $ make test
 </pre>
 </blockquote>
 
 or anytime after the installation:
 
 <blockquote>
 <pre>
 $ /opt/cds-invenio/bin/testsuite
 </pre>
 </blockquote>
 
 The ``testsuite'' executable will run all available unit tests
 provided with CDS Invenio.
 </p>
 
 <p>The informative output is of the form:
 
 <blockquote>
 <pre>
 $ make test
 CDS Invenio v0.3.2.20040519 test suite results:
 ===========================================
 search engine washing of query patterns ... ok
 search engine washing of URL arguments ... ok
 search engine stripping of accented letters ... ok
 bibindex engine list union ... ok
 
 ----------------------------------------------------------------------
 Ran 4 tests in 0.121s
 
 OK
 </pre>
 </blockquote>
 
 In case of problems you will see failures like:
 
 <blockquote>
 <pre>
 CDS Invenio v0.3.2.20040519 test suite results:
 ===========================================
 search engine washing of query patterns ... FAIL
 search engine washing of URL arguments ... ok
 search engine stripping of accented letters ... ok
 bibindex engine list union ... ok
 
 ======================================================================
 FAIL: search engine washing of query patterns
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/opt/cds-invenio/lib/python/invenio/search_engine_tests.py", line 25, in test_wash_pattern
     self.assertEqual("ell*", search_engine.wash_pattern('ell*'))
   File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
     raise self.failureException, \
 AssertionError: 'ell*' != 'ell'
 
 ----------------------------------------------------------------------
 Ran 4 tests in 0.091s
 
 FAILED (failures=1)
 </pre>
 </blockquote>
 </p>
 
 <p>The test suite compliance should be checked before each CVS commit.
 (And, obviously, double-checked before each CDS Invenio release.)
 </p>
 
 <a name="3"></a><h2>3. Regression testing</h2>
 
 <a name="3.1"></a><h3>3.1 Regression testing philosophy</h3>
 
 <p>In addition to the above-mentioned unit testing of important
 functions, a regression testing should ensure that the overall
 application functionality is behaving well and is not altered by code
 changes.  This is especially important once a bug has been
 found: a regression test case should then be written to ensure that
 it never reappears.  (It also helps to scan the neighborhood of
 the bug, or the whole codebase, for occurrences of the same kind of
 bug; see the 80/20 rule of thumb cited above.)
 </p>
 
 <p>Moreover, the regression test suite should be used when the
 functionality of the item we would like to test depends on
 extra-parametrical status, such as the database content.  Also, the
 regression framework is suitable for testing the web pages overall
 behaviour.  (In extreme programming, the regression testing is called
 <em>acceptance testing</em>, the name that evolved from previous
 <em>functionality testing</em>.)
 </p>
 
 <p>Within the framework of the regression test suite, we have liberty
 to alter database content, unlike that of the unit testing framework.
 We can also simulate the web browser in order to test web
 applications.
 </p>
 
 <p>As an example of a regression test, we can test whether the web
 pages are alive; whether searching for Ellis in the demo site indeed
 produces 12 records; whether searching for aoeuidhtns produces no hits
 but the box of nearest terms, and with which content; whether
 accessing the Theses collection search page triggers an Apache password
 prompt; whether the admin interface is really accessible only to
 admins or also to guests, etc.
 </p>
 
 <p>For more information on regression testing, see for example <a
 href="http://c2.com/cgi/wiki?RegressionTesting">http://c2.com/cgi/wiki?RegressionTesting</a>.
 </p>
 
 <a name="3.2"></a><h3>3.2 Writing regression tests</h3>
 
 <p>Regression tests are written per application (or sub-module) in
 files named like <code>websearch_regression_tests.py</code> or
 <code>websubmitadmin_regression_tests.py</code>.
 </p>
 
 <p>When writing regression tests, you can assume that the site is in
 the fresh demo mode (Atlantis Institute of Fictive Science).  You are
 not limited to read-only database tests: you can
 safely insert/update/delete into/from the database whatever values you
 need for testing.  Users are warned prior to running the regression
 test suite about its possibly destructive side-effects. (See below.)
 Therefore you can create users, create user groups, attach users to
 groups to test the group joining process etc, as needed.
 </p>
 
 <p>For testing web pages using GET arguments, you can take advantage
 of the following helper function:
 
 <blockquote>
 <pre>
 $ cat /opt/cds-invenio/lib/python/invenio/testutils.py
 [...]
 def test_web_page_content(url, username="guest", expected_text="</html>"):
     """Test whether web page URL as seen by user USERNAME contains
        text EXPECTED_TEXT.  Before doing the tests, login as USERNAME.
        (E.g. interesting values are "guest" or "admin".)
 
        Return empty list in case of no problems, otherwise list of error
        messages that may have been encountered during processing of
        page.
     """
 </pre>
 </blockquote>
 
 For example you can test whether admins can access WebSearch Admin
 interface but guests cannot:
 
 <blockquote>
 <pre>
 test_web_page_content(weburl + '/admin/websearch/websearchadmin.py',
                       username='admin')
 
 test_web_page_content(weburl + '/admin/websearch/websearchadmin.py',
                       username='guest',
                       expected_text='Authorization failure')
 </pre>
 </blockquote>
 
 or you can test whether searching for aoeuidhtns produces nearest
 terms box:
 
 <blockquote>
 <pre>
 test_web_page_content(weburl + '/search?p=aoeuidhtns',
                       expected_text='Nearest terms in any collection are')
 </pre>
 </blockquote>
 </p>
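 
 <p>The calling convention of such a helper (an empty list means
 success, a non-empty list carries the error messages) can be sketched
 without any network access.  The names <code>check_page_content</code>
 and <code>fake_fetch</code> below are hypothetical and exist only for
 this illustration; the real helper lives in <code>testutils.py</code>
 and may work differently.
 </p>

```python
def check_page_content(fetch_body, url, expected_text="</html>"):
    """Return list of error messages; empty if EXPECTED_TEXT is in the page.

    FETCH_BODY is any callable returning the page body for URL
    (an assumption of this sketch, so that no real site is needed).
    """
    errors = []
    try:
        body = fetch_body(url)
    except IOError as err:
        errors.append("ERROR: cannot fetch %s: %s" % (url, err))
        return errors
    if expected_text not in body:
        errors.append("ERROR: page %s does not contain %r"
                      % (url, expected_text))
    return errors


def fake_fetch(url):
    """Stand-in for a browser: serve one canned page, fail otherwise."""
    pages = {
        "/search?p=aoeuidhtns":
            "<html>Nearest terms in any collection are ...</html>",
    }
    if url not in pages:
        raise IOError("page not found")
    return pages[url]


no_errors = check_page_content(
    fake_fetch, "/search?p=aoeuidhtns",
    expected_text="Nearest terms in any collection are")
some_errors = check_page_content(fake_fetch, "/no/such/page")
```

 <p>Inside a test case the result is then typically compared with the
 empty list, for example with <code>self.assertEqual([], ...)</code>.
 </p>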
 
 <p>For testing web pages using POST arguments or for other more
 advanced testing you should directly use the <code>mechanize</code> Python
 module that simulates the browser.  It can post forms, follow links,
 go back to previous pages, etc.  An example of how to test the login
 page functionality:
 
 <blockquote>
 <pre>
 browser = mechanize.Browser()
 browser.open(sweburl + "/youraccount/login")
 browser.select_form(nr=0)
 browser['p_un'] = 'userfoo'
 browser['p_pw'] = 'passbar'
 browser.submit()
 username_account_page_body = browser.response().read()
 try:
     string.index(username_account_page_body,
                  "You are logged in as userfoo.")
 except ValueError:
     self.fail('ERROR: Cannot login as userfoo.')
 </pre>
 </blockquote>
 
 <p>For full-scale examples, you may follow
 <code>websearch_regression_tests.py</code> and other
 <code>_regression_tests.py</code> files in the source distribution.
 </p>
 
 <a name="3.3"></a><h3>3.3 Running regression test suite</h3>
 
 <p>The regression test suite can be run by invoking:
 
 <blockquote>
 <pre>
 $ /opt/cds-invenio/bin/regressiontestsuite
 </pre>
 </blockquote>
 
 similarly to the unit test suite cited above.  The notable differences
 when compared to running the unit test suite are:
 </p>
 
 <ul>
 
 <li><code>regressiontestsuite</code> script assumes the site to be in
     demo mode (Atlantis Institute of Fictive Science)
 
 <li><code>regressiontestsuite</code> will pollute the database with
     test data as it sees fit for the regression testing purposes.
 
 </ul>
 
 <p>
 Therefore beware, <strong>running the regression test suite requires a clean
 demo site and may destroy your data forever</strong>.  The user is
 warned about this prior to running the suite and is given a chance to
 abort the process:
 
 
 <blockquote>
 <pre>
 $ /opt/cds-invenio/bin/regressiontestsuite
 regressiontestsuite: detected 19 regression test modules
 **********************************************************************
 **                                                                  **
 **  ***  I M P O R T A N T   W A R N I N G  ***                     **
 **                                                                  **
 ** The regression test suite needs to be run on a clean demo site   **
 ** that you can obtain by doing:                                    **
 **                                                                  **
 **    $ make drop-tables                                            **
 **    $ make create-tables                                          **
 **    $ make create-demo-site                                       **
 **    $ make load-demo-records                                      **
 **                                                                  **
 ** Note that DOING THE ABOVE WILL ERASE YOUR ENTIRE DATABASE.       **
 **                                                                  **
 ** (In addition, due to the write nature of some of the tests,      **
 ** the demo database may be altered with junk data, so that         **
 ** it is recommended to rebuild the demo site anew afterwards.)     **
 **                                                                  **
 **********************************************************************
 
 Please confirm by typing "Yes, I know!": NO
 Aborted.
 </pre>
 </blockquote>
 </p>
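
 <p>The confirmation step shown above can be sketched roughly as
 follows.  This is an illustration only, not the actual
 <code>regressiontestsuite</code> code: the names
 <code>confirmed</code> and <code>ask_for_confirmation</code>, the
 stripping of surrounding whitespace, and the exit status of 1 are all
 assumptions.  Only the prompt text, the expected phrase, and the
 "Aborted." message are taken from the session above.
 </p>

```python
import sys

# The phrase the user must type, as shown in the session above.
CONFIRMATION_PHRASE = 'Yes, I know!'


def confirmed(answer):
    """Return True only if the answer matches the phrase exactly.

    Surrounding whitespace (such as the trailing newline) is ignored;
    that leniency is an assumption of this sketch.
    """
    return answer.strip() == CONFIRMATION_PHRASE


def ask_for_confirmation(read_line=None):
    """Prompt the user and exit unless the confirmation phrase is typed.

    read_line is a hypothetical hook that returns one line of input;
    it defaults to reading from standard input.
    """
    if read_line is None:
        read_line = sys.stdin.readline
    sys.stdout.write('Please confirm by typing "%s": ' % CONFIRMATION_PHRASE)
    sys.stdout.flush()
    if not confirmed(read_line()):
        sys.stdout.write('Aborted.\n')
        sys.exit(1)  # assumed exit status for an aborted run
```

 <p>Requiring a full phrase rather than a single keystroke makes it
 harder to start a destructive run by accident.
 </p>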
 
 <p>If you choose to continue, the regression test suite will produce
 output similar to that of the unit test suite discussed
 previously.
 </p>
 
 <a name="4"></a><h2>4. Conclusions</h2>
 
 <p>A uniform testing technique and two test suites (unit test suite,
 regression test suite) were discussed.  Each programmer should plan to
 write test code alongside the core code, testing both the
 building blocks of his/her code (unit tests) and the overall
 application behaviour (regression tests).  Guidelines on how to do so
 were given above.
 </p>
 
 <a name="5"></a><h2>5. Additional information</h2>
 
 <dl>
 <dt>More information can be found at the URLs mentioned above:
 <dd>
 <pre>
 <a href="http://c2.com/cgi/wiki?UnitTest">http://c2.com/cgi/wiki?UnitTest</a>
 <a href="http://c2.com/cgi/wiki?RegressionTesting">http://c2.com/cgi/wiki?RegressionTesting</a>
 <a href="http://docs.python.org/lib/module-unittest.html">http://docs.python.org/lib/module-unittest.html</a>
 <a href="http://diveintopython.org/unit_testing/">http://diveintopython.org/unit_testing/</a>
 <a href="http://wwwsearch.sourceforge.net/mechanize/">http://wwwsearch.sourceforge.net/mechanize/</a>
 </pre>
 </dl>
 
 <dl>
 <dt>and elsewhere:
 <dd>
 <pre>
 Steve McConnell: "Code Complete"
 FIXME: list of other interesting references, like Kent Beck papers, etc
 </pre>
 </dl>