diff --git a/modules/webhelp/web/hacking/coding-style.webdoc b/modules/webhelp/web/hacking/coding-style.webdoc index fa39b089e..496692bbe 100644 --- a/modules/webhelp/web/hacking/coding-style.webdoc +++ b/modules/webhelp/web/hacking/coding-style.webdoc @@ -1,213 +1,212 @@ ## -*- mode: html; coding: utf-8; -*- ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - - +
 A brief description of things we strive for, more or less unsuccessfully.
 
 1. Packaging
 
    We use the classic GNU Autoconf/Automake approach; for a tutorial,
    see e.g. Learning the GNU development tools or the AutoBook.
 
 2. Modules
 
    CDS Invenio started as a set of pretty independent modules developed by
    independent people with independent styles.  This was made even more
    pronounced by the original use of many different languages
    (e.g. Python, PHP, Perl).  Now the CDS Invenio code base is striving to
    use Python everywhere, except in speed-critical parts, where a
    compiled language such as Common Lisp may come to the rescue in the
    near future.
 
    When modifying an existing module, we propose to strictly continue
    using whatever coding style the module was originally written in.
    When writing new modules, we propose to stick to the
    standards described below.
 
    Code integration across modules is happening, but slowly.
    Therefore, don't be surprised to find a lot of room for
    refactoring.
 
 3. WML/ePerl/etc
 
    This is not so important, because not many lines of code were
    written in WML/ePerl.  We prefer to loosely follow the GNU way, as
    always.
 
 4. Python
 
    We aim at following the recommendations of PEP 8, although
    the existing code surely does not fulfil them here and there.
    Code indentation is done with spaces only; please do not use
    tabs.  One indentation level is four spaces.  Emacs users can look
    into our cdsware.el for inspiration.
 
    All the Python code should be extensively documented via
    docstrings, so you can always run pydoc file.py to peruse the
    file's documentation in one simple go.
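A minimal sketch of the kind of docstrings meant here.  The module name
bibindex_example and the function count_words are hypothetical, and the
epydoc-style @param/@return fields are an assumption rather than a
documented CDS Invenio rule; pydoc displays any docstring format.

```python
"""bibindex_example: a hypothetical module showing docstring style.

The module docstring says what the file is for; pydoc prints it first.
"""


def count_words(phrase):
    """Return the number of whitespace-separated words in PHRASE.

    @param phrase: the text to analyse
    @type phrase: str
    @return: the number of words
    @rtype: int
    """
    return len(phrase.split())
```

Calling help(count_words) in an interactive session shows the same text
that pydoc renders for the file.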
 
    Do not forget to run pylint on your code to check for errors like
    uninitialized variables and to improve its quality and conformance
    to the coding standard.  If you develop in Emacs, run M-x pylint
    RET on your buffers frequently.  Read and implement pylint
    suggestions.  (Note that using lambda and friends may lead to false
    pylint warnings.  You can switch them off by putting block comments
    of the form ``# pylint: disable-msg=C0301''.)
 
    Do not forget to run pychecker on your code either.  It is another
    source code checker that catches some situations better and some
    situations worse than pylint.  If you develop in Emacs, run C-c C-w
    (M-x py-pychecker-run RET) on your buffers frequently.  (Note that
    using psyco on classes may lead to false pychecker warnings.)
 
    You can check the kwalitee of your code by running ``python
    modules/miscutil/lib/kwalitee.py *.py'' on your files.  You can
    also check the code kwalitee across all the modules by running
    ``make kwalitee-check'' in the main source directory.
 
    Do not hardcode magic constants in your code.  Every magic string or
    number should be put into an accompanying file_config.py, with
    symbol names beginning with cfg_modulename_*.
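A sketch of this split, assuming a hypothetical BibIndex setting for the
minimum indexed word length and a hypothetical stopword list.  Both files
are shown in one block for brevity; in a real module the first part would
be bibindex_config.py and the second part would import from it.  All
names here are illustrative.

```python
# --- would live in a hypothetical bibindex_config.py ----------------
cfg_bibindex_min_word_length = 3  # shorter words are not indexed
cfg_bibindex_stopwords = ("the", "and", "of")  # words never indexed

# --- would live in a hypothetical bibindex_engine.py ----------------
# In a real module this part would start with:
#   from bibindex_config import cfg_bibindex_min_word_length, \
#        cfg_bibindex_stopwords


def get_indexable_words(phrase):
    """Return the words of PHRASE that are worth indexing.

    No literal length limit or stopword list appears here: both
    policies live in the config part above and can be tuned there.
    """
    return [word for word in phrase.lower().split()
            if len(word) >= cfg_bibindex_min_word_length
            and word not in cfg_bibindex_stopwords]
```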
 
    Clearly separate interfaces from implementation.  Document your
    interfaces.  Do not expose to other modules anything that does not
    have to be exposed.  Apply the principle of least information.
 
    Create as few new library files as possible.  Do not create many
    nested files in nested modules; rather, put all the lib files in one
    directory, with names such as bibindex_foo and bibindex_bar.
 
    Use the imperative/functional paradigm rather than OO.  If you do use
    OO, then stick to as simple a class hierarchy as possible.  Recall
    that method calls and exception handling in Python are quite
    expensive.
 
    Prefer the good old foo_bar naming convention for symbols (both
    variables and function names) over the fooBar CaMelCaSe
    convention.  (The exception is class names, where
    UppercaseSymbolNames are to be used.)
 
    Pay special attention to naming your symbols descriptively.  Your
    code is going to be read and worked on by others, and its symbols
    should be self-explanatory without any comments and without
    studying other parts of the code.  For example, use proper English
    words, not abbreviations that can be misspelled in many a way; use
    words that go in pairs (e.g. create/destroy, start/stop; never
    create/stop); use self-explanatory symbol names
    (e.g. list_of_file_extensions rather than list2); never misname
    symbols (e.g. score_list should hold the list of scores and nothing
    else - if in the course of development you change the semantics of
    what the symbol holds then change the symbol name too).  Do not be
    afraid to use long descriptive names; good editors such as Emacs
    can tab-complete symbols for you.
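A short illustration of the naming advice above, using hypothetical
helpers: foo_bar function and variable names, descriptive argument names
instead of something like list2, and a matching add/remove pair.  A
class, if one were needed, would be spelled like FileExtensionRegistry.

```python
# Avoid: def addExt(l2, e): ...  (camelCase, cryptic names)


def add_file_extension(list_of_file_extensions, file_extension):
    """Return a new list with FILE_EXTENSION appended if not present."""
    if file_extension in list_of_file_extensions:
        return list(list_of_file_extensions)
    return list_of_file_extensions + [file_extension]


def remove_file_extension(list_of_file_extensions, file_extension):
    """Inverse of add_file_extension(): the names form a matching pair."""
    return [extension for extension in list_of_file_extensions
            if extension != file_extension]
```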
 
    When hacking module A, take care to follow the existing coding
    conventions in A, even if they are legacy-weird and even if we
    use a different technique elsewhere.  (Unless the whole module A is
    going to be refactored, of course.)
 
    Speed-critical parts should be profiled with pyprof.  Do not forget
    to use tricks like psyco.
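A minimal profiling sketch.  It swaps in the standard library's cProfile
and pstats modules; whether these match the tool the text calls pyprof is
not established here.  The function build_word_index is a hypothetical
stand-in for a speed-critical routine.

```python
import cProfile
import io
import pstats


def build_word_index(phrases):
    """Map each word to the set of phrase numbers it occurs in."""
    word_index = {}
    for phrase_number, phrase in enumerate(phrases):
        for word in phrase.split():
            word_index.setdefault(word, set()).add(phrase_number)
    return word_index


def profile_call(function, *args):
    """Run FUNCTION(*ARGS) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(function, *args)
    report_stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=report_stream)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
    return result, report_stream.getvalue()


if __name__ == "__main__":
    sample_phrases = ["dark matter search", "dark energy"] * 10000
    index, report = profile_call(build_word_index, sample_phrases)
    print(report)
```

Sorting by cumulative time puts the outermost expensive calls first,
which is usually where to start looking.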
 
    The code should be well tested before being committed.  Testing is
    an integral part of the development process.  Test as you
    program.  The testing process should be automated via our unit
    test and regression test suite infrastructures.  Please read the
    test suite strategy to learn more.
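A minimal sketch of an automated unit test using Python's standard
unittest module.  The function get_file_extension is hypothetical, and
how such a file plugs into the CDS Invenio test suite infrastructure is
not shown here.

```python
import unittest


def get_file_extension(filename):
    """Return the lower-cased extension of FILENAME, e.g. '.pdf'."""
    dot_position = filename.rfind(".")
    if dot_position == -1:
        return ""
    return filename[dot_position:].lower()


class GetFileExtensionTest(unittest.TestCase):
    """Unit tests for get_file_extension()."""

    def test_simple_extension(self):
        self.assertEqual(get_file_extension("thesis.PDF"), ".pdf")

    def test_no_extension(self):
        self.assertEqual(get_file_extension("README"), "")

    def test_several_dots(self):
        self.assertEqual(get_file_extension("paper.tar.gz"), ".gz")


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(GetFileExtensionTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```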
 
    Python promotes writing clear, readable, easily maintainable code.
    Write it as such.  Recall Albert Einstein's ``Everything should be
    made as simple as possible, but not simpler''.  Things should be
    neither overengineered nor oversimplified.
 
    Recall the principles Unix is built upon, as summarized by Eric
    S. Raymond's TAOUP:
 
       Rule of Modularity: Write simple parts connected by clean interfaces.
       Rule of Clarity: Clarity is better than cleverness.
       Rule of Composition: Design programs to be connected with other programs.
       Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
       Rule of Simplicity: Design for simplicity; add complexity only where you must.
       Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
       Rule of Transparency: Design for visibility to make inspection and debugging easier.
       Rule of Robustness: Robustness is the child of transparency and simplicity.
       Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
       Rule of Least Surprise: In interface design, always do the least surprising thing.
       Rule of Silence: When a program has nothing surprising to say, it should say nothing.
       Rule of Repair: Repair what you can -- but when you must fail, fail noisily and as soon as possible.
       Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
       Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
       Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
       Rule of Diversity: Distrust all claims for one true way.
       Rule of Extensibility: Design for the future, because it will be here sooner than you think.
 
    or the golden rule that says it all: ``keep it simple''.
 
    For more hints, thoughts, and other ruminations on programming,
    see my CDS Invenio Wiki.
 
 5. PHP
 
    We are slowly moving away from PHP, so several different practices
    may be found in the PHP code present in CDS Invenio.  Usually
    they are consistent within modules but inconsistent across modules.
    For example, some old code used Emacs' perl-mode, following
    traditional K&R C style, while some other old code tried to stick
    to PEAR recommendations.
 
 6. MySQL
 
    Table naming policy is, roughly and briefly:
 
       - "foo": table names in lowercase, without prefix, used by me
          for WebSearch
 
       - "foo_bar": underscores represent M:N relationship between
         "foo" and "bar", to tie the two tables together
 
       - "bib*": many tables to hold the metadata and relationships
          between them
 
       - "idx*": idx is the table name prefix used by BibIndex
 
       - "rnk*": rnk is the table name prefix used by BibRank
 
       - "flx*": flx is the table name prefix used by FlexElink (also known as
          BibFormat)
 
       - "sbm*": sbm is the table name prefix used by WebSubmit
 
       - "sch*": sch is the table name prefix used by BibSched
 
       - "collection*": many tables to describe collections and search
         interface pages
 
       - "user*" : many tables to describe personal features (baskets,
         alerts)
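The policy above is regular enough to be written down as data.  A
sketch, assuming naive prefix matching (a plain table whose name happens
to start with e.g. "sch" would be misclassified) and that any underscore
marks an M:N relationship table; describe_table is a hypothetical helper.

```python
# (prefix, description) pairs transcribed from the naming policy above.
TABLE_PREFIX_DESCRIPTIONS = (
    ("bib", "bibliographic metadata"),
    ("idx", "BibIndex"),
    ("rnk", "BibRank"),
    ("flx", "FlexElink (BibFormat)"),
    ("sbm", "WebSubmit"),
    ("sch", "BibSched"),
    ("collection", "collections and search interface pages"),
    ("user", "personal features (baskets, alerts)"),
)


def describe_table(table_name):
    """Return a short description of TABLE_NAME's role."""
    if "_" in table_name:
        # "foo_bar" ties tables "foo" and "bar" together.
        left, right = table_name.split("_", 1)
        return "M:N relationship between %s and %s" % (left, right)
    for prefix, description in TABLE_PREFIX_DESCRIPTIONS:
        if table_name.startswith(prefix):
            return description
    return "unprefixed lowercase table (WebSearch)"
```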
 
 - end of file -
 
 
diff --git a/modules/webhelp/web/hacking/common-concepts.webdoc b/modules/webhelp/web/hacking/common-concepts.webdoc index 65dcbfbe0..fc11bfddc 100644 --- a/modules/webhelp/web/hacking/common-concepts.webdoc +++ b/modules/webhelp/web/hacking/common-concepts.webdoc @@ -1,118 +1,116 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -
 A description of the concepts you will encounter here and there in
 CDS Invenio.  Our interpretation may differ from the practice found in
 other products, so please read this carefully.
 
 1. sysno - (ALEPH|old) system number
 
    Stands for the (ALEPH|old) system number only.  For CDS Invenio
    installations outside CERN, this means an 'old system number',
    whatever it may be, should they want to publicise it instead of our
    internal auto-incremented CDS Invenio record identifiers.
 
 2. recID - CDS Invenio record identifier
 
    Each record has an auto-incremented ID in the "bibrec" table
    (formerly called "bibitem").  This is the basic "record identifier"
    concept in CDS Invenio.
 
 3. docID - optional fulltext document identifier

    Each fulltext file may have a docID.  This will permit us to
    interconnect records (recID) with fulltext files (docID), if we
    want to.  At the moment there is only a one-way connection from
    recID to docID, via the HTTP link stored in MARC field 856.  This is
    ugly.  I think we could probably benefit from introducing a
    recID-docID relationship in several ways: file protection,
    reference extraction, fulltext indexing... (?!)
 
 4. field - logical field concept such as "reportnumber"
 
    A bibliographic record is composed of 'fields' such as title or
    author.  Note that we consider 'field' to be a compound logical
    concept that may consist of several physical MARC fields.
    For example, the "report number" field consists of several MARC
    fields such as 088 $a, 037 $a, 909C0 $r.  Another example: "first
    report number" consists of only one MARC field, 037 $a.
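A sketch of the logical-to-physical mapping, using the tags from the
examples above.  The in-memory dict and the record layout (physical tag
mapped to a list of values) are hypothetical stand-ins for however
CDS Invenio actually stores this information.

```python
# Logical field name -> physical MARC fields, as in the examples above.
LOGICAL_FIELD_TAGS = {
    "reportnumber": ["088 $a", "037 $a", "909C0 $r"],
    "firstreportnumber": ["037 $a"],
}


def get_field_values(record, logical_field):
    """Collect the values of LOGICAL_FIELD from all its physical fields.

    RECORD is a hypothetical dict mapping a physical field such as
    "037 $a" to the list of values stored under it.
    """
    values = []
    for tag in LOGICAL_FIELD_TAGS[logical_field]:
        values.extend(record.get(tag, []))
    return values
```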
 
 5. tag - physical field concept such as "088 $a".
 
    Having defined the concept of 'logical field', let's now turn to
    the 'physical field', which basically denotes the concept of 'MARC
    field' as defined in the MARC-21 standard.  In addition to the tag, a
    field may contain two indicators to describe the data content, and
    subfield codes to denote various parts of the content.  See our
    HOWTO MARC guide on this.

    That said, in the implementation of our bibliographic tables
    (bibXXx) we have sort of generalized the term 'tag' to stand for:

       tag = tag code + indicator1 + indicator2 + subfield code
 
    This convention, while taking some freedom from the MARC-21
    standard, enables us to write things like "field: base number, tag:
    909C0b, value: 11".  If this interpretation is indeed too free with
    respect to the standard usage of terms, we may change them in the
    future.
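A sketch of composing and splitting such generalized tags, assuming the
fixed six-character layout of the 909C0b example: a three-character tag
code, the two single-character indicators (MARC-21's name for them), and
a one-character subfield code.  The helper names are hypothetical, and
how a blank indicator is represented is left open here.

```python
from collections import namedtuple

GeneralizedTag = namedtuple(
    "GeneralizedTag", ["tag_code", "indicator1", "indicator2", "subfield_code"])


def parse_generalized_tag(tag):
    """Split a six-character tag such as '909C0b' into its four parts."""
    if len(tag) != 6:
        raise ValueError("expected 6 characters, got %r" % (tag,))
    return GeneralizedTag(tag[0:3], tag[3], tag[4], tag[5])


def make_generalized_tag(tag_code, indicator1, indicator2, subfield_code):
    """Inverse of parse_generalized_tag()."""
    return tag_code + indicator1 + indicator2 + subfield_code
```

For example, parse_generalized_tag("909C0b") yields tag code 909,
indicators C and 0, and subfield code b.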
 
 6. collection - here we distinguish (i) primary collection concept
                 and (ii) specific collection concept.
 
    The (i) primary collections form the basic organizational structure
    by which the records are grouped together.  The primary
    collections are used in the navigable search interface under the
    ``Narrow search'' box.  The (ii) specific collections present an
    orthogonal view on the data organization, which is useful to group
    together records from different primary collections if they
    share a common pattern.  The specific collections are used in the
    search interface under the ``Focus on'' box.

    The primary collections are defined mainly by the collection
    identifier ("980 $a,b"), while the specific collections are
    defined by any query the search engine is able to execute (see
    also the "dbquery" column in the "collection" table).

    In the past we used the term "catalogue", which is now
    deprecated and is synonymous with "collection".
 
 7. doctype - stands for web document type concept, used in WebSubmit
 
    The "document type" is used solely for submission purposes, and
    fulltext access purposes ("setlink"-like).  For example, a document
    type "photo" may be used in many collections such as "Foo Photos",
    "Bar PhotoLab", etc.  Similarly, one collection can cover several
    doctypes.  (M:N relationship)
 
 8. baskets, alerts, settings - covering personal features
 
    These denote personal features, for which we previously used the terms
    "shelf" and "profile" that are now deprecated.
 
 - end of file -
 
 
diff --git a/modules/webhelp/web/hacking/directory-organization.webdoc b/modules/webhelp/web/hacking/directory-organization.webdoc index ccf238415..89d06b5e1 100644 --- a/modules/webhelp/web/hacking/directory-organization.webdoc +++ b/modules/webhelp/web/hacking/directory-organization.webdoc @@ -1,201 +1,199 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -
 Please find some notes below on how the source (as well as the target)
 directory structure is organized, where the sources get installed to,
 and how the visible URLs are organized.
 
 1. CDS Invenio will generally install into the directory given by the
    --with-prefix configuration option.  Its two main directory trees
    are discussed in points 2 and 3 below.
 
 2. The first directory (--with-prefix) specifies the general CDS Invenio
    install directory, where we'll put CLI binaries, Python and PHP
    libraries, manpages, log and cache directories for the running
    installation, and any other dirs as needed.  They will all live
    under one common hood.
 
    For example, run configure --with-prefix=/opt/cds-invenio,
    and you'll obtain the following principal directories:
 
       /opt/cds-invenio/
       /opt/cds-invenio/bin
       /opt/cds-invenio/lib
       /opt/cds-invenio/lib/php
       /opt/cds-invenio/lib/python
       /opt/cds-invenio/lib/wml
       /opt/cds-invenio/var
       /opt/cds-invenio/var/cache
       /opt/cds-invenio/var/log
 
    with the obvious meaning:
 
       - bin : for command-line executable binaries and scripts
 
       - lib/php : for our own PHP libraries, see below
 
       - lib/python : for our own Python libraries, see below
 
       - lib/wml : for our own WML libraries, see below
 
       - var : for installation-specific runtime stuff
 
       - var/log : for all sorts of runtime logging, e.g. search.log
 
       - var/cache : for all sorts of runtime caching, e.g. OAI
                     retention harvesting, collection cache, etc
 
    This scheme copies to some extent the usual Unix filesystem
    convention, so it may be easily expanded later according to our
    future needs.
 
 3. The second directory (prefix/var/www) contains Web scripts (PHP,
    mod_python), HTML documents and images, and so on.  This is where
    the files seen by web users are located.  Basically, the files there contain
    only the interface to the functionality that is provided by the
    libraries stored under the library directory.
 
    The prefix/var/www directory is further structured according to
    whom it provides services to.  We distinguish user-level, admin-level
    and hacker-level access to the site, as reflected by the visible
    URL structure.
 
      a) The user-level access point is provided by the main WEBURL
         address and its subdirs.  All the user-level documentation is
         available under WEBURL/help/.  The module-specific user-level
         documentation is available under WEBURL/help/<module>/.
 
      b) The admin-level access is provided by WEBURL/admin/ entry
         point.  The admin-level documentation is accessible from the
         same place.  The admin-level module-specific functionality and
         help is available under WEBURL/admin/<module>/.  (If
         it's written in mod_python, it usually points to
         WEBURL/<module>admin.py/ since we configure the server
         to have all mod_python scripts under the prefix/var/www root
         directory.)
 
      c) The hacker-level documentation is provided by WEBURL/hacking/
         entry point.  There is, of course, no hacker-level functionality
         available via the Web, so, unlike the admin-level entry point,
         the hacker-level entry point provides only a common access to
         available hacking documentation, etc.  The module-specific
         information is available under WEBURL/hacking/<module>/.
 
 4. Let us now look more closely at the role of the Python and PHP
    library directories outside of the Apache tree:
 
       /opt/cds-invenio/lib/php
       /opt/cds-invenio/lib/python
 
    Here we put not only (a) libraries that may be reused across CDS
    Invenio modules, but also (b) all the "core" functionality of CDS
    Invenio that is not directly callable by the end users.  The
    "callable" functionality is put under "prefix/var/www" in case of
    web scripts and documents, and under "bindir" in case of CLI
    executables.
 
    As for (a), for example in the PHP CDS Invenio library you'll find
    currently the common PHP error handling code that is shared between
    BibFormat and WebSubmit; in the Python CDS Invenio library (in fact,
    CDS Invenio Pythonic 'module', but we are reserving the word 'module'
    to denote 'CDS Invenio module' in this text) you'll find config.py
    containing WML-supplied site parameters, dbquery.py containing DB
    persistent query module, or webpage.py with templates and functions
    to produce mod_python web pages with common look and feel.  These
    could and should be reused across all our modules.  Note that I
    created only a small number of "broad" libraries at the moment.  In
    case we want to reuse more code parts, we'd refactor the code more,
    as needed.
 
    As for (b), for example, the existing search engine was split into
    search.py, which contains only three "callable" functions and goes
    into prefix/var/www, while the search engine itself is composed of
    search_engine.py and search_engine_config.py living under LIBDIR.
    In this way we can easily create "real" CLI search, that will
    depend only on the search libraries in LIBDIR, and that will get
    installed into BINDIR.
 
    To recap:
 
       - For each CDS Invenio module, I'm differentiating between
         "callable" and "core" parts.  The former go into
         prefix/var/www or BINDIR, the latter into LIBDIR.
 
       - Our PHP/Pythonic libraries contain several sorts of things:
 
           - the implementation of the "callable" functions
 
           - non-callable internal "core" or "library" code parts, as
             stated above.  Not shared across CDS Invenio modules.
 
           - utility code meant for reuse across CDS Invenio modules, such
             as dbquery.py
 
           - Pythonic config files generated from user-supplied WML (non-MySQL)
             configuration parameters (see
             e.g. search_engine_config.py)
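A sketch of the "core" versus "callable" split in one block.  The record
titles, perform_search and main are hypothetical; in a real module the
first part would live in a LIBDIR library such as search_engine.py and
the second in a BINDIR script, with a web script under prefix/var/www
wrapping the same core function.

```python
import argparse

# ---- "core" part (LIBDIR): no CLI or web code here ----------------
DEMO_RECORD_TITLES = {1: "Dark matter search", 2: "Dark energy survey",
                      3: "Higgs boson mass"}


def perform_search(pattern):
    """Return the sorted recIDs whose title contains PATTERN."""
    pattern = pattern.lower()
    return sorted(recid for recid, title in DEMO_RECORD_TITLES.items()
                  if pattern in title.lower())


# ---- "callable" part (BINDIR): a thin wrapper around the core ------
def main(argv=None):
    """Parse command-line arguments and print matching recIDs."""
    parser = argparse.ArgumentParser(description="Search the records.")
    parser.add_argument("pattern", help="word or phrase to look for")
    arguments = parser.parse_args(argv)
    for recid in perform_search(arguments.pattern):
        print(recid)
```

Because the wrapper holds no search logic, main(["dark"]) and a web
front end calling perform_search("dark") return the same recIDs, 1 and 2.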
 
 5. The same strategy is reflected in the organization of source
    directories inside CDS Invenio CVS.  Each CDS Invenio module lives in a
    separate directory located under "modules" directory of the
    sources.  Further on, each module usually contains several
    subdirectories that reflect the above-mentioned packaging choice.
    For example, in case of WebSearch you'll find:
 
       ./modules/websearch
       ./modules/websearch/bin
       ./modules/websearch/doc
       ./modules/websearch/doc/hacking
       ./modules/websearch/doc/admin
       ./modules/websearch/lib
       ./modules/websearch/web
       ./modules/websearch/web/admin
 
    with the following straightforward meaning:
 
       - bin : for callable CLI binaries and scripts
 
       - doc : for documentation.  The user-level documentation is
               located in this directory.  The admin-level
               documentation is located in the "admin" subdir.  The
               programmer-level documentation is located in the
               "hacking" subdir.
 
       - lib : for uncallable "core" functionality, see the comments
               above
 
       - web : for callable web scripts and pages.  The user-level and
               admin-level parts are separated as in the "doc"
               directory (see above).
 
    The structure is respected throughout all the CDS Invenio modules, a
    notable exception being the MiscUtil module that contains subdirs
    like "sql" (for the table creating/dropping SQL commands, etc) or
    "demo" (for creation of Atlantis Institute of Science, our demo
    site.)
 
 - end of file -
 
diff --git a/modules/webhelp/web/hacking/modules-overview.webdoc b/modules/webhelp/web/hacking/modules-overview.webdoc index 0b160ff90..64f9fd671 100644 --- a/modules/webhelp/web/hacking/modules-overview.webdoc +++ b/modules/webhelp/web/hacking/modules-overview.webdoc @@ -1,274 +1,273 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - - +

CDS Invenio consists of several more or less independent modules with precisely defined functionality. The general criterion for module names is to use the ``Bib'' prefix to denote modules that work more with the bibliographic data, and the ``Web'' prefix to denote modules that work more with the Web interface. (The difference is of course blurred in some cases, as in the case of the search engine, which has a web interface but searches bibliographic data.)

A brief description of what each module does follows. After the descriptions, the module relationship diagram is presented.

Relationship between the modules:
Modules overview diagram

diff --git a/modules/webhelp/web/hacking/release-numbering.webdoc b/modules/webhelp/web/hacking/release-numbering.webdoc index ff43ca3a4..880c29d54 100644 --- a/modules/webhelp/web/hacking/release-numbering.webdoc +++ b/modules/webhelp/web/hacking/release-numbering.webdoc @@ -1,102 +1,100 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -
 CDS Invenio uses the classical major.minor.patchlevel release version
 numbering scheme that is commonly used in the GNU/Linux world and
 elsewhere.  Each release is labelled by
 
      major.minor.patchlevel
 
 release version number.  For example, a release version 4.0.1 means:
 
        4 - 4th major version, i.e. the whole system has already been
            rewritten four times, either fully or at least in its most
            essential components.  The upgrade from one major version
            to another may be rather hard, may require new prerequisite
            technologies, full data dump, reload and reindexing, as
            well as other major configuration adaptations, possibly
            with an important manual intervention.
 
        0 - 0th minor version, i.e. the first minor release of the 4th
            major rewrite.  (Increments go 4.1, 4.2, ... 4.9, 4.10,
            4.11, 4.12, ... until some important rewrite is done,
            e.g. the database philosophy dramatically changes, leading
            to a non-trivial upgrade, and we have 5.0.)  The upgrade
            from one minor version to another may be laborious but is
            relatively painless, in that some table changes and data
            manipulations may be necessary but they are somewhat
            smaller in nature, easier to grasp, and possibly done by an
            automated script.
 
        1 - 1st patch level to 4.0, fixing bugs in 4.0.0 but not adding
            any substantially new functionality.  That is, the only new
            functionality that is added is that of a `bug fix' nature.
            The upgrade from one patch level to another is usually
            straightforward.
 
            (Packages often seem to break this last rule, e.g. the Linux
            kernel adopting new important functionality (such as
            ReiserFS) within the stable 2.4.x branch.  It can be easily
            seen that it is somewhat subjective to judge what is
            qualitatively more like a minor new functionality and what
            is more like a patch to the existing behaviour.  We have
            tried to quantify these notions with respect to whether
            table structure and/or technology change require small or
            large upgrade jobs and possible manual effort.)
 
 So, if we have a version 4.3, a bug fix would mean to release 4.3.1,
 some minor new functionality and upgrade would mean to release 4.4,
 some important database structure rewrite or an imaginary exchange of
 Python for Common Lisp would mean to release 5.0, etc.
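A sketch of these bump rules as a function.  The change labels
"bugfix", "feature" and "rewrite" are hypothetical names for the three
cases above, and the function always writes all three components, so
4.4 appears as 4.4.0.

```python
def next_version(version, change):
    """Return the release number following VERSION for a CHANGE.

    CHANGE is 'bugfix' (next patchlevel), 'feature' (next minor
    version) or 'rewrite' (next major version).  A missing
    patchlevel, as in '4.3', is read as 0.
    """
    parts = [int(part) for part in version.split(".")]
    major, minor, patchlevel = parts + [0] * (3 - len(parts))
    if change == "bugfix":
        return "%d.%d.%d" % (major, minor, patchlevel + 1)
    if change == "feature":
        return "%d.%d.0" % (major, minor + 1)
    if change == "rewrite":
        return "%d.0.0" % (major + 1)
    raise ValueError("unknown change type: %r" % (change,))
```

With version 4.3 this gives 4.3.1, 4.4.0 and 5.0.0 respectively, and
4.9.2 followed by a feature release gives 4.10.0.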
 
 In addition, the two-branch release policy is adopted:
 
   a) stable branch - releases in the stable branch are numbered with
      even minor version numbers, like 0.2, 0.4, etc.  These releases
      are usually well tested.  The configuration files and features
      usually don't change often from one release to another.  The
      release frequency is low.
 
   b) development branch - releases in the development branch are
      numbered with odd minor version numbers, like 0.1, 0.3, etc.
      These releases are more experimental and may be less tested than
      the stable ones.  The configuration files and features change
      more rapidly from one release to another.  The release frequency
      is higher.
 
 It can be seen that the above scheme is somewhat similar to the Linux
 kernel version numbering scheme.
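A sketch of telling the two branches apart from a release number.  The
helper names are hypothetical.  Versions are compared as integer tuples
because plain string comparison gets multi-digit parts wrong ("4.10"
sorts before "4.9" as a string).

```python
def parse_version(version):
    """Turn '0.1.8' into the tuple (0, 1, 8); missing parts become 0."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def get_release_branch(version):
    """Return 'stable' for an even minor number, 'development' for odd."""
    minor = parse_version(version)[1]
    return "stable" if minor % 2 == 0 else "development"
```

For instance, 0.0.9 falls in the stable branch and 0.1.0 in the
development branch.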
 
 Currently, CDS Invenio 0.0.9 represents the stable branch release and
 0.1.0 the development branch release.  We are going to frequently
 update it to provide 0.1.1, 0.1.2, etc as the currently missing admin
 functionality is being added into the development branch, until later
 on, when some release, say 0.1.8, will achieve a status of
 satisfaction, at which point we release it as the next stable version
 (0.2 or 1.0), and start a new development branch (0.3 or 1.1).
 
 - end of file -
 
diff --git a/modules/webhelp/web/hacking/test-suite.webdoc b/modules/webhelp/web/hacking/test-suite.webdoc index 4d9d7abae..73e0d7b3b 100644 --- a/modules/webhelp/web/hacking/test-suite.webdoc +++ b/modules/webhelp/web/hacking/test-suite.webdoc @@ -1,521 +1,519 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -

Contents

1. General considerations
2. Unit testing
       2.1 Unit testing philosophy
       2.2 Writing unit tests
       2.3 Running unit tests
3. Regression testing
       3.1 Regression testing philosophy
       3.2 Writing regression tests
       3.3 Running regression tests
4. Conclusions
5. Additional information

1. General considerations

This document presents guidelines for homogenising unit testing and regression testing throughout all CDS Invenio modules.

Testing is an important coding activity. Most authors believe that writing test cases should take between 10% and 30% of the project time. But even with such a large fraction, don't put too much faith in testing alone: tests cannot find bugs they do not test for. So, while testing is an important activity inherent to safe software development practices, it cannot become a substitute for pro-active bug hunting, source code inspection, and a bug-free-driven development approach from the start.

Testing should happen alongside coding. If you write a function, immediately load it into your toplevel, evaluate its definition, and call it with a couple of arguments to make sure the function works as expected. If it does not, change the function definition, re-evaluate it, re-call it, etc. Dynamic languages with an interactive toplevel, such as Common Lisp or Python, make this easy for you. Dynamic redefinition capabilities (full in Common Lisp, partial in Python) are very programmer-friendly in this respect. If your test cases are worth keeping, keep them in a test file. (It is almost always a good idea to store them in the test file, since you cannot predict whether you will want to change something in the future.) We'll see below how to store your tests in a test file.

When testing, it is nice to know some rules of thumb, like: check your edge cases (e.g. a null array), check atypical input values (e.g. a laaarge array instead of the typical 5-6 elements), check your termination conditions, ask whether your arguments have already been safe-proofed or whether it is in your mandate to check them, write a test case for each `if-else' branch of the code to explore all the possibilities, etc. Another interesting rule of thumb concerns the bug frequency distribution. Experience has shown that bugs tend to cluster: if you discover a bug, chances are that other bugs are in the neighborhood. The famous 80/20 rule of thumb applies here too: about 80% of bugs are located in about 20% of the code. Another rule of thumb: if you find a bug caused by some coding pattern that may be used elsewhere too, look for and fix the other instances of the pattern.
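A few of these rules of thumb can be sketched as a small test case: one test per `if-else' branch, with boundary values and a deliberately atypical input. The wash_page_size() function below is hypothetical, invented only to have some branches to cover:

```python
import unittest


def wash_page_size(size, default=10, maximum=100):
    """Hypothetical helper: return a safe number of records per page."""
    if size is None or size < 1:
        # missing, zero or negative values fall back to the default
        return default
    elif size > maximum:
        # atypically large requests are capped at the maximum
        return maximum
    else:
        # typical values are returned unchanged
        return size


class TestWashPageSize(unittest.TestCase):
    """One test per if-else branch, plus edge and atypical values."""

    def test_missing_or_non_positive(self):
        """page size washing of missing or non-positive values"""
        self.assertEqual(10, wash_page_size(None))
        self.assertEqual(10, wash_page_size(0))    # edge case
        self.assertEqual(10, wash_page_size(-5))

    def test_too_large(self):
        """page size washing of too large values"""
        self.assertEqual(100, wash_page_size(101))      # just past the limit
        self.assertEqual(100, wash_page_size(10 ** 9))  # laaarge input

    def test_typical(self):
        """page size washing of typical values"""
        self.assertEqual(1, wash_page_size(1))      # lower boundary
        self.assertEqual(25, wash_page_size(25))
        self.assertEqual(100, wash_page_size(100))  # upper boundary
```

Note how the boundary values (0, 1, 100, 101) sit right next to the conditions in the code; that is where off-by-one bugs tend to hide.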

In a nutshell, the best advice for writing bug-free code is: think ahead. Try to prepare in advance for unusual usage scenarios and to foresee problems before they happen. Don't rely on typical input and typical usage scenarios. Things have a tendency to become atypical one day. Recall that testing is necessary, but not sufficient, to write good code. Therefore, think ahead!

2. Unit testing

2.1 Unit testing philosophy

Core functionality, such as the hit set intersection for the search engine or the text input manipulating functions of the BibConvert language, should come with a couple of test cases to ensure that it behaves properly. The test cases should cover typical input (e.g. the hit set corresponding to the query for ``ellis''), as well as edge cases (e.g. an empty or full hit set) and other unusual situations (e.g. non-UTF-8 accented input for BibConvert functions, to test situations where the number of bytes per character differs).
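A sketch of what such test cases could look like for the hit set intersection. Plain Python sets of record IDs stand in for the real hit set type, and both intersect_hitsets() and the record IDs are hypothetical, chosen only for illustration:

```python
import unittest


def intersect_hitsets(hitset1, hitset2):
    """Hypothetical stand-in: intersect two hit sets of record IDs."""
    return hitset1 & hitset2


class TestHitSetIntersection(unittest.TestCase):
    """Typical input plus edge cases (empty and full hit sets)."""

    def setUp(self):
        self.full = set(range(1, 101))    # pretend the site holds 100 records
        self.ellis = {8, 10, 11, 12, 47}  # made-up hits for ``ellis''
        self.empty = set()

    def test_typical(self):
        """hit set intersection of typical hit sets"""
        self.assertEqual({10, 47}, intersect_hitsets(self.ellis, {10, 47, 99}))

    def test_empty(self):
        """hit set intersection with an empty hit set"""
        self.assertEqual(set(), intersect_hitsets(self.ellis, self.empty))
        self.assertEqual(set(), intersect_hitsets(self.empty, self.empty))

    def test_full(self):
        """hit set intersection with a full hit set"""
        # intersecting with the full set must leave the other operand unchanged
        self.assertEqual(self.ellis, intersect_hitsets(self.ellis, self.full))
        self.assertEqual(self.full, intersect_hitsets(self.full, self.full))
```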

Test cases should be written for the most important core functionality. Not every function or class in the code needs to be thoroughly tested; use common sense.

Unit test cases are free of side-effects. Users should be able to run them on a production database without any harm to their data. This is because the tests test ``units'' of the code, not the application as such. If the behaviour of the function you would like to test depends on the state of the database, or on some other parameters that cannot be passed to the function itself, the unit testing framework is not suitable and you should use the regression testing framework instead (see below).

For more information on Pythonic unit testing, see the documentation of the unittest module at http://docs.python.org/lib/module-unittest.html. For a tutorial, see for example http://diveintopython.org/unit_testing/.

2.2 Writing unit tests

Each core file located in the lib directory (such as webbasketlib.py) should come with a test file where its test cases are stored. The test file should have the same name as the lib file it tests, with the suffix _tests added (in our example, webbasketlib_tests.py).

The test cases are written using the TestCase class of Python's unittest module. For example, to test the search engine's query parameter washing function:

 $ cat /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
 [...]
 import search_engine
 import unittest
 
 class TestWashQueryParameters(unittest.TestCase):
     """Test for washing of search query parameters."""
 
     def test_wash_url_argument(self):
         """search engine washing of URL arguments"""
         self.assertEqual(1, search_engine.wash_url_argument(['1'],'int'))
         self.assertEqual("1", search_engine.wash_url_argument(['1'],'str'))
         self.assertEqual(['1'], search_engine.wash_url_argument(['1'],'list'))
         self.assertEqual(0, search_engine.wash_url_argument('ellis','int'))
         self.assertEqual("ellis", search_engine.wash_url_argument('ellis','str'))
         self.assertEqual(["ellis"], search_engine.wash_url_argument('ellis','list'))
         self.assertEqual(0, search_engine.wash_url_argument(['ellis'],'int'))
         self.assertEqual("ellis", search_engine.wash_url_argument(['ellis'],'str'))
         self.assertEqual(["ellis"], search_engine.wash_url_argument(['ellis'],'list'))
 [...]
 

In addition, each test file is supposed to define a create_test_suite() function that returns a test suite containing all the tests available in this file:

 $ cat /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
 [...]
 def create_test_suite():
     """Return test suite for the search engine."""
     return unittest.TestSuite((unittest.makeSuite(TestWashQueryParameters,'test'),
                                unittest.makeSuite(TestStripAccents,'test')))
 [...]
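unittest.makeSuite was deprecated in Python 3.11 and removed in Python 3.13. On recent interpreters, an equivalent create_test_suite() can be built with unittest.TestLoader.loadTestsFromTestCase, which by default also collects the methods whose names start with test. A sketch, with placeholder classes standing in for the real test classes of search_engine_tests.py:

```python
import unittest


class TestWashQueryParameters(unittest.TestCase):
    """Placeholder for the real test class in search_engine_tests.py."""

    def test_placeholder(self):
        self.assertTrue(True)


class TestStripAccents(unittest.TestCase):
    """Placeholder for the real test class in search_engine_tests.py."""

    def test_placeholder(self):
        self.assertTrue(True)


def create_test_suite():
    """Return test suite for the search engine."""
    loader = unittest.TestLoader()
    # loadTestsFromTestCase picks up the methods whose names start with
    # 'test', like makeSuite(..., 'test') did
    return unittest.TestSuite((loader.loadTestsFromTestCase(TestWashQueryParameters),
                               loader.loadTestsFromTestCase(TestStripAccents)))
```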
 

This enables us to include this file in the testsuite executable:

 $ cat ~/src/cds-invenio/modules/miscutil/bin/testsuite.in
 [...]
 from invenio import search_engine_tests
 from invenio import bibindex_engine_tests
 
 def create_all_test_suites():
     """Return all tests suites for all CDS Invenio modules."""
     return unittest.TestSuite((search_engine_tests.create_test_suite(),
                                bibindex_engine_tests.create_test_suite()))
 [...]
 

In this way, all the test cases defined in the file search_engine_tests.py will be executed when the global testsuite executable is called.

Note that running all the tests in one go may be time-consuming. If you want to run only the tests of a certain file (say search_engine_tests.py), launch:

 $ python /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
 

For full-scale examples, you may follow search_engine_tests.py and other _tests.py files in the source distribution.
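Running a single test file directly only has an effect if the file does something when executed as a script. A minimal sketch of such a main guard, assuming the file defines create_test_suite() as described earlier; TestExample is a placeholder, and TestLoader.loadTestsFromTestCase is used instead of makeSuite so that the sketch also runs on recent Python versions:

```python
import unittest


class TestExample(unittest.TestCase):
    """Placeholder for the real test classes of the file."""

    def test_addition(self):
        """example addition"""
        self.assertEqual(4, 2 + 2)


def create_test_suite():
    """Return test suite with all the tests available in this file."""
    loader = unittest.TestLoader()
    return unittest.TestSuite((loader.loadTestsFromTestCase(TestExample),))


if __name__ == "__main__":
    # verbosity=2 lists each test together with its outcome (ok, FAIL, ERROR)
    unittest.TextTestRunner(verbosity=2).run(create_test_suite())
```

Using TextTestRunner rather than unittest.main() keeps the file's own create_test_suite() as the single source of truth for which tests belong to it.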

2.3 Running unit tests

The CDS Invenio test suite can be run in the source directory:

 $ make test
 
or at any time after installation:
 $ /opt/cds-invenio/bin/testsuite
 
The ``testsuite'' executable will run all available unit tests provided with CDS Invenio.

The informative output is of the form:

 $ make test
 CDS Invenio v0.3.2.20040519 test suite results:
 ===========================================
 search engine washing of query patterns ... ok
 search engine washing of URL arguments ... ok
 search engine stripping of accented letters ... ok
 bibindex engine list union ... ok
 
 ----------------------------------------------------------------------
 Ran 4 tests in 0.121s
 
 OK
 
In case of problems you will see failures like:
 CDS Invenio v0.3.2.20040519 test suite results:
 ===========================================
 search engine washing of query patterns ... FAIL
 search engine washing of URL arguments ... ok
 search engine stripping of accented letters ... ok
 bibindex engine list union ... ok
 
 ======================================================================
 FAIL: search engine washing of query patterns
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/opt/cds-invenio/lib/python/invenio/search_engine_tests.py", line 25, in test_wash_pattern
     self.assertEqual("ell*", search_engine.wash_pattern('ell*'))
   File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
     raise self.failureException, \
 AssertionError: 'ell*' != 'ell'
 
 ----------------------------------------------------------------------
 Ran 4 tests in 0.091s
 
 FAILED (failures=1)
 

Check that the test suite passes before each CVS commit (and, obviously, double-check it before each CDS Invenio release).

3. Regression testing

3.1 Regression testing philosophy

In addition to the above-mentioned unit testing of important functions, regression testing should ensure that the overall application functionality behaves well and is not altered by code changes. This is especially important once a bug has been found: a regression test case should then be written to ensure that the bug never reappears. (It also helps to scan the neighborhood of the bug, or the whole codebase, for occurrences of the same kind of bug; see the 80/20 rule of thumb cited above.)
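The idea of pinning down a fixed bug can be sketched at the level of a single function. Both shorten_title() and its off-by-one bug are hypothetical, invented for illustration; the test names the fixed behaviour explicitly, so that a reappearance of the bug shows up as a clear failure. In CDS Invenio the same idea applies at the application level, with tests placed in the _regression_tests.py files described below.

```python
import unittest


def shorten_title(title, max_chars=10):
    """Hypothetical function that once had an off-by-one bug:
    titles of exactly max_chars characters were wrongly shortened."""
    if len(title) <= max_chars:  # the buggy version used '<' here
        return title
    # keep room for the three dots
    return title[:max_chars - 3] + "..."


class TestShortenTitleRegression(unittest.TestCase):
    """Pin down the fixed bug so that it can never silently reappear."""

    def test_title_of_exactly_max_chars_is_kept(self):
        """shorten title keeps titles of exactly max_chars characters"""
        self.assertEqual("Gauge text", shorten_title("Gauge text", 10))

    def test_longer_title_is_still_shortened(self):
        """shorten title still shortens longer titles"""
        self.assertEqual("Gauge t...", shorten_title("Gauge theory", 10))
```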

Moreover, the regression test suite should be used when the functionality of the item we would like to test depends on state that is not passed as a parameter, such as the database content. The regression framework is also suitable for testing the overall behaviour of web pages. (In extreme programming, regression tests of this kind are called acceptance tests, a name that evolved from the earlier term functional tests.)

Within the framework of the regression test suite, unlike that of the unit testing framework, we are free to alter the database content. We can also simulate a web browser in order to test web applications.

As an example of a regression test, we can test whether the web pages are alive; whether searching for Ellis in the demo site indeed produces 12 records; whether searching for aoeuidhtns produces no hits but the box of nearest terms, and with which content; whether accessing the Theses collection page triggers an Apache password prompt; whether the admin interface is really accessible only to admins or also to guests, etc.

For more information on regression testing, see for example http://c2.com/cgi/wiki?RegressionTesting.

3.2 Writing regression tests

Regression tests are written per application (or sub-module) in files named like websearch_regression_tests.py or websubmitadmin_regression_tests.py.

When writing regression tests, you can assume that the site is in fresh demo mode (Atlantis Institute of Fictive Science). You can safely write not only read-only database tests, but also tests that insert, update, or delete whatever values you need in the database. Users are warned about the possibly destructive side-effects before running the regression test suite (see below). Therefore you can create users, create user groups, attach users to groups to test the group joining process, etc., as needed.

For testing web pages using GET arguments, you can take advantage of the following helper function:

 $ cat /opt/cds-invenio/lib/python/invenio/testutils.py
 [...]
 def test_web_page_content(url, username="guest", expected_text=""):
     """Test whether web page URL as seen by user USERNAME contains
        text EXPECTED_TEXT.  Before doing the tests, login as USERNAME.
        (E.g. interesting values are "guest" or "admin".)
 
        Return an empty list if no problems were found, otherwise a
        list of error messages that may have been encountered during
        processing of the page.
     """
 
For example, you can test whether admins can access the WebSearch Admin interface but guests cannot:
 test_web_page_content(weburl + '/admin/websearch/websearchadmin.py',
                       username='admin')
 
 test_web_page_content(weburl + '/admin/websearch/websearchadmin.py',
                       username='guest',
                       expected_text='Authorization failure')
 
or you can test whether searching for aoeuidhtns produces the nearest terms box:
 test_web_page_content(weburl + '/search?p=aoeuidhtns',
                       expected_text='Nearest terms in any collection are')
 

For testing web pages using POST arguments, or for other more advanced testing, you should use the mechanize Python module directly; it simulates a browser and can post forms, follow links, go back to previous pages, etc. An example of how to test the login page functionality:

 import mechanize
 [...]
 # inside a unittest.TestCase method; sweburl holds the site's secure URL
 browser = mechanize.Browser()
 browser.open(sweburl + "/youraccount/login")
 browser.select_form(nr=0)
 browser['p_un'] = 'userfoo'
 browser['p_pw'] = 'passbar'
 browser.submit()
 username_account_page_body = browser.response().read()
 if "You are logged in as userfoo." not in username_account_page_body:
     self.fail('ERROR: Cannot login as userfoo.')
 

For full-scale examples, you may follow websearch_regression_tests.py and other _regression_tests.py files in the source distribution.

3.3 Running regression tests

The regression test suite can be run by invoking:

 $ /opt/cds-invenio/bin/regressiontestsuite
 
similarly to the unit test suite cited above. The notable differences compared to running the unit test suite are:

  • the regressiontestsuite script assumes that the site is in demo mode (Atlantis Institute of Fictive Science);
  • regressiontestsuite will pollute the database with test data as it sees fit for regression testing purposes.

Therefore beware: running the regression test suite requires a clean demo site and may destroy your data forever. The user is warned about this before the suite runs and is given a chance to abort the process:

 $ /opt/cds-invenio/bin/regressiontestsuite
 regressiontestsuite: detected 19 regression test modules
 **********************************************************************
 **                                                                  **
 **  ***  I M P O R T A N T   W A R N I N G  ***                     **
 **                                                                  **
 ** The regression test suite needs to be run on a clean demo site   **
 ** that you can obtain by doing:                                    **
 **                                                                  **
 **    $ make drop-tables                                            **
 **    $ make create-tables                                          **
 **    $ make create-demo-site                                       **
 **    $ make load-demo-records                                      **
 **                                                                  **
 ** Note that DOING THE ABOVE WILL ERASE YOUR ENTIRE DATABASE.       **
 **                                                                  **
 ** (In addition, due to the write nature of some of the tests,      **
 ** the demo database may be altered with junk data, so that         **
 ** it is recommended to rebuild the demo site anew afterwards.)     **
 **                                                                  **
 **********************************************************************
 
 Please confirm by typing "Yes, I know!": NO
 Aborted.
 

If you choose to continue, the regression test suite will produce output similar to that of the unit test suite discussed previously.
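The confirmation step can be pictured as a simple guard that compares the typed answer against the required phrase and stops the program otherwise. A minimal sketch; it is hypothetical, the actual regressiontestsuite code may differ, and the exit status used here is an assumption:

```python
import sys

CONFIRMATION_PHRASE = "Yes, I know!"


def is_confirmed(answer):
    """Return True only if ANSWER is the exact confirmation phrase
    (surrounding whitespace, such as a trailing newline, is ignored)."""
    return answer.strip() == CONFIRMATION_PHRASE


def ask_for_confirmation():
    """Prompt the user and exit the program unless the phrase is typed."""
    answer = input('Please confirm by typing "%s": ' % CONFIRMATION_PHRASE)
    if not is_confirmed(answer):
        print("Aborted.")
        sys.exit(1)
```

Requiring a whole phrase rather than a plain "y" makes it harder to wipe a database by pressing a key out of habit.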

4. Conclusions

A uniform testing technique and two test suites (the unit test suite and the regression test suite) were discussed. Each programmer should plan to write test code alongside the core code, to test both the building blocks of his/her code (unit tests) and the overall application behaviour (regression tests). Guidelines on how to do so were given above.

5. Additional information

More information can be found on the URLs mentioned above:
 http://c2.com/cgi/wiki?UnitTest
 http://c2.com/cgi/wiki?RegressionTesting
 http://docs.python.org/lib/module-unittest.html
 http://diveintopython.org/unit_testing/
 http://wwwsearch.sourceforge.net/mechanize/
 
and elsewhere:
 Steve McConnell: "Code Complete"
 FIXME: list of other interesting references, like Kent Beck papers, etc