diff --git a/modules/webhelp/web/hacking/coding-style.webdoc b/modules/webhelp/web/hacking/coding-style.webdoc index fa39b089e..496692bbe 100644 --- a/modules/webhelp/web/hacking/coding-style.webdoc +++ b/modules/webhelp/web/hacking/coding-style.webdoc @@ -1,213 +1,212 @@ ## -*- mode: html; coding: utf-8; -*- ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - - +
A brief description of things we strive for, more or less unsuccessfully.

1. Packaging

We use the classical GNU Autoconf/Automake approach; for a tutorial see e.g. Learning the GNU development tools or the AutoBook.

2. Modules

CDS Invenio started as a set of pretty independent modules developed by independent people with independent styles. This was made even more pronounced by the original use of many different languages (e.g. Python, PHP, Perl). Now the CDS Invenio code base is striving to use Python everywhere, except in speed-critical parts where a compiled language such as Common Lisp may come to the rescue in the near future.

When modifying an existing module, we propose to strictly continue using whatever coding style the module was originally written in. When writing new modules, we propose to stick to the below-mentioned standards.

The code integration across modules is happening, but is slow. Therefore, don't be surprised to see that there is a lot of room to refactor.

3. WML/ePerl/etc

This is not so important, because not many lines of code were written in WML/ePerl. We prefer to loosely follow the GNU way, as always.

4. Python

We aim at following recommendations from PEP 8, although the existing code surely does not fulfil them here and there. The code indentation is done via spaces only; please do not use tabs. One tab counts as four spaces. Emacs users can look into our cdsware.el for inspiration.

All the Python code should be extensively documented via docstrings, so you can always run pydoc file.py to peruse the file's documentation in one simple go.

Do not forget to run pylint on your code to check for errors like uninitialized variables and to improve its quality and conformance to the coding standard. If you develop in Emacs, run M-x pylint RET on your buffers frequently. Read and implement pylint suggestions. (Note that using lambda and friends may lead to false pylint warnings.
You can switch them off by putting block comments of the form ``# pylint: disable-msg=C0301''.)

Do not forget to run pychecker on your code either. It is another source code checker that catches some situations better and some situations worse than pylint. If you develop in Emacs, run C-c C-w (M-x py-pychecker-run RET) on your buffers frequently. (Note that using psyco on classes may lead to false pychecker warnings.)

You can check the kwalitee of your code by running ``python modules/miscutil/lib/kwalitee.py *.py'' on your files. You can also check the code kwalitee across all the modules by running ``make kwalitee-check'' in the main source directory.

Do not hardcode magic constants in your code. Every magic string or number should be put into the accompanying file_config.py with a symbol name beginning with cfg_modulename_*.

Clearly separate interfaces from implementation. Document your interfaces. Do not expose to other modules anything that does not have to be exposed. Apply the principle of least information.

Create as few new library files as possible. Do not create many nested files in nested modules; rather put all the lib files in one dir with bibindex_foo and bibindex_bar names.

Use the imperative/functional paradigm rather than OO. If you do use OO, then stick to as simple a class hierarchy as possible. Recall that method calls and exception handling in Python are quite expensive.

Use the good old foo_bar naming convention for symbols (both variables and function names) rather than the fooBar CaMelCaSe convention. (Except for Class names where UppercaseSymbolNames are to be used.)

Pay special attention to naming your symbols descriptively. Your code is going to be read and worked with by others, and its symbols should be self-understandable without any comments and without studying other parts of the code. For example, use proper English words, not abbreviations that can be misspelled in many a way; use words that go in pairs (e.g.
create/destroy, start/stop; never create/stop); use self-understandable symbol names (e.g. list_of_file_extensions rather than list2); never misname symbols (e.g. score_list should hold the list of scores and nothing else - if in the course of development you change the semantics of what the symbol holds then change the symbol name too). Do not be afraid to use long descriptive names; good editors such as Emacs can tab-complete symbols for you.

When hacking module A, take care to follow the existing coding convention in A, even if it is legacy-weird and even if we use a different technique elsewhere. (Unless the whole module A is going to be refactored, of course.)

Speed-critical parts should be profiled with pyprof. Do not forget to use tricks like psyco.

The code should be well tested before being committed. Testing is an integral part of the development process. Test as you program. The testing process should be automated via our unit test and regression test suite infrastructures. Please read the test suite strategy to know more.

Python promotes writing clear, readable, easily maintainable code. Write it as such. Recall Albert Einstein's ``Everything should be made as simple as possible, but not simpler''. Things should be neither overengineered nor oversimplified.

Recall the principles Unix is built upon. As summarized by Eric S. Raymond's TAOUP:

Rule of Modularity: Write simple parts connected by clean interfaces.
Rule of Clarity: Clarity is better than cleverness.
Rule of Composition: Design programs to be connected with other programs.
Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
Rule of Simplicity: Design for simplicity; add complexity only where you must.
Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
Rule of Transparency: Design for visibility to make inspection and debugging easier.
Rule of Robustness: Robustness is the child of transparency and simplicity.
Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
Rule of Least Surprise: In interface design, always do the least surprising thing.
Rule of Silence: When a program has nothing surprising to say, it should say nothing.
Rule of Repair: Repair what you can -- but when you must fail, fail noisily and as soon as possible.
Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
Rule of Diversity: Distrust all claims for one true way.
Rule of Extensibility: Design for the future, because it will be here sooner than you think.

or the golden rule that says it all: ``keep it simple''.

For more hints, thoughts, and other ruminations on programming, see my CDS Invenio Wiki.

5. PHP

We are slowly moving away from PHP, so there may be several practices in place in the PHP code present in CDS Invenio. Usually this is consistent within modules but inconsistent across modules. For example, some old code used Emacs' perl-mode, following traditional K&R C style, while some other old code tried to stick to PEAR recommendations.

6.
MySQL Table naming policy is, roughly and briefly: - "foo": table names in lowercase, without prefix, used by me for WebSearch - "foo_bar": underscores represent M:N relationship between "foo" and "bar", to tie the two tables together - "bib*": many tables to hold the metadata and relationships between them - "idx*": idx is the table name prefix used by BibIndex - "rnk*": rnk is the table name prefix used by BibRank - "flx*": flx is the table name prefix used by FlexElink (also known as BibFormat) - "sbm*": sbm is the table name prefix used by WebSubmit - "sch*": sch is the table name prefix used by BibSched - "collection*": many tables to describe collections and search interface pages - "user*" : many tables to describe personal features (baskets, alerts) - end of file -diff --git a/modules/webhelp/web/hacking/common-concepts.webdoc b/modules/webhelp/web/hacking/common-concepts.webdoc index 65dcbfbe0..fc11bfddc 100644 --- a/modules/webhelp/web/hacking/common-concepts.webdoc +++ b/modules/webhelp/web/hacking/common-concepts.webdoc @@ -1,118 +1,116 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -
A description of concepts you will encounter here and there in CDS Invenio. Our interpretation may differ from the practice found in other products, so please read this carefully.

1. sysno - (ALEPH|old) system number

Stands for (ALEPH|old) system number only. This means that, for CDS Invenio installations outside CERN, it stands for an 'old system number', whatever it is, if they want to publicise it instead of our internal auto-incremented CDS Invenio record identifiers.

2. recID - CDS Invenio record identifier

Each record has an auto-incremented ID in the "bibrec" table (formerly called "bibitem"). This is the basic "record identifier" concept in CDS Invenio.

3. docID - eventual fulltext document identifier

Each fulltext file may have a docID. This will permit us to interconnect records (recID) with fulltext files (docID), if we want to. At the moment there is only a one-way connection from recID to docID via HTTP field 856. This is ugly. I think we may probably profit by introducing a recID-docID relationship in several ways: file protection, reference extraction, fulltext indexing... (?!)

4. field - logical field concept such as "reportnumber"

A bibliographic record is composed of 'fields' such as title or author. Note that we consider 'field' to be a logical concept that is compound and may consist of several physical MARC fields. For example, the "report number" field consists of several MARC fields such as 088 $a, 037 $a, 909C0 $r. Another example: "first report number" consists of only one MARC field, 037 $a.

5. tag - physical field concept such as "088 $a".

Having defined the concept of 'logical field', let's now turn to the 'physical field' that denotes basically the concept of 'MARC field' as defined in the MARC-21 standard. In addition to the tag, a field may contain two identifiers to describe the data content, and subfield codes to denote various parts of the content. See our HOWTO MARC guide on this.
That said, in the implementation of our bibliographic tables (bibXXx) we have sort of generalized the term 'tag' to stand for:

tag = tag code + identifier1 + identifier2 + subfield code

This convention, while taking some freedom from the MARC-21 standard, enables us to write things like "field: base number, tag: 909C0b, value: 11". If this interpretation is indeed too free with respect to the standard usage of terms, we may change them in the future.

6. collection - here we distinguish (i) primary collection concept and (ii) specific collection concept.

The (i) primary collections are the basic organizational structure of how the records are grouped together in collections. The primary collections are used in the navigable search interface under the ``Narrow search'' box.

The (ii) specific collections present an orthogonal view on the data organization that is useful to group together some records from different primary collections, if they present a common pattern. The specific collections are used in the search interface under the ``Focus on'' box.

The primary collections are defined mainly by the collection identifier ("980 $a,b"); the specific collections are defined by any query that is possible for a search engine to execute (see also the "dbquery" column in the "collection" table).

In the past we used the term "catalogue", which is now deprecated and can be interchanged with "collection".

7. doctype - stands for web document type concept, used in WebSubmit

The "document type" is used solely for submission purposes and fulltext access purposes ("setlink"-like). For example, a document type "photo" may be used in many collections such as "Foo Photos", "Bar PhotoLab", etc. Similarly, one collection can cover several doctypes. (M:N relationship)

8. baskets, alerts, settings - covering personal features

Denote personal features, for which we previously used the terms "shelf" and "profile" that are now deprecated.
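The generalized tag notation from item 5 (tag code, two identifiers, subfield code) can be illustrated with a small sketch. It assumes a fixed six-character string such as the "909C0b" quoted above; the function name split_generalized_tag is hypothetical and not part of CDS Invenio, and how blank identifiers are stored is not covered here.

```python
def split_generalized_tag(tag):
    """Split a six-character generalized tag into its four parts.

    Assumption (not taken from the CDS Invenio sources): the tag is
    always exactly six characters long, as in "909C0b".
    """
    if len(tag) != 6:
        raise ValueError("expected a six-character tag, got %r" % (tag,))
    tag_code = tag[:3]        # e.g. "909"
    identifier1 = tag[3]      # e.g. "C"
    identifier2 = tag[4]      # e.g. "0"
    subfield_code = tag[5]    # e.g. "b"
    return tag_code, identifier1, identifier2, subfield_code


if __name__ == "__main__":
    print(split_generalized_tag("909C0b"))
```

For the example from the text, "909C0b" splits into the tag code 909, the identifiers C and 0, and the subfield code b.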
- end of file -diff --git a/modules/webhelp/web/hacking/directory-organization.webdoc b/modules/webhelp/web/hacking/directory-organization.webdoc index ccf238415..89d06b5e1 100644 --- a/modules/webhelp/web/hacking/directory-organization.webdoc +++ b/modules/webhelp/web/hacking/directory-organization.webdoc @@ -1,201 +1,199 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -
Please find some notes below on how the source (as well as the target) directory structure is organized, where the sources get installed to, and how the visible URLs are organized. 1. CDS Invenio will generally install into the directory taken from --with-prefix configuration variables. These are discussed in points 2 and 3 below, respectively. 2. The first directory (--with-prefix) specifies general CDS Invenio install directory, where we'll put CLI binaries, Python and PHP libraries, manpages, log and cache directories for the running installation, and any other dirs as needed. They will all live under one common hood. For example, configure --with-prefix=/opt/cds-invenio, and you'll obtain the following principal directories: /opt/cds-invenio/ /opt/cds-invenio/bin /opt/cds-invenio/lib /opt/cds-invenio/lib/php /opt/cds-invenio/lib/python /opt/cds-invenio/lib/wml /opt/cds-invenio/var /opt/cds-invenio/var/cache /opt/cds-invenio/var/log with the obvious meaning: - bin : for command-line executable binaries and scripts - lib/php : for our own PHP libraries, see below - lib/python : for our own Python libraries, see below - lib/wml : for our own WML libraries, see below - var : for installation-specific runtime stuff - var/log : for all sorts of runtime logging, e.g. search.log - var/cache : for all sorts of runtime caching, e.g. OAI retention harvesting, collection cache, etc This scheme copies to some extent the usual Unix filesystem convention, so it may be easily expanded later according to our future needs. 3. The second directory (prefix/var/www) contains Web scripts (PHP, mod_python), HTML documents and images, and so on. This is where webuser-seen files are located. Basically, the files there contain only the interface to the functionality that is provided by the libraries stored under the library directory. The prefix/var/www directory is further structured according to whom it provides services. 
We distinguish user-level, admin-level and hacker-level access to the site, as reflected by the visible URL structure.

a) The user-level access point is provided by the main WEBURL address and its subdirs. All the user-level documentation is available under WEBURL/help/. The module-specific user-level documentation is available under WEBURL/help/<module>/.

b) The admin-level access is provided by the WEBURL/admin/ entry point. The admin-level documentation is accessible from the same place. The admin-level module-specific functionality and help is available under WEBURL/admin/<module>/. (If it's written in mod_python, it usually points to WEBURL/<module>admin.py/ since we configure the server to have all mod_python scripts under the prefix/var/www root directory.)

c) The hacker-level documentation is provided by the WEBURL/hacking/ entry point. There is no hacker-level functionality possible via the Web, of course, so unlike the admin-level entry point, the hacker-level entry point provides only common access to the available hacking documentation, etc. The module-specific information is available under WEBURL/hacking/<module>/.

4. Let's now return a bit more closely to the role of the Python and PHP library directories outside of the Apache tree:

/opt/cds-invenio/lib/php
/opt/cds-invenio/lib/python

Here we put not only (a) libraries that may be reused across CDS Invenio modules, but also (b) all the "core" functionality of CDS Invenio that is not directly callable by the end users. The "callable" functionality is put under "prefix/var/www" in case of web scripts and documents, and under "bindir" in case of CLI executables.
As for (a), for example in the PHP CDS Invenio library you'll find currently the common PHP error handling code that is shared between BibFormat and WebSubmit; in the Python CDS Invenio library (in fact, CDS Invenio Pythonic 'module', but we are reserving the word 'module' to denote 'CDS Invenio module' in this text) you'll find config.py containing WML-supplied site parameters, dbquery.py containing DB persistent query module, or webpage.py with templates and functions to produce mod_python web pages with common look and feel. These could and should be reused across all our modules. Note that I created only a small number of "broad" libraries at the moment. In case we want to reuse more code parts, we'd refactor the code more, as needed. As for (b), for example the existing search engine was split into search.py that only contains three "callable" functions, which goes into prefix/var/www, while the search engine itself is composed of search_engine.py and search_engine_config.py living under LIBDIR. In this way we can easily create "real" CLI search, that will depend only on the search libraries in LIBDIR, and that will get installed into BINDIR. To recap: - For each CDS Invenio module, I'm differentiating between "callable" and "core" parts. The former go into prefix/var/www or BINDIR, the latter into LIBDIR. - Our PHP/Pythonic libraries contain several sorts of thing: - the implementation of the "callable" functions - non-callable internal "core" or "library" code parts, as stated above. Not shared across CDS Invenio modules. - utility code meant for reuse across CDS Invenio modules, such as dbquery.py - Pythonic config files out of user-supplied WML (non-MySQL) configuration parameters (see e.g. search_engine_config.py) 5. The same strategy is reflected in the organization of source directories inside CDS Invenio CVS. Each CDS Invenio module lives in a separate directory located under "modules" directory of the sources. 
Further on, each module contains usually several subdirectories that reflect the above-mentioned packaging choice. For example, in case of WebSearch you'll find: ./modules/websearch ./modules/websearch/bin ./modules/websearch/doc ./modules/websearch/doc/hacking ./modules/websearch/doc/admin ./modules/websearch/lib ./modules/websearch/web ./modules/websearch/web/admin with the following straightforward meaning: - bin : for callable CLI binaries and scripts - doc : for documentation. The user-level documentation is located in this directory. The admin-level documentation is located in the "admin" subdir. The programmer-level documentation is located in the "hacking" subdir. - lib : for uncallable "core" functionality, see the comments above - web : for callable web scripts and pages. The user- and admin- level is separated similarly as in the "doc" directory (see above). The structure is respected throughout all the CDS Invenio modules, a notable exception being the MiscUtil module that contains subdirs like "sql" (for the table creating/dropping SQL commands, etc) or "demo" (for creation of Atlantis Institute of Science, our demo site.) - end of file -diff --git a/modules/webhelp/web/hacking/modules-overview.webdoc b/modules/webhelp/web/hacking/modules-overview.webdoc index 0b160ff90..64f9fd671 100644 --- a/modules/webhelp/web/hacking/modules-overview.webdoc +++ b/modules/webhelp/web/hacking/modules-overview.webdoc @@ -1,274 +1,273 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - - +
CDS Invenio consists of several more or less independent modules with precisely defined functionality. The general criterion for module names is to use the ``Bib'' prefix to denote modules that work more with the bibliographic data, and the ``Web'' prefix to denote modules that work more with the Web interface. (The difference is of course blurred in some cases, as in the case of search engine that has got a web interface but searches bibliographic data.)
A brief description of what each module does follows. After the descriptions, the module relationship diagram is presented.
Relationship between the modules:
CDS Invenio uses the classical major.minor.patchlevel release version numbering scheme that is commonly used in the GNU/Linux world and elsewhere. Each release is labelled by a major.minor.patchlevel release version number. For example, a release version 4.0.1 means:

4 - 4th major version, i.e. the whole system has already been rewritten four times, either fully or at least in its very essential components. The upgrade from one major version to another may be rather hard, may require new prerequisite technologies, full data dump, reload and reindexing, as well as other major configuration adaptations, possibly with an important manual intervention.

0 - 0th minor version, i.e. the first minor release of the 4th major rewrite. (Increments go 4.1, 4.2, ... 4.9, 4.10, 4.11, 4.12, ... until some important rewrite is done, e.g. the database philosophy dramatically changes, leading to a non-trivial upgrade, and we have 5.0.) The upgrade from one minor version to another may be laborious but is relatively painless, in that some table changes and data manipulations may be necessary but they are somewhat smaller in nature, easier to grasp, and possibly done by an automated script.

1 - 1st patch level to 4.0, fixing bugs in 4.0.0 but not adding any substantially new functionality. That is, the only new functionality that is added is that of a `bug fix' nature. The upgrade from one patch level to another is usually straightforward.

(Packages often seem to break this last rule, e.g. the Linux kernel adopting new important functionality (such as ReiserFS) within the stable 2.4.x branch. It can be easily seen that it is somewhat subjective to judge what is qualitatively more like a minor new functionality and what is more like a patch to the existing behaviour. We have tried to quantify these notions with respect to whether table structure and/or technology changes require small or large upgrade jobs and eventual manual efforts.)
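The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names parse_version and upgrade_kind are hypothetical and not part of CDS Invenio, and a missing patch level is assumed to mean 0 (so "4.3" is read as 4.3.0).

```python
def parse_version(version):
    """Turn 'major.minor.patchlevel' into a tuple of three integers.

    Assumption: a two-part version such as '4.3' means patch level 0.
    """
    parts = [int(piece) for piece in version.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError("expected major.minor[.patchlevel]: %r" % (version,))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def upgrade_kind(old_version, new_version):
    """Say whether an upgrade is a 'major', 'minor' or 'patch' one."""
    old_parts = parse_version(old_version)
    new_parts = parse_version(new_version)
    if new_parts <= old_parts:
        raise ValueError("new version must be higher than the old one")
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    # Numeric comparison: 4.10 comes after 4.9, unlike plain string order.
    print(parse_version("4.10") > parse_version("4.9"))
    print(upgrade_kind("4.0.0", "4.0.1"))
```

Comparing integer tuples rather than strings matters because the increments go 4.9, 4.10, 4.11; a plain string comparison would sort "4.10" before "4.9".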
So, if we have a version 4.3, a bug fix would mean releasing 4.3.1, some minor new functionality and upgrade would mean releasing 4.4, some important database structure rewrite or an imaginary exchange of Python for Common Lisp would mean releasing 5.0, etc.

In addition, the two-branch release policy is adopted:

a) stable branch - releases in the stable branch are numbered with an even minor version number, like 0.2, 0.4, etc. These releases are usually well tested. The configuration files and features usually don't change often from one release to another. The release frequency is low.

b) development branch - releases in the development branch are numbered with an odd minor version number, like 0.1, 0.3, etc. These releases are more experimental and may be less tested than the stable ones. The configuration files and features change more rapidly from one release to another. The release frequency is higher.

It can be seen that the above scheme is somewhat similar to the Linux kernel version numbering scheme.

Currently, CDS Invenio 0.0.9 represents the stable branch release and 0.1.0 the development branch release. We are going to frequently update it to provide 0.1.1, 0.1.2, etc as the currently missing admin functionality is being added into the development branch, until later on, when some release, say 0.1.8, achieves a satisfactory status, at which point we release it as the next stable version (0.2 or 1.0), and start a new development branch (0.3 or 1.1).
- end of file -diff --git a/modules/webhelp/web/hacking/test-suite.webdoc b/modules/webhelp/web/hacking/test-suite.webdoc index 4d9d7abae..73e0d7b3b 100644 --- a/modules/webhelp/web/hacking/test-suite.webdoc +++ b/modules/webhelp/web/hacking/test-suite.webdoc @@ -1,521 +1,519 @@ ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -
This document presents guidelines for homogenising unit testing and regression testing throughout all CDS Invenio modules.
Testing is an important coding activity. Most authors believe that writing test cases should take between 10% and 30% of the project time. But even with such a large fraction, don't put too much faith in testing alone. It cannot find bugs that aren't tested for. So, while testing is an important activity inherent to safe software development practices, it cannot become a substitute for pro-active bug hunting, source code inspection, and a bugfree-driven development approach from the start.
Testing should happen alongside coding. If you write a function, immediately load it into your toplevel, evaluate its definition, and call it with a couple of arguments to make sure the function works as expected. If not, then change the function definition, re-evaluate it, re-call it, etc. Dynamic languages with an interactive toplevel such as Common Lisp or Python make this easy for you. Dynamic redefinition capabilities (full in Common Lisp, partial in Python) are very programmer-friendly in this respect. If your test cases are worth keeping, then keep them in a test file. (It's almost always a good idea to store them in the test file, since you cannot predict whether you will want to change something in the future.) We'll see below how to store your tests in a test file.
When testing, it is nice to know some rules of thumb, like: check your edge cases (e.g. null array), check atypical input values (e.g. laaarge array instead of typically 5-6 elements only), check your termination conditions, ask whether your arguments have already been safe-proofed or whether it is in your mandate to check them, write a test case for each `if-else' branch of the code to explore all the possibilities, etc. Another interesting rule of thumb is the bug frequency distribution. Experience has shown that bugs tend to cluster. If you discover a bug, chances are that other bugs are in the neighborhood. The famous 80/20 rule of thumb applies here too: about 80% of bugs are located in about 20% of the code. Another rule of thumb: if you find a bug caused by some coding practice pattern that may be used elsewhere too, look for and fix the other instances of the pattern.
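As an illustration of these rules of thumb, here is a minimal sketch with one test per `if-else' branch, an empty-list edge case and an atypically large input. The function mean_or_none is a hypothetical example written for this sketch only; it is not part of CDS Invenio.

```python
import unittest


def mean_or_none(values):
    """Hypothetical helper: arithmetic mean of a list, None if it is empty."""
    if not values:
        return None
    else:
        return sum(values) / float(len(values))


class TestMeanOrNone(unittest.TestCase):
    """One test per branch, plus edge and atypical inputs."""

    def test_empty_list(self):
        """edge case: the empty list takes the `if' branch"""
        self.assertEqual(None, mean_or_none([]))

    def test_typical_list(self):
        """typical input of 5-6 elements takes the `else' branch"""
        self.assertEqual(3.0, mean_or_none([1, 2, 3, 4, 5]))

    def test_single_element(self):
        """edge case: the smallest non-empty input"""
        self.assertEqual(7.0, mean_or_none([7]))

    def test_large_list(self):
        """atypical input: a large list instead of a handful of values"""
        self.assertEqual(500000.0, mean_or_none(list(range(1000001))))


if __name__ == "__main__":
    suite = unittest.TestLoader().loadTestsFromTestCase(TestMeanOrNone)
    unittest.TextTestRunner(verbosity=2).run(suite)
```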
In a nutshell, the best advice to write bug-free code is: think ahead. Try to prepare in advance for unusual usage scenarios, to foresee problems before they happen. Don't rely on typical input and typical usage scenarios. Things have a tendency to become atypical one day. Recall that testing is necessary, but not sufficient, to write good code. Therefore, think ahead!
Core functionality, such as the hit set intersection for the search engine, or the text input manipulating functions of the BibConvert language, should come with a couple of test cases to assure proper behaviour of the core functionality. The test cases should cover typical input (e.g. hit set corresponding to the query for ``ellis''), as well as the edge cases (e.g. empty/full hit set) and other unusual situations (e.g. non-UTF-8 accented input for BibConvert functions to test a situation of different number of bytes per char).
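The typical-input and edge-case coverage described above might look as follows for hit set intersection. This is a sketch only: it assumes hit sets can be modelled as plain Python sets of record IDs, and intersect_hit_sets is a hypothetical stand-in, not the search engine's real implementation.

```python
import unittest


def intersect_hit_sets(hits_a, hits_b):
    """Hypothetical stand-in: record IDs present in both hit sets."""
    return set(hits_a) & set(hits_b)


class TestHitSetIntersection(unittest.TestCase):
    """Typical input plus the empty and full hit set edge cases."""

    def setUp(self):
        # Assumed universe of 100 records; the IDs are made up.
        self.all_records = set(range(1, 101))
        self.ellis_hits = set([5, 17, 42])

    def test_typical_hit_sets(self):
        """intersection of two ordinary hit sets"""
        self.assertEqual(set([17, 42]),
                         intersect_hit_sets(self.ellis_hits, set([17, 42, 99])))

    def test_empty_hit_set(self):
        """intersection with the empty hit set is empty"""
        self.assertEqual(set(), intersect_hit_sets(self.ellis_hits, set()))

    def test_full_hit_set(self):
        """intersection with the full hit set changes nothing"""
        self.assertEqual(self.ellis_hits,
                         intersect_hit_sets(self.ellis_hits, self.all_records))


if __name__ == "__main__":
    suite = unittest.TestLoader().loadTestsFromTestCase(TestHitSetIntersection)
    unittest.TextTestRunner(verbosity=2).run(suite)
```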
The test cases should be written for the most important core functionality. Not every function or class in the code needs to be thoroughly tested. Common sense will tell.
Unit test cases are free of side-effects. Users should be able to run them on a production database without any harm to their data. This is because the tests test ``units'' of the code, not the application as such. If the behaviour of the function you would like to test depends on the status of the database, or on some other parameters that cannot be passed to the function itself, the unit testing framework is not suitable and you should use the regression testing framework instead (see below).
For more information on Pythonic unit testing, see the documentation to the unittest module at http://docs.python.org/lib/module-unittest.html. For a tutorial, see for example http://diveintopython.org/unit_testing/.
Each core file that is located in the lib directory (such as the webbasketlib.py in the example above) should come with a testing file where the test cases are stored. The test file is to be named identically as the lib file it tests, but with the suffix _tests (in our example, webbasketlib_tests.py).
The test cases are written using the TestCase class of the Python unittest module. An example for testing the search engine query parameter washing function:
$ cat /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
[...]
import search_engine
import unittest

class TestWashQueryParameters(unittest.TestCase):
    """Test for washing of search query parameters."""

    def test_wash_url_argument(self):
        """search engine washing of URL arguments"""
        self.assertEqual(1, search_engine.wash_url_argument(['1'],'int'))
        self.assertEqual("1", search_engine.wash_url_argument(['1'],'str'))
        self.assertEqual(['1'], search_engine.wash_url_argument(['1'],'list'))
        self.assertEqual(0, search_engine.wash_url_argument('ellis','int'))
        self.assertEqual("ellis", search_engine.wash_url_argument('ellis','str'))
        self.assertEqual(["ellis"], search_engine.wash_url_argument('ellis','list'))
        self.assertEqual(0, search_engine.wash_url_argument(['ellis'],'int'))
        self.assertEqual("ellis", search_engine.wash_url_argument(['ellis'],'str'))
        self.assertEqual(["ellis"], search_engine.wash_url_argument(['ellis'],'list'))
[...]
In addition, each test file is supposed to define a create_test_suite() function that will return a test suite with all the tests available in this file:
$ cat /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
[...]
def create_test_suite():
    """Return test suite for the search engine."""
    return unittest.TestSuite((unittest.makeSuite(TestWashQueryParameters,'test'),
                               unittest.makeSuite(TestStripAccents,'test')))
[...]
This will enable us to later include this file into the testsuite executable:
$ cat ~/src/cds-invenio/modules/miscutil/bin/testsuite.in
[...]
from invenio import search_engine_tests
from invenio import bibindex_engine_tests

def create_all_test_suites():
    """Return all tests suites for all CDS Invenio modules."""
    return unittest.TestSuite((search_engine_tests.create_test_suite(),
                               bibindex_engine_tests.create_test_suite()))
[...]
In this way, all the test cases defined in the file search_engine_tests.py will be executed when the global testsuite executable is called.
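On recent Python versions the same per-file and global suite functions can be built with unittest.TestLoader, since unittest.makeSuite was deprecated in Python 3.11 and removed in Python 3.13. The following is a sketch under that assumption; the two test case classes are trivial stand-ins invented for this example so that it runs on its own, and in the real layout the per-file function and the global function would live in separate files.

```python
import unittest


class TestListUnion(unittest.TestCase):
    """Stand-in test case (hypothetical, for illustration only)."""

    def test_union(self):
        """list union of two small lists"""
        self.assertEqual([1, 2, 3], sorted(set([1, 2]) | set([2, 3])))


class TestStripSpaces(unittest.TestCase):
    """Second stand-in test case (hypothetical)."""

    def test_strip(self):
        """stripping of surrounding spaces"""
        self.assertEqual("ellis", "  ellis ".strip())


def create_test_suite():
    """Return test suite with all the tests defined in this file."""
    loader = unittest.TestLoader()
    return unittest.TestSuite((
        loader.loadTestsFromTestCase(TestListUnion),
        loader.loadTestsFromTestCase(TestStripSpaces),
    ))


def create_all_test_suites():
    """Return one suite combining the suites of all test files.

    Only one per-file suite exists in this sketch; a real global
    runner would list one create_test_suite() call per test module.
    """
    return unittest.TestSuite((create_test_suite(),))
```

Suites nest, so the global suite does not need to know which test case classes a given file defines; it only relies on each file exposing create_test_suite().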
Note that it may be time-consuming to run all the tests in one go. If you are interested in running tests only on a certain file (say search_engine_tests.py), then launch:
$ python /opt/cds-invenio/lib/python/invenio/search_engine_tests.py
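Launching a test file directly like this presupposes that the file runs its own suite when executed as a script. How the shipped test files arrange this is not shown here; one common way, sketched below for Python 3 with a trivial stand-in test case and a hypothetical run_test_suite() helper, is a main guard at the end of the file.

```python
import unittest


class TestExample(unittest.TestCase):
    """Stand-in test case (hypothetical, for illustration only)."""

    def test_upper(self):
        """example upper-casing of a string"""
        self.assertEqual("ELLIS", "ellis".upper())


def create_test_suite():
    """Return test suite with all the tests defined in this file."""
    loader = unittest.TestLoader()
    return unittest.TestSuite((loader.loadTestsFromTestCase(TestExample),))


def run_test_suite(verbosity=2):
    """Run this file's suite and return the unittest result object."""
    runner = unittest.TextTestRunner(verbosity=verbosity)
    return runner.run(create_test_suite())


if __name__ == "__main__":
    # Executed only when the file is launched as a script,
    # not when it is imported by a global test runner.
    run_test_suite()
```

Because the guard is skipped on import, the same file can be run on its own and also be pulled into a global suite through create_test_suite().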
For full-scale examples, you may follow search_engine_tests.py and other _tests.py files in the source distribution.
CDS Invenio test suite can be run in the source directory:
$ make test
or anytime after the installation:
$ /opt/cds-invenio/bin/testsuite
The ``testsuite'' executable will run all available unit tests provided with CDS Invenio.
The informative output is of the form:
$ make test
CDS Invenio v0.3.2.20040519 test suite results:
===========================================
search engine washing of query patterns ... ok
search engine washing of URL arguments ... ok
search engine stripping of accented letters ... ok
bibindex engine list union ... ok

----------------------------------------------------------------------
Ran 4 tests in 0.121s

OK
In case of problems you will see failures like:
CDS Invenio v0.3.2.20040519 test suite results:
===========================================
search engine washing of query patterns ... FAIL
search engine washing of URL arguments ... ok
search engine stripping of accented letters ... ok
bibindex engine list union ... ok

======================================================================
FAIL: search engine washing of query patterns
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/cds-invenio/lib/python/invenio/search_engine_tests.py", line 25, in test_wash_pattern
    self.assertEqual("ell*", search_engine.wash_pattern('ell*'))
  File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
    raise self.failureException, \
AssertionError: 'ell*' != 'ell'

----------------------------------------------------------------------
Ran 4 tests in 0.091s

FAILED (failures=1)
The test suite compliance should be checked before each CVS commit. (And, obviously, double-checked before each CDS Invenio release.)
In addition to the above-mentioned unit testing of important functions, regression testing should ensure that the overall application functionality is behaving well and is not altered by code changes. This is especially important if a bug has been previously found. Then a regression test case should be written to ensure that it will never reappear. (It also helps to scan the neighborhood of the bug, or the whole codebase, for occurrences of the same kind of bug; see the 80/20 rule of thumb cited above.)
Moreover, the regression test suite should be used when the functionality of the item we would like to test depends on state that cannot be passed in as a parameter, such as the database content. Also, the regression framework is suitable for testing the overall behaviour of web pages. (In extreme programming, regression testing is called acceptance testing, a name that evolved from the earlier functionality testing.)
Within the framework of the regression test suite, we are at liberty to alter database content, which is not the case in the unit testing framework. We can also simulate a web browser in order to test web applications.
As an example of a regression test, we can test whether the web pages are alive; whether searching for Ellis in the demo site indeed produces 12 records; whether searching for aoeuidhtns produces no hits but shows the box of nearest terms, and with which content; whether accessing the Theses collection page search triggers an Apache password prompt; whether the admin interface is really accessible only to admins or also to guests, etc.
For more information on regression testing, see for example http://c2.com/cgi/wiki?RegressionTesting.
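To make the ``is the page alive'' and ``does the search report the expected number of records'' checks concrete, here is a self-contained Python 3 sketch. It does not use the CDS Invenio test utilities: a tiny local HTTP server stands in for the demo site, and the handler class, the URL path and the ``12 records found'' wording are all assumptions made up for this example.

```python
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSiteHandler(BaseHTTPRequestHandler):
    """Stand-in for the demo site: serves one fixed results page."""

    def do_GET(self):
        body = b"<html><body>12 records found</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        """Keep the test output free of per-request log lines."""


class TestSearchPage(unittest.TestCase):
    """Regression-style checks of a web page's status and content."""

    @classmethod
    def setUpClass(cls):
        # Port 0 lets the operating system pick any free port.
        cls.server = HTTPServer(("127.0.0.1", 0), DemoSiteHandler)
        cls.thread = threading.Thread(target=cls.server.serve_forever)
        cls.thread.daemon = True
        cls.thread.start()
        cls.base_url = "http://127.0.0.1:%d" % cls.server.server_address[1]
        # An empty ProxyHandler bypasses any proxy set in the environment.
        cls.opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}))

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()
        cls.thread.join()

    def fetch(self, path):
        """Return (HTTP status, decoded page text) for the given path."""
        with self.opener.open(self.base_url + path, timeout=5) as response:
            return response.status, response.read().decode("utf-8")

    def test_page_is_alive(self):
        """demo search page is alive"""
        status, _ = self.fetch("/search?p=ellis")
        self.assertEqual(200, status)

    def test_search_ellis_hits(self):
        """demo search for ellis reports 12 records"""
        _, text = self.fetch("/search?p=ellis")
        self.assertIn("12 records found", text)
```

Against a real installation the local server would be dropped and the base URL pointed at the demo site; the test methods themselves would stay the same shape.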
Regression tests are written per application (or sub-module) in files named like websearch_regression_tests.py or websubmitadmin_regression_tests.py.
When writing regression tests, you can assume that the site is in the fresh demo mode (Atlantis Institute of Fictive Science). You are not limited to database-read-only tests: you can also safely insert, update and delete whatever database values you need for testing. Users are warned about the possibly destructive side-effects of the regression test suite prior to running it. (See below.) Therefore you can create users, create user groups, attach users to groups to test the group joining process, etc., as needed.
For testing web pages using GET arguments, you can take advantage of the following helper function:
$ cat /opt/cds-invenio/lib/python/invenio/testutils.py [...] def test_web_page_content(url, username="guest", expected_text="