
BibMatch: convert to simple search method

Authored by Jan Aage Lavik <jan.age.lavik@cern.ch> on Feb 25 2011, 16:28.

Description

  • BibMatch now defaults to simple search method instead of advanced search (fixes #255):
    • The -q, --query-string option now expects an Invenio search query or a query-template; it remains backwards compatible with old-style querystrings.
    • When -m, --mode is specified, the search falls back to advanced search in the given mode, as before.
    • All queries are now performed with retries, as introduced in invenio_connector.
    • Added support for records that contain several datafields of the same type. The values of each such field generate distinct search queries, one for every combination in the Cartesian product of all field values used in the query.
    • Introduced the concept of "query-completeness", which indicates whether all fields specified in the matching query are present in the input record. If a particular field cannot be found, the match is deemed "incomplete" and will result in a fuzzy, ambiguous or non-match regardless of the search results.
    • Fuzzy matching has been extended to intersect the result sets obtained for every field specified in the matching query. Various fuzzy-matching options can be tweaked in invenio.conf.
    • If the number of search results exceeds 10, the record is treated as a non-match.
    • Added an option to sanitize final queries with respect to double quotes.
    • The Querystring class has been completely revamped to conform to these changes.
  • Batch output file suffixes are now named instead of numbered (e.g. filename.matched instead of filename.0).
  • invenio.conf has been populated with various BibMatch configuration options. These are mostly related to fuzzy matching, but the query-templates are also maintained there.
  • Updated regression tests and documentation.
  • Several other minor additions and fixes.
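As a rough illustration of how several values of the same field multiply into queries, the following Python sketch expands a query-template against a record's field values with `itertools.product` and reports query-completeness. The placeholder syntax (`[245__a]`), the `expand_query` function, and the choice to drop a missing field's placeholder are assumptions made for illustration; this is not BibMatch's actual Querystring implementation.

```python
import itertools
import re

# Hypothetical placeholder syntax: a MARC field code such as "[245__a]"
# (tag, two indicators, subfield code). Assumed for illustration only.
PLACEHOLDER = re.compile(r"\[(\d{3}[0-9A-Za-z_]{2}[0-9a-z])\]")


def expand_query(template, record):
    """Expand `template` into one query per combination of field values.

    `record` maps field codes (e.g. "100__a") to lists of values.
    Returns (queries, complete); `complete` is False when a field used
    in the template has no value in the record.
    """
    # Unique field codes, in order of first appearance in the template.
    fields = list(dict.fromkeys(PLACEHOLDER.findall(template)))
    present = [f for f in fields if record.get(f)]
    complete = len(present) == len(fields)

    queries = []
    # Cartesian product: one tuple per combination of the present fields' values.
    for combo in itertools.product(*(record[f] for f in present)):
        values = dict(zip(present, combo))
        # Substitute values; placeholders of missing fields become "".
        query = PLACEHOLDER.sub(lambda m: values.get(m.group(1), ""), template)
        query = " ".join(query.split())  # tidy leftover whitespace
        if query not in queries:  # keep generated queries unique
            queries.append(query)
    return queries, complete


record = {"245__a": ["Higgs boson search"], "100__a": ["Smith", "Doe"]}
print(expand_query("[245__a] [100__a]", record))
# (['Higgs boson search Smith', 'Higgs boson search Doe'], True)
print(expand_query("[245__a] [773__p]", record))
# (['Higgs boson search'], False)
```

With one title and two authors the template yields two queries; a template field absent from the record still produces a query, but the result is flagged as incomplete so it can never count as an exact match.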
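The fuzzy-matching intersection and the 10-result cut-off described in the notes can be sketched in the same hedged way. The hypothetical `classify` function below takes one set of matching record IDs per searched field, intersects them, and applies the limit. The labels ("match", "fuzzy", "ambiguous", "no match") and the order of the checks are assumptions loosely following the commit description, not BibMatch's actual decision logic.

```python
# Hypothetical cut-off mirroring the commit note: more than 10 hits is a non-match.
MAX_RESULTS = 10


def classify(results_per_field, complete=True, max_results=MAX_RESULTS):
    """Classify a record from one set of matching record IDs per searched field.

    The labels and the decision order are illustrative assumptions,
    not BibMatch's actual rules.
    """
    if not results_per_field:
        return "no match", set()
    # Fuzzy matching: keep only the IDs returned for every field.
    common = set.intersection(*results_per_field)
    if not common or len(common) > max_results:
        return "no match", set()
    if len(common) == 1:
        # An incomplete query never yields an exact match.
        return ("match" if complete else "fuzzy"), common
    return "ambiguous", common


print(classify([{1, 2, 3}, {2, 3}, {3, 4}]))
# ('match', {3})
print(classify([set(range(20)), set(range(15))]))
# ('no match', set())
```

Intersecting per-field result sets favours precision: a candidate survives only if every field in the matching query found it, while the cut-off discards queries that are too broad to be trusted.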

Details

Committed
Tibor Simko <tibor.simko@cern.ch> on Mar 9 2011, 19:17
Parents
R3600:56e412355d1c: InvenioConnector: add search with retries

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:d454f408cc54: BibMatch: convert to simple search method (authored by Jan Aage Lavik <jan.age.lavik@cern.ch>) on Mar 9 2011, 19:17.