BibMatch: match validation
- Adds a new sub-module, the match validation step, which compares each input record with the potentially matching records returned by the search. (fixes #548)
- Records are compared using various methods, for example dedicated metrics for authors, titles and identifiers. These comparison methods are configurable per (sub-)field and act as rules for matching records. Rules can be grouped into rulesets selected by regular expressions, so that records are compared differently depending on their content. (fixes #183)
- For an exact match, all defined comparison rules must succeed. If not all of them succeed but the ratio of successful rules is above a configurable limit, the match is considered fuzzy. At least two matching fields MUST be found, unless certain MARC fields, typically identifier fields such as DOI or ISBN, have been configured as 'final' or 'joker' types.
- Adds a configuration variable that limits the number of search results compared for a single search query.
- Both match validation and fuzzy searching can be turned off with the CLI options '--no-valid' and '--no-fuzzy' respectively.
- Adds a new CLI option, '--ascii', which transliterates record values to ASCII before they are used in searching and matching. XML entities, like &amp;amp;, are converted to UTF-8 characters before searching.
- Adds a configuration module for BibMatch's internal globals.
- Enables automatic logging of BibMatch runs, providing information about record matching results.
- Also adds relevant regression tests, a new unit-test module, and new admin and hacking guides.
- Detects if any input records are badly parsed by BibRecord.
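The exact/fuzzy decision described above can be sketched as follows. This is a minimal illustration, not BibMatch's code: `RuleResult`, `validate` and the default `fuzzy_ratio` of 0.75 are hypothetical, and it assumes that a successful rule on a 'final' or 'joker' field only lifts the two-field minimum (the real result modes may have further effects).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleResult:
    """Hypothetical outcome of one comparison rule applied to one (sub-)field."""
    tag: str                     # MARC tag that was compared, e.g. "245__a"
    matched: bool                # whether the comparison succeeded
    result_mode: str = "normal"  # assumed values: "normal", "final", "joker"


def validate(results: List[RuleResult], fuzzy_ratio: float = 0.75) -> str:
    """Classify a candidate record as 'exact', 'fuzzy' or 'no match' (sketch)."""
    if not results:
        return "no match"
    successes = [r for r in results if r.matched]
    # Assumption: a match on a 'final' or 'joker' field (e.g. DOI, ISBN)
    # removes the requirement of at least two matching fields.
    identifier_hit = any(r.result_mode in ("final", "joker") for r in successes)
    if len(successes) < 2 and not identifier_hit:
        return "no match"
    # Exact match: every defined rule succeeded.
    if len(successes) == len(results):
        return "exact"
    # Fuzzy match: success ratio above the configurable limit.
    if len(successes) / len(results) > fuzzy_ratio:
        return "fuzzy"
    return "no match"
```

For example, four successful rules out of five give a ratio of 0.8, which is above the assumed 0.75 limit, so the candidate is classified as fuzzy.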
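Selecting rulesets by regular expression could look like the sketch below. The ruleset names, the patterns and the idea of matching them against the record's MARCXML string are hypothetical illustrations. The real configuration format lives in the new BibMatch configuration module and may also define how later rulesets override earlier ones.

```python
import re
from typing import List, Tuple

# Hypothetical rulesets: (name, regular expression tested against the record).
RULESETS: List[Tuple[str, str]] = [
    ("default", r".*"),                                # applies to every record
    ("thesis", r'<subfield code="a">THESIS</subfield>'),
]


def choose_rulesets(record_xml: str,
                    rulesets: List[Tuple[str, str]] = RULESETS) -> List[str]:
    """Return, in configuration order, the names of rulesets whose pattern matches."""
    return [name for name, pattern in rulesets if re.search(pattern, record_xml)]
```

A thesis record would then be compared with both the default rules and the thesis-specific ones, while other records use only the default set.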
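The '--ascii' preprocessing can be approximated with the Python standard library. This is only a sketch: `to_ascii` is a hypothetical helper, and Unicode NFKD decomposition is a swapped-in technique that is not necessarily the transliteration BibMatch uses. It first decodes XML/HTML entities, then removes diacritics.

```python
import html
import unicodedata


def to_ascii(value: str) -> str:
    """Decode entities, then strip diacritics and other non-ASCII characters (sketch)."""
    # Turn entities such as "&amp;" or "&eacute;" into UTF-8 characters.
    decoded = html.unescape(value)
    # NFKD splits accented letters into base letter plus combining mark,
    # and encoding with errors="ignore" drops the non-ASCII marks.
    normalized = unicodedata.normalize("NFKD", decoded)
    return normalized.encode("ascii", "ignore").decode("ascii")
```

Characters without a decomposition, such as "ø", are dropped entirely by this approach rather than transliterated. A dedicated transliteration table avoids this loss.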