refextract: Addition of author knowledge base

Description

  • A file 'refextract-authors.kb' has been created which can hold, line
    by line, the set of authors which should always be recognised as such,
    and therefore included in the $h subfields. Refextract reads this file
    and converts all lines into a single regex, which includes the
    necessary space characters and options. Authors found using this regex
    do not affect the splitting choices for a reference.

  • An 'and' found at the start of an author group will only be marked as
    'bad' (dirty) if only one author follows it. An author group of more
    than one author likely means that the 'and' leads into a new
    reference, and as such it should be recognised so that the reference
    is properly split and/or marked up.

  • Author groups ending with semi-colons have been catered for, and in such
    a case, the semi-colon is placed back into misc text so that it can
    indicate any possible citation splits.

  • Added more comments, as well as some further config variables (to
    hold 'h' for authors)

  • Two dangerous journal title regexes have been removed. They were breaking
    authors who had initials of the form A. A. or A. J., as well as
    inserting 'Astron. J' or 'Astron. Astrophys.' into the reference without
    due care.

  • Added the refextract-authors.kb to Makefile.am
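
As an illustration of the first item above, the following is a minimal sketch of how a line-per-author knowledge base could be folded into a single regex. It is not the code in this commit: the function name build_author_kb_pattern, the skipping of blank and '#' lines, the word-boundary guards, and the choice of re.IGNORECASE as the "options" are all assumptions made for the example.

```python
import re


def build_author_kb_pattern(kb_lines):
    """Compile one regex matching any author listed in kb_lines (hypothetical helper)."""
    alternatives = []
    for line in kb_lines:
        name = line.strip()
        # Assumption: blank lines and '#' comment lines carry no author.
        if not name or name.startswith("#"):
            continue
        # Escape each word, then allow any run of whitespace between words,
        # so "ATLAS Collaboration" also matches across odd spacing.
        words = [re.escape(word) for word in name.split()]
        alternatives.append(r"\s+".join(words))
    if not alternatives:
        return None
    # Try longer entries first so a long name wins over one that is its prefix.
    alternatives.sort(key=len, reverse=True)
    # Assumption: case-insensitive matching, guarded so that a match cannot
    # start or end in the middle of a word.
    return re.compile(
        r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)",
        re.IGNORECASE | re.UNICODE,
    )
```

With such a pattern, every hit could be wrapped in the author ($h) markup before the normal author detection runs, which matches the statement above that these authors do not influence how a reference is split.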

Event Timeline

Christopher Hayward <christopher.james.hayward@cern.ch> committed R3600:06a4b473d45b: refextract: Addition of author knowledge base (authored by Christopher Hayward <christopher.james.hayward@cern.ch>). Feb 3 2011, 18:08