refextract: improve author/affiliation extraction

Description

  • Add an extra 'end of author section' keyword. Add a new 'bad unicode' character, and check for bad characters in both author lines and affiliation lines.
  • Improve the overall program flow for handling requests to extract authors or affiliations.
  • Use both author matching and keyword matching when locating the top section of a document, in both author extraction and affiliation extraction modes. Also make other improvements.
  • Enrich the keyword knowledge base (kb).
  • Extend author name numeration to include brackets.
  • Update usage() to include the '--affiliations' option.
  • Add a comment explaining why a top section would not be found, and remove some redundant spaces between statements.
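
As an illustration of two of the items above (bad-character checks on both author and affiliation lines, and numeration that accepts brackets), here is a minimal sketch. It is not the code from this commit: the names BAD_CHARS, NUMERATION, strip_bad_chars and split_numeration, the choice of bad characters, and the pattern itself are all assumptions made for the example.

```python
import re

# Hypothetical set of characters treated as extraction debris. The actual
# list used by refextract is not shown in this commit message.
BAD_CHARS = {"\ufffd", "\x0c"}

# Hypothetical pattern for a trailing numeration marker on an author name:
# bracketed "(1)" or "[1, 2]", or bare digits "1" / "1,2".
_NUMS = r"\d+(?:\s*,\s*\d+)*"
NUMERATION = re.compile(
    r"\s*(?:\((" + _NUMS + r")\)|\[(" + _NUMS + r")\]|(" + _NUMS + r"))\s*$"
)


def strip_bad_chars(line):
    """Drop debris characters; usable on author and affiliation lines alike."""
    return "".join(ch for ch in line if ch not in BAD_CHARS)


def split_numeration(author):
    """Return (name, [affiliation numbers]) for one author string."""
    cleaned = strip_bad_chars(author).strip()
    match = NUMERATION.search(cleaned)
    if not match:
        return cleaned, []
    # Exactly one of the three alternatives captured the digits.
    digits = next(group for group in match.groups() if group is not None)
    name = cleaned[:match.start()].strip()
    return name, [int(number) for number in digits.split(",")]
```

Under these assumptions, split_numeration("A. Jones [1, 2]") yields the name "A. Jones" with affiliation numbers 1 and 2, and an author without a marker comes back with an empty list.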

Details

Committed
Tibor Simko <tibor.simko@cern.ch> on Nov 23 2011, 00:32
Parents
R3600:095c23eb729f: refextract: attempt at integrating giva
Branches
Unknown
Tags
Unknown

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:6b79ae83551a: refextract: improve author/affiliation extraction (authored by Christopher Hayward <christopher.james.hayward@cern.ch>). Nov 23 2011, 00:32