refextract: improved author recognition

Description

  • Improved author recognition by increasing the number of recognised author formats; added comments.
  • Added 'surname [and surname] et al' recognition ('et al' must be present).
  • Improved underscore author text validation: all tags and all tagged content are now escaped, rather than just titles. This completely removes the chance that part of tagged text (or a tag itself) is seen as an author.
  • Improved author split/dump heuristics: authors are dumped into misc if two author groups are found in a row with minimal misc text between them.
  • Added some more test reference lines
  • Added comments to some methods (this still needs to be completed)
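
To illustrate the 'surname [and surname] et al' format mentioned above, here is a minimal sketch of how such a pattern could be matched. It is not the code from this commit: the names SURNAME, ET_AL_AUTHORS and find_et_al_authors are hypothetical, and the surname pattern is an assumption that only covers simple capitalised and hyphenated surnames.

```python
import re

# Assumption: a surname is a capitalised word, optionally hyphenated
# (e.g. "Smith", "Martin-Smith"). The real refextract patterns may differ.
SURNAME = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)*"

# Hypothetical pattern for 'surname [and surname] et al'.
# The trailing 'et al' is mandatory; the second surname is optional.
ET_AL_AUTHORS = re.compile(
    r"\b(?P<first>" + SURNAME + r")"
    r"(?:\s+and\s+(?P<second>" + SURNAME + r"))?"
    r",?\s+et\.?\s+al\b\.?"
)


def find_et_al_authors(line):
    """Return a (first, second) surname tuple for each 'et al' group in line.

    second is None when only one surname precedes 'et al'.
    """
    return [
        (m.group("first"), m.group("second"))
        for m in ET_AL_AUTHORS.finditer(line)
    ]


if __name__ == "__main__":
    print(find_et_al_authors("Smith and Jones et al., Phys. Rev. D 12 (2010) 345"))
    print(find_et_al_authors("Smith and Jones, Phys. Rev. D 12 (2010) 345"))
```

In this sketch a line without 'et al' yields no match at all, which mirrors the requirement that 'et al' must be present for this format to be recognised.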

Details

Committed
Tibor Simko <tibor.simko@cern.ch>, Nov 23 2011, 00:27
Parents
R3600:3a38615ce2d7: refextract: further changes
Branches
Unknown
Tags
Unknown

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:1286b5a3bdae: refextract: improved author recognition (authored by Christopher Hayward <christopher.james.hayward@cern.ch>). Nov 23 2011, 00:27