refextract: add DOI recognition functionality

Description

  • refextract can now identify DOIs inside a citation and mark up each DOI it finds in the new 'a' subfield. (closes #245)
  • Improved some comments. Added checks for spaces at the end of the raw reference lines being processed.

    When two reference lines have to be concatenated because of bad pdf2txt parsing, a space is now appended to the base raw reference, but only if it does not already end with one.
  • Removed some unused variables such as misc_text_dict.
  • (Merge note: this commit may not split references properly when a reference contains more than one report number. This will be addressed properly in the next refextract branch, which recognizes author parts.)
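
The two behaviours described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from this commit: the names DOI_PATTERN, find_dois and join_reference_lines are hypothetical, and the regular expression is a common approximation of DOI syntax rather than the pattern refextract uses.

```python
import re

# Assumed approximation of a DOI: "10.", a 4-9 digit registrant code,
# a slash, then a suffix running up to the next whitespace or quote.
DOI_PATTERN = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')


def find_dois(citation):
    """Return every DOI-like string in a raw citation line.

    Trailing punctuation that usually belongs to the sentence rather
    than to the DOI is stripped from each match.
    """
    return [m.group(1).rstrip(".,;)") for m in DOI_PATTERN.finditer(citation)]


def join_reference_lines(base, continuation):
    """Concatenate two raw reference lines split by bad text extraction.

    A single space is added only when the base line does not already
    end with one, so no double space is introduced.
    """
    if base and not base.endswith(" "):
        base += " "
    return base + continuation


if __name__ == "__main__":
    line = join_reference_lines(
        "J. Doe, Phys. Rev. D 82 (2010) 012345,",
        "doi:10.1103/PhysRevD.82.012345.",
    )
    print(line)
    print(find_dois(line))
```

The sketch only finds the DOI strings; writing each match into the 'a' subfield is left out because the commit message does not describe how the subfield is produced.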

Details

Committed
Tibor Simko <tibor.simko@cern.ch> Oct 15 2010, 16:08
Parents
R3600:4379ce4050e1: Merge branch 'maint'
Branches
Unknown
Tags
Unknown

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:ff97c0942af0: refextract: add DOI recognition functionality (authored by Christopher Hayward <christopher.james.hayward@cern.ch>). Oct 15 2010, 16:08