DocExtract: multiple fixes
- Fixes the --raw-references option for refextract.
- Handles extra vars for task_run_core.
- Adds missing import.
- Fixes arXiv records selection.
- Fixes storing last run date for arXiv records.
- Handles PoS(LAT2005)239.
- Removes B from volume for nucl.phys.proc.suppl.
- Adds math-ph to arXiv prefixes.
- Never re-extract curated records.
- Does not default to page 1 for special journals.
- Converts the reference to misc text when the page looks like a year (19xx or 20xx).
- Removes citation-splitting heuristics based on authors.
- New way of splitting references.
- Extracts year from references.
- Creates a subfield "y" with the year when it is known.
- Reworks arXiv report numbers processing.
- Adds CMS report numbers.
- Adds CERN-2004-003 to recognized reports.
- Does not default to page 1 when looking for numeration.
- Removes use of all and any (for Python 2.4 compatibility).
- Removes extra write_message.
- Handles Phys.Lett. 100B (1981), 117.
- Discards references with many lines.
- Handles refs with only a report number.
- Fixes [6] ATL-PHYS-INT-2009-110 report number detection.
- Adds test for recognizing a journal, report number or DOI alone.
- Bump refextract version.
- Fixes references splitting.
- Fixes refs extraction from string.
- Handles refs that do not start at the beginning of a line.
- Increases allowed refs len to 9 lines.
- Fixes report numbers replacing.
- Fixes report number KB format.
- Increases max number of lines for a reference. (closes #966)
- Handles figures within refs.
- Adds RT ticket info to logs.
- Fixes extra vars passing to tasks.
- Updates stats message.
- Updates refextract tmp filename in inveniogc.
- Fixes -c option.
- Forces Inspire format for tests.
- Fixes api unit tests.
- Fixes printing of refs.
- Extracts collaboration into $$c subfield if CFG_INSPIRE_SITE=1. (closes #958)
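A minimal sketch of the year-related rules listed above (page that looks like a 19xx/20xx year turns the reference into misc text; year extracted for subfield "y"). The function names and the dict output are hypothetical, chosen for illustration only; this is not refextract's actual code, and it assumes the year appears in parentheses, as in "Phys.Lett. 100B (1981), 117".

```python
import re

# Hypothetical pattern for the "19xx or 20xx" rule: a bare four-digit page.
YEAR_LIKE = re.compile(r"^(?:19|20)\d\d$")
# Hypothetical pattern: assumes the year is written in parentheses, e.g. "(1981)".
PAREN_YEAR = re.compile(r"\(((?:19|20)\d\d)\)")


def page_looks_like_year(page):
    """Return True if the page field is a bare 19xx/20xx number."""
    return bool(YEAR_LIKE.match(page.strip()))


def extract_year(text):
    """Return the first parenthesized 19xx/20xx year in text, else None."""
    match = PAREN_YEAR.search(text)
    return match.group(1) if match else None


def classify_reference(journal, volume, page, raw_text):
    """Hypothetical: return journal fields, or misc text if the page is year-like."""
    if page_looks_like_year(page):
        # The "page" is probably a year, so the journal match is not trusted.
        return {"misc": raw_text}
    fields = {"journal": journal, "volume": volume, "page": page}
    year = extract_year(raw_text)
    if year is not None:
        # Would be stored in subfield "y" when the year is known.
        fields["y"] = year
    return fields
```

For "Phys.Lett. 100B (1981), 117" with page "117", this keeps the journal fields and sets "y" to "1981"; with page "1981" it falls back to misc text.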
Conflicts:
modules/docextract/lib/docextract_task.py