refextract: improve author/affiliation extraction
6b79ae83551a
Authored by Christopher Hayward <christopher.james.hayward@cern.ch> on Jan 27 2011, 18:07.

Description

refextract: improve author/affiliation extraction

Add an extra 'end of author section' keyword. Add a new 'bad unicode' character, and check for bad characters in both author lines and affiliation lines.
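
As an illustration of applying one bad-character check to both kinds of line, here is a minimal sketch. It is not the code from refextract.py: the names `BAD_CHARS`, `has_bad_chars` and `filter_lines`, the characters in the set, and the choice to drop offending lines are all assumptions made for the example.

```python
# Hypothetical sketch; nothing here is taken from refextract.py.
# The characters below are assumed examples of 'bad unicode' characters.
BAD_CHARS = {"\ufffd", "\x00"}


def has_bad_chars(line):
    """Return True if the line contains any character from BAD_CHARS."""
    return any(ch in BAD_CHARS for ch in line)


def filter_lines(author_lines, affiliation_lines):
    """Apply the same bad-character check to both lists.

    Dropping the offending lines is an assumption of this sketch;
    the real code may handle them differently.
    """
    clean_authors = [l for l in author_lines if not has_bad_chars(l)]
    clean_affiliations = [l for l in affiliation_lines if not has_bad_chars(l)]
    return clean_authors, clean_affiliations
```

Routing both lists through a single helper keeps the author and affiliation paths from drifting apart when a new bad character is added to the set.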

Improve overall program flow of handling requests to extract authors or affiliations.

Use both author matching and keyword matching when locating the top section of a document, in both author extraction and affiliation extraction modes. Plus other improvements.
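
One way to combine the two signals when locating a top section is sketched below. This is a guess at the general idea, not the logic in refextract.py: `AUTHOR_RE`, `END_KEYWORDS` and `find_top_section` are hypothetical names, and the pattern and keyword list are chosen only for illustration.

```python
import re

# Hypothetical pattern: one or more initials followed by a surname,
# e.g. "J. Smith" or "A. B. Jones". Chosen for illustration only.
AUTHOR_RE = re.compile(r"\b(?:[A-Z]\.\s*)+[A-Z][a-z]+")

# Assumed keywords that mark the end of the top section.
END_KEYWORDS = ("abstract", "introduction")


def find_top_section(lines):
    """Return (start, end) line indices of the top section, or None.

    start is the first line that looks like an author line; end is the
    first later line that begins with an end keyword. None is returned
    when either signal is missing.
    """
    start = None
    for i, line in enumerate(lines):
        if start is None:
            if AUTHOR_RE.search(line):
                start = i
        elif line.strip().lower().startswith(END_KEYWORDS):
            return (start, i)
    return None
```

In this sketch the same function could serve both author extraction and affiliation extraction, since both need the same section boundaries before they diverge.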

Add a comment explaining why a top section would not be found, and remove some redundant spaces between statements.

Committed

Tibor Simko <tibor.simko@cern.ch>

Nov 23 2011, 00:32

Parents

R3600:095c23eb729f: refextract: attempt at integrating giva

Branches

Unknown

Tags

Unknown

Tibor Simko <tibor.simko@cern.ch> committed R3600:6b79ae83551a: refextract: improve author/affiliation extraction (authored by Christopher Hayward <christopher.james.hayward@cern.ch>). Nov 23 2011, 00:32

				Path
	M			modules/bibedit/lib/refextract.py