refextract: Increased the number of recognised author formats
- Added 'surname [and surname] et al' recognition (et al must be present)
- Improved underscore author text validation (now escapes all tags and all
tagged content, rather than just titles). This completely removes
the chance that part of tagged text (or a tag itself) is seen as an
author.
- Improved author split/dump heuristics (will dump into misc if two
author groups are found in a row, with minimal misc text between them)
- Added some more test reference lines
- Added comments to some methods (still need to complete this)
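
Illustration of the 'surname [and surname] et al' format mentioned above.
This is only a minimal sketch, not the pattern refextract actually uses:
the names SURNAME, ET_AL_AUTHORS and find_et_al_authors are hypothetical,
and the surname definition (one capitalised word, optionally hyphenated)
is a simplifying assumption.

```python
import re

# Assumption: a surname is one capitalised word, optionally hyphenated.
SURNAME = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)?"

# Hypothetical pattern: surname, optional "and surname", then a
# mandatory "et al" (with optional dots). Without "et al" nothing matches.
ET_AL_AUTHORS = re.compile(
    r"\b(?P<first>" + SURNAME + r")"
    r"(?:\s+and\s+(?P<second>" + SURNAME + r"))?"
    r"\s+et\.?\s+al\b\.?"
)


def find_et_al_authors(line):
    """Return a (first, second_or_None) tuple for each match in line."""
    return [
        (m.group("first"), m.group("second"))
        for m in ET_AL_AUTHORS.finditer(line)
    ]
```

With this sketch, a line containing "Smith and Jones et al." yields one
group with both surnames, while "Ellis and Smith" on its own yields
nothing because "et al" is absent.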