refextract: improved author recognition
- Improved author recognition (more author formats are now recognised)
- Added 'surname [and surname] et al' recognition (et al must be present)
- Improved underscore author text validation (all tags and all tagged content are now escaped, rather than just titles). This completely removes the chance that part of the tagged text (or a tag itself) is seen as an author.
- Improved author split/dump heuristics (will dump into misc if two author groups are found in a row, with minimal misc text between them)
- Added some more test reference lines
- Added comments to some methods (still need to complete this)
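
For illustration only, the 'surname [and surname] et al' rule could be sketched as below. This is a minimal sketch, not the pattern used in refextract: the names SURNAME, ET_AL_AUTHORS and find_et_al_authors are hypothetical, and the definition of a surname (a capitalised word of two or more characters) is an assumption.

```python
import re

# Hypothetical surname definition (assumption): a capital letter followed by
# one or more letters, apostrophes or hyphens.
SURNAME = r"[A-Z][A-Za-z'\-]+"

# One surname, optionally "and"/"&" plus a second surname, followed by a
# mandatory "et al" (with or without full stops).
ET_AL_AUTHORS = re.compile(
    r"\b(?P<first>" + SURNAME + r")"
    r"(?:\s+(?:and|&)\s+(?P<second>" + SURNAME + r"))?"
    r",?\s+et\.?\s+al\b\.?"
)


def find_et_al_authors(line):
    """Return every 'surname [and surname] et al' group found in a line."""
    return [m.group(0) for m in ET_AL_AUTHORS.finditer(line)]
```

Because the 'et al' part is not optional in the pattern, a line such as "Smith and Jones, Nucl. Phys. B 1 (1967) 1" yields no match, while "Smith and Jones et al., Phys. Rev. D 12 (1975) 345" yields one author group.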