refextract: add DOI recognition functionality
- refextract can now identify DOIs inside a citation and mark up each DOI it finds in the new 'a' subfield. (closes #245)
- Improved some comments. Added checks for trailing spaces on the raw reference lines being processed.
  When two reference lines have to be concatenated because of bad pdf2txt parsing, a space is now appended to the base raw reference, but only if it does not already end with one.
- Removed some unused variables such as misc_text_dict.
- (Merge note: this commit may not split references properly when a reference contains more than one report number. This will be addressed in the next refextract branch, which recognizes author parts.)
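
For illustration only, the DOI recognition and the line-joining rule described above could be sketched roughly as follows. This is not the code in this commit: the names DOI_PATTERN, find_dois, doi_subfield and join_reference_lines, the regular expression, and the exact subfield markup are all assumptions made for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern: an optional "doi:" prefix followed by the usual
# "10.<registrant>/<suffix>" shape. The real refextract pattern may differ.
DOI_PATTERN = re.compile(r'\b(?:doi:\s*)?(10\.\d{4,9}/[^\s"<>]+)', re.IGNORECASE)


def find_dois(line):
    """Return the DOIs found in one raw reference line."""
    dois = []
    for match in DOI_PATTERN.finditer(line):
        # Drop sentence punctuation that tends to trail a DOI in a citation.
        dois.append(match.group(1).rstrip('.,;'))
    return dois


def doi_subfield(doi):
    """Wrap a DOI in an 'a' subfield (assumed MARCXML-style markup)."""
    return '<subfield code="a">%s</subfield>' % escape(doi)


def join_reference_lines(base, continuation):
    """Concatenate two raw reference lines split by bad pdf2txt parsing.

    A space is added only if the base line does not already end with one.
    """
    if not base.endswith(' '):
        base += ' '
    return base + continuation
```

A call such as find_dois('... doi:10.1103/PhysRevD.80.012345.') would pick out the DOI without its prefix or the trailing full stop, and join_reference_lines gives the same result whether or not the base line already ends in a space.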