plotextractor: support LaTeX context extraction
- Added context extraction for plots referenced in LaTeX sources. The context for each image is a text file attached as a hidden subformat of the image. Closes #303.
- The plotextractor now treats plots with no caption, or with a caption shorter than a certain length, as miscellaneous plots. These are uploaded with the PlotMisc doctype and the HIDDEN parameter.
- Added a configuration file which contains context-extraction parameters and some relevant global parameters.
- The test cases for this module now pass.
- Added and fixed some CLI options:
  - '-l' '--refno-url=' specifies the URL of the Invenio instance from which to retrieve the refno, i.e. to attach a record id to the resulting MARCXML for upload.
  - '-c' '--clean' removes any intermediate files used during extraction, leaving only generated files and converted images.
  - '-k' '--skip-refno' skips the refno check.
  - '-a' '--arXiv=' now works properly.
- Streamlined the API for extracting and harvesting single tarballs (i.e. for use in the harvesting workflow in BibHarvest).
- Fixed some minor bugs and refactored some code throughout the module. This includes fixes for pylint-related issues.
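
For illustration only, the miscellaneous-plot rule described above could be sketched roughly as follows. The function name, the threshold name and its value, and the "Plot" doctype for regular plots are all assumptions made for this sketch and are not taken from the plotextractor code; only the PlotMisc doctype comes from this change.

```python
# Hypothetical sketch: names and values are assumptions, not plotextractor's API.
MIN_CAPTION_LENGTH = 20  # assumed threshold, in characters


def classify_plot(caption, min_length=MIN_CAPTION_LENGTH):
    """Return the doctype a plot would be uploaded under.

    Plots with no caption, or with a caption shorter than min_length,
    are treated as miscellaneous plots.
    """
    text = (caption or "").strip()
    if len(text) < min_length:
        return "PlotMisc"
    return "Plot"  # assumed doctype for regular plots
```

With these assumed values, a missing caption or a short one such as "Fig. 1" falls under PlotMisc, while a full descriptive caption is treated as a regular plot.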