plotextractor: support LaTeX context extraction

Authored by Jan Aage Lavik <jan.age.lavik@cern.ch> on Oct 1 2010, 14:30.

Description

  • Added context extraction for plots referenced in LaTeX sources. The context for each image is a text file attached as a hidden subformat of the image. Closes #303.
  • The plotextractor now treats plots that have no caption, or a caption shorter than a minimum length, as miscellaneous plots. These are uploaded as the PlotMisc doctype with the HIDDEN parameter.
  • Added a configuration file that contains the context-extraction parameters and some relevant global parameters.
  • The test cases for this module now pass.
  • Added and fixed some CLI options:
    • '-l' '--refno-url=' specifies the URL of the Invenio instance from which to retrieve the refno, i.e. to attach a record ID to the resulting MARCXML for upload.
    • '-c' '--clean' removes any intermediate files used during extraction, leaving only the generated files and converted images.
    • '-k' '--skip-refno' skips the refno check.
    • '-a' '--arXiv=' now works properly.
  • Streamlined the API for extracting and harvesting single tarballs (i.e. for use in the harvesting workflow in BibHarvest).
  • Fixed some minor bugs and refactored some code throughout the module, including fixes for pylint-related issues.
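
To illustrate the context extraction described in the first item, here is a minimal sketch. It assumes that "context" means a fixed-size character window around each \ref{...} to a figure label. The function name extract_context, the window parameter and its default are hypothetical and are not taken from the plotextractor code.

```python
import re


def extract_context(tex, label, window=50):
    """Return the text surrounding each reference to `label` in `tex`.

    Assumption: a reference is written as \\ref{label}, and the context
    is `window` characters on either side of it.
    """
    pattern = re.compile(r"\\ref\{" + re.escape(label) + r"\}")
    snippets = []
    for match in pattern.finditer(tex):
        # Clamp the window to the bounds of the document.
        start = max(0, match.start() - window)
        end = min(len(tex), match.end() + window)
        snippets.append(tex[start:end].strip())
    return snippets
```

Each returned snippet could then be written to the text file that accompanies the image. A real implementation would probably cut on word or sentence boundaries rather than raw characters.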
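
The caption rule for miscellaneous plots can be sketched as follows. The threshold value, the function name classify_plot and the "Plot" doctype for regular plots are assumptions made for illustration. Only the PlotMisc doctype is named in this commit.

```python
# Hypothetical threshold: the commit does not state the real value.
MIN_CAPTION_LENGTH = 10


def classify_plot(caption, min_length=MIN_CAPTION_LENGTH):
    """Return the doctype to use for a plot, based on its caption.

    Plots with no caption, or with a caption shorter than `min_length`
    characters, are treated as miscellaneous ("PlotMisc").
    "Plot" is an assumed name for the regular doctype.
    """
    text = (caption or "").strip()
    if len(text) < min_length:
        return "PlotMisc"
    return "Plot"
```

For example, classify_plot(None) and classify_plot("Fig. 1") both fall below the assumed threshold and would be handled as miscellaneous plots.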
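
The four CLI options listed above could be declared as in the sketch below. It uses argparse for brevity; how the module actually parses its arguments is not shown in this commit. The flag spellings come from the list above, while the dest names and help texts are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the options named in this commit (illustrative only)."""
    parser = argparse.ArgumentParser(prog="plotextractor")
    # '-l' / '--refno-url=' takes the URL of an Invenio instance.
    parser.add_argument("-l", "--refno-url", dest="refno_url", default=None,
                        help="URL of the Invenio instance to retrieve refno from")
    # '-c' / '--clean' is a switch: remove intermediate files after extraction.
    parser.add_argument("-c", "--clean", action="store_true",
                        help="remove intermediate files used during extraction")
    # '-k' / '--skip-refno' is a switch: do not perform the refno check.
    parser.add_argument("-k", "--skip-refno", dest="skip_refno",
                        action="store_true", help="skip the refno check")
    # '-a' / '--arXiv=' takes a value (assumed here to be an arXiv identifier).
    parser.add_argument("-a", "--arXiv", dest="arxiv", default=None,
                        help="arXiv identifier to process (assumed meaning)")
    return parser
```

With this sketch, build_parser().parse_args(["-l", "http://example.org", "-c"]) yields a namespace in which refno_url is set and clean is true, while skip_refno stays false.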

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:3054fb8f5266: plotextractor: support LaTeX context extraction (authored by Jan Aage Lavik <jan.age.lavik@cern.ch>) on Dec 6 2010, 10:36.