plotextractor: improves PDF harvesting from arXiv

Authored by Jan Aage Lavik <jan.age.lavik@cern.ch> on Feb 28 2012, 10:54.

Description

  • Switches from urllib to urllib2 for PDF downloads to take advantage of its better error handling when a download fails.
  • Appends the suffix '.pdf' to all arXiv PDF download URLs to avoid the internal arXiv redirect from 'arXivID' to 'arXivID.pdf'.
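
The two changes above can be illustrated with a minimal sketch. This is not the code from the commit: the helper names `build_pdf_url` and `download_pdf` and the base URL are assumptions made for illustration, and the sketch uses Python 3's `urllib.request` and `urllib.error`, the successors of Python 2's urllib2, so that it runs on current interpreters.

```python
# Illustrative sketch only; not the plotextractor implementation.
# Python 3's urllib.request / urllib.error replace Python 2's urllib2.
import shutil
import urllib.error
import urllib.request

# Assumed base URL for illustration; it does not come from the commit.
ARXIV_PDF_BASE = "http://arxiv.org/pdf/"


def build_pdf_url(arxiv_id):
    """Return the download URL with an explicit '.pdf' suffix (hypothetical helper)."""
    url = ARXIV_PDF_BASE + arxiv_id
    # Requesting 'arXivID.pdf' directly skips the redirect from 'arXivID'.
    if not url.endswith(".pdf"):
        url += ".pdf"
    return url


def download_pdf(arxiv_id, target_path, timeout=30):
    """Download a PDF; return True on success, False on failure (hypothetical helper)."""
    url = build_pdf_url(arxiv_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(target_path, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 403 or 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error %s for %s" % (err.code, url))
        return False
    except urllib.error.URLError as err:
        # Network-level failure, e.g. DNS lookup failed or connection refused.
        print("Could not reach %s: %s" % (url, err.reason))
        return False
    return True
```

The point of the switch is visible in the `except` clauses: a failed download raises an exception that the caller can handle, instead of an error page being silently saved as if it were a PDF.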

Details

Committed
Tibor Simko <tibor.simko@cern.ch>, Mar 6 2012, 18:33
Parents
R3600:244c11713270: Merge branch 'maint-1.0'

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:cad9673baa15: plotextractor: improves PDF harvesting from arXiv (authored by Jan Aage Lavik <jan.age.lavik@cern.ch>). Mar 6 2012, 18:33