classifier: support colons in file paths
- FIX Properly handles file paths containing a colon (:), avoiding faulty text extraction that caused (1) wrong results and (2) much slower execution.
- Improves the reporting of problems in the ontology.
- Removes the check for whether the PDF text is English, as it is irrelevant.
- Slightly refactors the code that downloads remote files.
Signed-off-by: Jan Aage Lavik <jan.age.lavik@cern.ch>