Fixed issues #101, #103, #115
Issue #101: Write Pig UDF to extract plain text from PDFs
Issue #103: Look into building Warcbase bindings for Spark
Issue #115: Update README
lintool <jimmylin@umd.edu> | May 24 2015, 20:56
dportabella | Oct 19 2016, 16:29
Commit | Author | Details | Committed
---|---|---|---
04dffbc5db7d | lintool | Merge branch 'patch-1' of github.com:machawk1/warcbase into feature-integration | May 24 2015
4b0b8f61c298 | lintool | Merge branch 'spark-integration-notes' into feature-integration | May 24 2015
1d7b3a0e0451 | lintool | Updated README about Spark integration. | May 24 2015
eec01a41b18e | lintool | Cleanup. | May 24 2015
1c9b25027e73 | lintool | Wraps a try block around everything to catch all errors. | May 24 2015
9acc8ca74c0a | lintool | Merge branch 'master' into extract-pdf-udf | May 24 2015
5ea3fbd756fb | Mat Kelly | Updated README removing text about "only ARC, not WARC" limits per #64 | May 21 2015
5c2540099370 | lintool | Reformatting. | Dec 18 2014
c49c58f76ed3 | rwolniak | ExtractTextFromPDFs updated | Dec 9 2014
cea1f0cae974 | rwolniak | Updated PIG Tika Parser with new code that should work. Awaiting test on Hadoop | Dec 9 2014
8a3fed9342fc | rwolniak | Still testing ExtractTextFromPDFs | Nov 12 2014
10b92d78f6ed | rwolniak | Extract text from PDF UDF. Currently still debugging | Nov 10 2014
7e659c0eb678 | Ryan Wolniak | Merge pull request #1 from lintool/master | Nov 10 2014