Fixed issues #101, #103, #115

Authored by lintool <jimmylin@umd.edu> on May 24 2015, 20:56.

Description

Issue #101 Write Pig UDF to extract plain text from PDFs
Issue #103 Look into building Warcbase bindings for Spark
Issue #115 Update README

Event Timeline

lintool <jimmylin@umd.edu> committed R1473:46a6069f9b07: Fixed issues #101, #103, #115 (authored by lintool <jimmylin@umd.edu>). May 24 2015, 20:56

Merged Changes

Commit | Author | Details | Committed
04dffbc5db7d | lintool | Merge branch 'patch-1' of github.com:machawk1/warcbase into feature-integration | May 24 2015
4b0b8f61c298 | lintool | Merge branch 'spark-integration-notes' into feature-integration | May 24 2015
1d7b3a0e0451 | lintool | Updated README about Spark integration. | May 24 2015
eec01a41b18e | lintool | Cleanup. | May 24 2015
1c9b25027e73 | lintool | Wraps a try block around everything to catch all errors. | May 24 2015
9acc8ca74c0a | lintool | Merge branch 'master' into extract-pdf-udf | May 24 2015
5ea3fbd756fb | Mat Kelly | Updated README removing text about "only ARC, not WARC" limits per #64 | May 21 2015
5c2540099370 | lintool | Reformatting. | Dec 18 2014
c49c58f76ed3 | rwolniak | ExtractTextFromPDFs updated | Dec 9 2014
cea1f0cae974 | rwolniak | Updated PIG Tika Parser with new code that should work. Awaiting test on Hadoop | Dec 9 2014
8a3fed9342fc | rwolniak | Still testing ExtractTextFromPDFs | Nov 12 2014
10b92d78f6ed | rwolniak | Extract text from PDF UDF. Currently still debugging | Nov 10 2014
7e659c0eb678 | Ryan Wolniak | Merge pull request #1 from lintool/master | Nov 10 2014