Merge remote-tracking branch 'upstream/master'

Authored by Jeremy Wiebe <jrwiebe@streeling.umiacs.umd.edu> on May 26 2015, 18:09.

Event Timeline

Jeremy Wiebe <jrwiebe@streeling.umiacs.umd.edu> committed R1473:bb394dbc81cf: Merge remote-tracking branch 'upstream/master' (authored by Jeremy Wiebe <jrwiebe@streeling.umiacs.umd.edu>) on May 26 2015, 18:09.

Merged Changes

Commit | Author | Details | Committed
------ | ------ | ------- | ---------
46a6069f9b07 | lintool | Fixed issues #101, #103, #115 | May 24 2015
04dffbc5db7d | lintool | Merge branch 'patch-1' of github.com:machawk1/warcbase into feature-integration | May 24 2015
4b0b8f61c298 | lintool | Merge branch 'spark-integration-notes' into feature-integration | May 24 2015
1d7b3a0e0451 | lintool | Updated README about Spark integration. | May 24 2015
eec01a41b18e | lintool | Cleanup. | May 24 2015
1c9b25027e73 | lintool | Wraps a try block around everything to catch all errors. | May 24 2015
9acc8ca74c0a | lintool | Merge branch 'master' into extract-pdf-udf | May 24 2015
5ea3fbd756fb | Mat Kelly | Updated README removing text about "only ARC, not WARC" limits per #64 | May 21 2015
0a4cae2e79d7 | lintool | Fixed issue #118 ExtractLinks UDF doesn't properly handle relative URLs | May 13 2015
a3a2cd7853c8 | lintool | tries to fix issues with relative links | May 13 2015
063d7a4b2593 | lintool | Fixed issues #104, #117 | May 7 2015
0a781f364480 | lintool | Added test case for ExtractLinks UDF. | May 6 2015
5b10497c2d53 | lintool | Updated UDF to handle relative paths (with source page). Added test case. | May 6 2015
fb95f3ae562e | lintool | UDF for extracting top-level domain from URL. | May 6 2015
6ef497988df1 | lintool | Fixed issues #88, #109, #110, #111, #112 | Dec 25 2014
f263769c86b7 | lintool | Added option to change MAX_CONTENT_SIZE in IngestFiles, Issues #112 | Dec 25 2014
52b9696fba87 | lintool | Fixed typo. | Dec 24 2014
5d471280e7db | lintool | Add ingestion option to select either Snappy or GZ compression, Issue #110 | Dec 24 2014
fb3edb4e40c7 | lintool | Renaming. | Dec 23 2014
f6caf0f6f226 | lintool | Fixed OOM issues. | Dec 23 2014
b1c6995aa034 | lintool | Fixed warnings. | Dec 23 2014
59949b3d7af4 | lintool | Fixed OOM errors. | Dec 23 2014
e6b21cc557d6 | lintool | Fixed Issue #109: OOM when running UrlMappingBuilder | Dec 23 2014
0fd8f3f72822 | lintool | Updated versions of some artifacts. | Dec 23 2014
6acf52f320cf | lintool | Merge branch 'hbase-api-refactoring' into dev | Dec 22 2014
e03a08388734 | lintool | Fixed issues #64, #105, #108 | Dec 22 2014
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014
5c2540099370 | lintool | Reformatting. | Dec 18 2014
c49c58f76ed3 | rwolniak | ExtractTextFromPDFs updated | Dec 9 2014
cea1f0cae974 | rwolniak | Updated PIG Tika Parser with new code that should work. Awaiting test on Hadoop | Dec 9 2014
8a3fed9342fc | rwolniak | Still testing ExtractTextFromPDFs | Nov 12 2014
10b92d78f6ed | rwolniak | Extract text from PDF UDF. Currently still debugging | Nov 10 2014
7e659c0eb678 | Ryan Wolniak | Merge pull request #1 from lintool/master | Nov 10 2014
f97523f440ff | lintool | Merge branch 'master' into hbase-api-refactoring | Oct 19 2014
2b8e721063fa | lintool | Refactored to eliminate deprecated HBase APIs. | Sep 14 2014