Merge branch 'master' into extract-pdf-udf

Authored by lintool <jimmylin@umd.edu> on May 24 2015, 19:37.

Description

Merge branch 'master' into extract-pdf-udf

Event Timeline

lintool <jimmylin@umd.edu> committed R1473:9acc8ca74c0a: Merge branch 'master' into extract-pdf-udf (authored by lintool <jimmylin@umd.edu>). May 24 2015, 19:37

Merged Changes

| Commit | Author | Details | Committed |
| --- | --- | --- | --- |
| 0a4cae2e79d7 | lintool | Fixed issue #118 ExtractLinks UDF doesn't properly handle relative URLs | May 13 2015 |
| a3a2cd7853c8 | lintool | tries to fix issues with relative links | May 13 2015 |
| 063d7a4b2593 | lintool | Fixed issues #104, #117 | May 7 2015 |
| 0a781f364480 | lintool | Added test case for ExtractLinks UDF. | May 6 2015 |
| 5b10497c2d53 | lintool | Updated UDF to handle relative paths (with source page). Added test case. | May 6 2015 |
| fb95f3ae562e | lintool | UDF for extracting top-level domain from URL. | May 6 2015 |
| 6ef497988df1 | lintool | Fixed issues #88, #109, #110, #111, #112 | Dec 25 2014 |
| f263769c86b7 | lintool | Added option to change MAX_CONTENT_SIZE in IngestFiles, Issues #112 | Dec 25 2014 |
| 52b9696fba87 | lintool | Fixed typo. | Dec 24 2014 |
| 5d471280e7db | lintool | Add ingestion option to select either Snappy or GZ compression, Issue #110 | Dec 24 2014 |
| fb3edb4e40c7 | lintool | Renaming. | Dec 23 2014 |
| f6caf0f6f226 | lintool | Fixed OOM issues. | Dec 23 2014 |
| b1c6995aa034 | lintool | Fixed warnings. | Dec 23 2014 |
| 59949b3d7af4 | lintool | Fixed OOM errors. | Dec 23 2014 |
| e6b21cc557d6 | lintool | Fixed Issue #109: OOM when running UrlMappingBuilder | Dec 23 2014 |
| 0fd8f3f72822 | lintool | Updated versions of some artifacts. | Dec 23 2014 |
| 6acf52f320cf | lintool | Merge branch 'hbase-api-refactoring' into dev | Dec 22 2014 |
| e03a08388734 | lintool | Fixed issues #64, #105, #108 | Dec 22 2014 |
| 7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014 |
| 12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014 |
| dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014 |
| 329ddf9ce247 | Jeremy Wiebe | Use WarcRecordUtils.getWarcResponseMimeType() for ingest | Dec 11 2014 |
| 025a0a63ec02 | Jeremy Wiebe | Error in README | Dec 11 2014 |
| 8b1e82f16828 | Jeremy Wiebe | Added WARC support to analysis/graph classes (except ExtractLinksJwat -- since… | Dec 11 2014 |
| eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014 |
| c2b059444750 | Jeremy Wiebe | Merge branch 'master' into warc-support | Dec 10 2014 |
| cdbada044f9d | Jeremy Wiebe | Fixed typos in README.md | Dec 10 2014 |
| ec4285807f39 | Jeremy Wiebe | Added WARC support to analysis: | Dec 10 2014 |
| 65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014 |
| 724eed27b771 | Jeremy Wiebe | Added WARC support to WarcbaseResoureStore. OpenWayback replay of WARC files… | Dec 8 2014 |
| 4239f1f9ae87 | Jeremy Wiebe | Added WARC file support to IngestFiles. | Dec 8 2014 |
| 3d8672c631db | Jeremy Wiebe | Fixed bug in ExtractLinksWac where fst.loadMapping() would choke on filename… | Dec 8 2014 |
| 5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014 |
| 97550c3727c5 | Jeremy Wiebe | Ooops. *Now* synced with lintool's master… | Dec 8 2014 |
| 5ce9524aab80 | Jeremy Wiebe | Brought fork back into sync with lintool's master branch… | Dec 8 2014 |
| 0379553cc55c | Jeremy Wiebe | Merge branch 'master' of https://github.com/lintool/warcbase | Dec 8 2014 |
| f97523f440ff | lintool | Merge branch 'master' into hbase-api-refactoring | Oct 19 2014 |
| 2b8e721063fa | lintool | Refactored to eliminate deprecated HBase APIs. | Sep 14 2014 |
| dcffdc5f21c2 | Jeremy Wiebe | Fixed dependency issues and JVM memory allocation, allowing for successful… | Sep 10 2014 |
| e05a7b2f92ee | Jeremy Wiebe | Updated pom.xml and code to use current stable releases of Hadoop and HBase, et… | Sep 3 2014 |