Fixed issues #64, #105, #108

Authored by lintool <jimmylin@umd.edu> on Dec 22 2014, 21:30.

Description

Issue #64 Ingestion support for WARC
Issue #105 ExtractLinksWac fails on opening local cache files
Issue #108 Integrate contributions from jrwiebe

Event Timeline

lintool <jimmylin@umd.edu> committed R1473:e03a08388734: Fixed issues #64, #105, #108 (authored by lintool <jimmylin@umd.edu>) on Dec 22 2014, 21:30.

Merged Changes

Commit | Author | Details | Committed
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014
dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014
329ddf9ce247 | Jeremy Wiebe | Use WarcRecordUtils.getWarcResponseMimeType() for ingest | Dec 11 2014
025a0a63ec02 | Jeremy Wiebe | Error in README | Dec 11 2014
8b1e82f16828 | Jeremy Wiebe | Added WARC support to analysis/graph classes (except ExtractLinksJwat -- since… | Dec 11 2014
eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014
c2b059444750 | Jeremy Wiebe | Merge branch 'master' into warc-support | Dec 10 2014
cdbada044f9d | Jeremy Wiebe | Fixed typos in README.md | Dec 10 2014
ec4285807f39 | Jeremy Wiebe | Added WARC support to analysis: | Dec 10 2014
65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014
724eed27b771 | Jeremy Wiebe | Added WARC support to WarcbaseResoureStore. OpenWayback replay of WARC files… | Dec 8 2014
4239f1f9ae87 | Jeremy Wiebe | Added WARC file support to IngestFiles. | Dec 8 2014
3d8672c631db | Jeremy Wiebe | Fixed bug in ExtractLinksWac where fst.loadMapping() would choke on filename… | Dec 8 2014
5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014
97550c3727c5 | Jeremy Wiebe | Ooops. *Now* synced with lintool's master… | Dec 8 2014
5ce9524aab80 | Jeremy Wiebe | Brought fork back into sync with lintool's master branch… | Dec 8 2014
0379553cc55c | Jeremy Wiebe | Merge branch 'master' of https://github.com/lintool/warcbase | Dec 8 2014
dcffdc5f21c2 | Jeremy Wiebe | Fixed dependency issues and JVM memory allocation, allowing for successful… | Sep 10 2014
e05a7b2f92ee | Jeremy Wiebe | Updated pom.xml and code to use current stable releases of Hadoop and HBase, et… | Sep 3 2014