Fixed issues #64, #105, #108
Issue #64 Ingestion support for WARC
Issue #105 ExtractLinksWac fails on opening local cache files
Issue #108 Integrate contributions from jrwiebe
lintool <jimmylin@umd.edu> | Dec 22 2014, 21:30
dportabella | Oct 19 2016, 16:29
Commit | Author | Details | Committed
---|---|---|---
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014
dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014
329ddf9ce247 | Jeremy Wiebe | Use WarcRecordUtils.getWarcResponseMimeType() for ingest | Dec 11 2014
025a0a63ec02 | Jeremy Wiebe | Error in README | Dec 11 2014
8b1e82f16828 | Jeremy Wiebe | Added WARC support to analysis/graph classes (except ExtractLinksJwat -- since… | Dec 11 2014
eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014
c2b059444750 | Jeremy Wiebe | Merge branch 'master' into warc-support | Dec 10 2014
cdbada044f9d | Jeremy Wiebe | Fixed typos in README.md | Dec 10 2014
ec4285807f39 | Jeremy Wiebe | Added WARC support to analysis: | Dec 10 2014
65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014
724eed27b771 | Jeremy Wiebe | Added WARC support to WarcbaseResoureStore. OpenWayback replay of WARC files… | Dec 8 2014
4239f1f9ae87 | Jeremy Wiebe | Added WARC file support to IngestFiles. | Dec 8 2014
3d8672c631db | Jeremy Wiebe | Fixed bug in ExtractLinksWac where fst.loadMapping() would choke on filename… | Dec 8 2014
5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014
97550c3727c5 | Jeremy Wiebe | Ooops. *Now* synced with lintool's master… | Dec 8 2014
5ce9524aab80 | Jeremy Wiebe | Brought fork back into sync with lintool's master branch… | Dec 8 2014
0379553cc55c | Jeremy Wiebe | Merge branch 'master' of https://github.com/lintool/warcbase | Dec 8 2014
dcffdc5f21c2 | Jeremy Wiebe | Fixed dependency issues and JVM memory allocation, allowing for successful… | Sep 10 2014
e05a7b2f92ee | Jeremy Wiebe | Updated pom.xml and code to use current stable releases of Hadoop and HBase, et… | Sep 3 2014