Merge branch 'master' into extract-pdf-udf
Description
Details
- Committed: lintool <jimmylin@umd.edu>, May 24 2015, 19:37
- Pushed: dportabella, Oct 19 2016, 16:29
- Parents:
  - R1473:0a4cae2e79d7: Fixed issue #118 ExtractLinks UDF doesn't properly handle relative URLs
  - R1473:5c2540099370: Reformatting.
- Branches: Unknown
- Tags
Merged Changes
Commit | Author | Details | Committed
---|---|---|---
0a4cae2e79d7 | lintool | Fixed issue #118 ExtractLinks UDF doesn't properly handle relative URLs | May 13 2015
a3a2cd7853c8 | lintool | tries to fix issues with relative links | May 13 2015
063d7a4b2593 | lintool | Fixed issues #104, #117 | May 7 2015
0a781f364480 | lintool | Added test case for ExtractLinks UDF. | May 6 2015
5b10497c2d53 | lintool | Updated UDF to handle relative paths (with source page). Added test case. | May 6 2015
fb95f3ae562e | lintool | UDF for extracting top-level domain from URL. | May 6 2015
6ef497988df1 | lintool | Fixed issues #88, #109, #110, #111, #112 | Dec 25 2014
f263769c86b7 | lintool | Added option to change MAX_CONTENT_SIZE in IngestFiles, Issues #112 | Dec 25 2014
52b9696fba87 | lintool | Fixed typo. | Dec 24 2014
5d471280e7db | lintool | Add ingestion option to select either Snappy or GZ compression, Issue #110 | Dec 24 2014
fb3edb4e40c7 | lintool | Renaming. | Dec 23 2014
f6caf0f6f226 | lintool | Fixed OOM issues. | Dec 23 2014
b1c6995aa034 | lintool | Fixed warnings. | Dec 23 2014
59949b3d7af4 | lintool | Fixed OOM errors. | Dec 23 2014
e6b21cc557d6 | lintool | Fixed Issue #109: OOM when running UrlMappingBuilder | Dec 23 2014
0fd8f3f72822 | lintool | Updated versions of some artifacts. | Dec 23 2014
6acf52f320cf | lintool | Merge branch 'hbase-api-refactoring' into dev | Dec 22 2014
e03a08388734 | lintool | Fixed issues #64, #105, #108 | Dec 22 2014
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014
dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014
329ddf9ce247 | Jeremy Wiebe | Use WarcRecordUtils.getWarcResponseMimeType() for ingest | Dec 11 2014
025a0a63ec02 | Jeremy Wiebe | Error in README | Dec 11 2014
8b1e82f16828 | Jeremy Wiebe | Added WARC support to analysis/graph classes (except ExtractLinksJwat -- since… | Dec 11 2014
eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014
c2b059444750 | Jeremy Wiebe | Merge branch 'master' into warc-support | Dec 10 2014
cdbada044f9d | Jeremy Wiebe | Fixed typos in README.md | Dec 10 2014
ec4285807f39 | Jeremy Wiebe | Added WARC support to analysis: | Dec 10 2014
65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014
724eed27b771 | Jeremy Wiebe | Added WARC support to WarcbaseResoureStore. OpenWayback replay of WARC files… | Dec 8 2014
4239f1f9ae87 | Jeremy Wiebe | Added WARC file support to IngestFiles. | Dec 8 2014
3d8672c631db | Jeremy Wiebe | Fixed bug in ExtractLinksWac where fst.loadMapping() would choke on filename… | Dec 8 2014
5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014
97550c3727c5 | Jeremy Wiebe | Ooops. *Now* synced with lintool's master… | Dec 8 2014
5ce9524aab80 | Jeremy Wiebe | Brought fork back into sync with lintool's master branch… | Dec 8 2014
0379553cc55c | Jeremy Wiebe | Merge branch 'master' of https://github.com/lintool/warcbase | Dec 8 2014
f97523f440ff | lintool | Merge branch 'master' into hbase-api-refactoring | Oct 19 2014
2b8e721063fa | lintool | Refactored to eliminate deprecated HBase APIs. | Sep 14 2014
dcffdc5f21c2 | Jeremy Wiebe | Fixed dependency issues and JVM memory allocation, allowing for successful… | Sep 10 2014
e05a7b2f92ee | Jeremy Wiebe | Updated pom.xml and code to use current stable releases of Hadoop and HBase, et… | Sep 3 2014