History Graph
Commit | Author | Details | Committed
---|---|---|---
bb394dbc81cf | Jeremy Wiebe | Merge remote-tracking branch 'upstream/master' | May 26 2015
46a6069f9b07 | lintool | Fixed issues #101, #103, #115 | May 24 2015
04dffbc5db7d | lintool | Merge branch 'patch-1' of github.com:machawk1/warcbase into feature-integration | May 24 2015
4b0b8f61c298 | lintool | Merge branch 'spark-integration-notes' into feature-integration | May 24 2015
1d7b3a0e0451 | lintool | Updated README about Spark integration. | May 24 2015
eec01a41b18e | lintool | Cleanup. | May 24 2015
1c9b25027e73 | lintool | Wraps a try block around everything to catch all errors. | May 24 2015
9acc8ca74c0a | lintool | Merge branch 'master' into extract-pdf-udf | May 24 2015
5ea3fbd756fb | Mat Kelly | Updated README removing text about "only ARC, not WARC" limits per #64 | May 21 2015
0a4cae2e79d7 | lintool | Fixed issue #118 ExtractLinks UDF doesn't properly handle relative URLs | May 13 2015
a3a2cd7853c8 | lintool | tries to fix issues with relative links | May 13 2015
063d7a4b2593 | lintool | Fixed issues #104, #117 | May 7 2015
0a781f364480 | lintool | Added test case for ExtractLinks UDF. | May 6 2015
5b10497c2d53 | lintool | Updated UDF to handle relative paths (with source page). Added test case. | May 6 2015
fb95f3ae562e | lintool | UDF for extracting top-level domain from URL. | May 6 2015
451322ac2cf8 | Jeremy Wiebe | Update README.OSX.md | Jan 27 2015
4dc290c46b7a | jrwiebe | Create README.OSX.md | Jan 21 2015
6ef497988df1 | lintool | Fixed issues #88, #109, #110, #111, #112 | Dec 25 2014
f263769c86b7 | lintool | Added option to change MAX_CONTENT_SIZE in IngestFiles, Issues #112 | Dec 25 2014
52b9696fba87 | lintool | Fixed typo. | Dec 24 2014
5d471280e7db | lintool | Add ingestion option to select either Snappy or GZ compression, Issue #110 | Dec 24 2014
fb3edb4e40c7 | lintool | Renaming. | Dec 23 2014
f6caf0f6f226 | lintool | Fixed OOM issues. | Dec 23 2014
b1c6995aa034 | lintool | Fixed warnings. | Dec 23 2014
59949b3d7af4 | lintool | Fixed OOM errors. | Dec 23 2014
e6b21cc557d6 | lintool | Fixed Issue #109: OOM when running UrlMappingBuilder | Dec 23 2014
0fd8f3f72822 | lintool | Updated versions of some artifacts. | Dec 23 2014
6acf52f320cf | lintool | Merge branch 'hbase-api-refactoring' into dev | Dec 22 2014
e03a08388734 | lintool | Fixed issues #64, #105, #108 | Dec 22 2014
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014
5c2540099370 | lintool | Reformatting. | Dec 18 2014
dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014
329ddf9ce247 | Jeremy Wiebe | Use WarcRecordUtils.getWarcResponseMimeType() for ingest | Dec 11 2014
025a0a63ec02 | Jeremy Wiebe | Error in README | Dec 11 2014
8b1e82f16828 | Jeremy Wiebe | Added WARC support to analysis/graph classes (except ExtractLinksJwat -- since… | Dec 11 2014
eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014
c2b059444750 | Jeremy Wiebe | Merge branch 'master' into warc-support | Dec 10 2014
cdbada044f9d | Jeremy Wiebe | Fixed typos in README.md | Dec 10 2014
ec4285807f39 | Jeremy Wiebe | Added WARC support to analysis: | Dec 10 2014
65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014
c49c58f76ed3 | rwolniak | ExtractTextFromPDFs updated | Dec 9 2014
cea1f0cae974 | rwolniak | Updated PIG Tika Parser with new code that should work. Awaiting test on Hadoop | Dec 9 2014
724eed27b771 | Jeremy Wiebe | Added WARC support to WarcbaseResoureStore. OpenWayback replay of WARC files… | Dec 8 2014
4239f1f9ae87 | Jeremy Wiebe | Added WARC file support to IngestFiles. | Dec 8 2014
3d8672c631db | Jeremy Wiebe | Fixed bug in ExtractLinksWac where fst.loadMapping() would choke on filename… | Dec 8 2014
5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014
97550c3727c5 | Jeremy Wiebe | Ooops. *Now* synced with lintool's master… | Dec 8 2014
5ce9524aab80 | Jeremy Wiebe | Brought fork back into sync with lintool's master branch… | Dec 8 2014
0379553cc55c | Jeremy Wiebe | Merge branch 'master' of https://github.com/lintool/warcbase | Dec 8 2014
8a3fed9342fc | rwolniak | Still testing ExtractTextFromPDFs | Nov 12 2014
10b92d78f6ed | rwolniak | Extract text from PDF UDF. Currently still debugging | Nov 10 2014
7e659c0eb678 | Ryan Wolniak | Merge pull request #1 from lintool/master | Nov 10 2014
df7f8711c64f | lintool | Fixed issues #96, #98, #99 | Oct 31 2014
b8915343f5c9 | lintool | Minor refactoring. | Oct 31 2014
0145d74ed35e | lintool | Refactoring to create method that extracts MIME from WARC response records. | Oct 22 2014
d491020c8287 | lintool | Merge branch 'pig' into warc | Oct 19 2014
410cfd81a069 | lintool | WARC-related Hadoop bindings. | Oct 19 2014
f4a249469cbc | lintool | Minor refactoring. | Oct 19 2014
0c52caed50cf | lintool | Pig ArcLoader exports its own ResourceSchema. | Oct 19 2014
f97523f440ff | lintool | Merge branch 'master' into hbase-api-refactoring | Oct 19 2014
e791bec61020 | lintool | Fixed issues #93, #94, #95 | Oct 19 2014
da7a94be2c70 | lintool | Added simple Pig script. | Oct 19 2014
3af097ea655d | lintool | Fixed test cases. | Oct 19 2014
4ceefef78cdc | lintool | Pig loader materializes the actual content. | Oct 19 2014
58df7da00880 | lintool | Refactored Pig loaded to use WAC API. | Oct 18 2014
63e8352c664b | lintool | Fixed issues #90, #91, #92 | Oct 18 2014
722b6402b98b | lintool | Updates. | Oct 18 2014
a48bb9781feb | lintool | Merge branch 'master' of github.com:rwolniak/warcbase into warc | Oct 18 2014
21397e4e4ff3 | lintool | upgraded to webarchive-commons 1.1.4. | Oct 18 2014
da890acb0be1 | rwolniak | Updated Building the URL Mapping Instructions to include instruction to move… | Oct 13 2014
de4267bf28f1 | lintool | Updated documentation. | Oct 13 2014
27b0038b631f | rwolniak | Testing commit and push + README update | Oct 8 2014
457a71345d2b | lintool | Merge branch 'master' into warc | Sep 15 2014
2b8e721063fa | lintool | Refactored to eliminate deprecated HBase APIs. | Sep 14 2014
8fae0c067d68 | lintool | Fixed issue #87 | Sep 14 2014
05db518ccd83 | lintool | Added timing info. | Sep 14 2014
157df31c0f15 | lintool | Cleanup. | Sep 14 2014
85c5b4a2ecc5 | lintool | Added debug output; Fixed deprecated HBase APIs. | Sep 14 2014
159596e9b378 | lintool | Fixed build issues in upgrade to CDH 5.1.2. | Sep 13 2014
dcffdc5f21c2 | Jeremy Wiebe | Fixed dependency issues and JVM memory allocation, allowing for successful… | Sep 10 2014
e05a7b2f92ee | Jeremy Wiebe | Updated pom.xml and code to use current stable releases of Hadoop and HBase, et… | Sep 3 2014
4798a4314b6e | lintool | Figured out how to extract MIME type and date from WARC. | Aug 30 2014
f3516c7fd7f0 | lintool | Added test cases to try loading WARC records from a stream; back-ported same… | Aug 29 2014
28c5c007f4fd | lintool | Added simple test case. | Aug 28 2014
8e67d49d44b3 | lintool | WARC sample from https://archive.org/details/ExampleArcAndWarcFiles | Aug 28 2014
fb07114090f5 | lintool | Fixed issues #86, #85, #74 | Aug 23 2014
9faed3817dcf | lintool | Refactored WacMapReduceHBaseWrapperDemo, now takes advantage of… | Aug 23 2014
999fa0af0280 | lintool | Merge branch 'working' into table-wrapper | Aug 23 2014
08530c1cbe52 | lintool | WacArcInputFormat now generates ArcRecordWritables. | Aug 23 2014
98d7150f5eec | lintool | Switched ExtractSiteLinks and InvertAnchorText over to WacArcInputFormat; link… | Aug 23 2014
369ba2731f5b | lintool | Minor refactoring, revised counters. | Aug 23 2014
74b023ed8b16 | lintool | Both the Jwat and Wac versions of ExtractLinks gives the same exact output. | Aug 23 2014
ccee8fd2204a | lintool | Implemented ArcRecordWritable. | Aug 23 2014
1e1c2dcde9e2 | lintool | Fixed issues #84, #72 | Aug 23 2014
846ebb216100 | lintool | Minor refactoring. | Aug 23 2014
c5f4b9efa212 | lintool | Very rough prototype of wrapper that allows interoperability between HBase… | Aug 23 2014
47f3c46c099d | lintool | Added Hadoop bindings for webarchive-commons ARC readers, demo, test cases. | Aug 22 2014
b07376aba3fe | lintool | Added/refactored test cases for JWAT. | Aug 22 2014
4663558b1d1a | lintool | Added @Override annotations to appropriate methods. | Aug 22 2014