R1473/src
master
History Graph
Commit | Author | Details | Committed
---|---|---|---
0a781f364480 | lintool | Added test case for ExtractLinks UDF. | May 6 2015
5b10497c2d53 | lintool | Updated UDF to handle relative paths (with source page). Added test case. | May 6 2015
fb95f3ae562e | lintool | UDF for extracting top-level domain from URL. | May 6 2015
f263769c86b7 | lintool | Added option to change MAX_CONTENT_SIZE in IngestFiles, Issues #112 | Dec 25 2014
52b9696fba87 | lintool | Fixed typo. | Dec 24 2014
5d471280e7db | lintool | Add ingestion option to select either Snappy or GZ compression, Issue #110 | Dec 24 2014
fb3edb4e40c7 | lintool | Renaming. | Dec 23 2014
f6caf0f6f226 | lintool | Fixed OOM issues. | Dec 23 2014
b1c6995aa034 | lintool | Fixed warnings. | Dec 23 2014
59949b3d7af4 | lintool | Fixed OOM errors. | Dec 23 2014
e6b21cc557d6 | lintool | Fixed Issue #109: OOM when running UrlMappingBuilder | Dec 23 2014
6acf52f320cf | lintool | Merge branch 'hbase-api-refactoring' into dev | Dec 22 2014
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
12a00be93209 | lintool | Fixed minor code formatting issues. | Dec 21 2014
dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014
329ddf9ce247 | Jeremy Wiebe | Use WarcRecordUtils.getWarcResponseMimeType() for ingest | Dec 11 2014
8b1e82f16828 | Jeremy Wiebe | Added WARC support to analysis/graph classes (except ExtractLinksJwat -- since… | Dec 11 2014
eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014
ec4285807f39 | Jeremy Wiebe | Added WARC support to analysis: | Dec 10 2014
65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014
724eed27b771 | Jeremy Wiebe | Added WARC support to WarcbaseResoureStore. OpenWayback replay of WARC files… | Dec 8 2014
4239f1f9ae87 | Jeremy Wiebe | Added WARC file support to IngestFiles. | Dec 8 2014
3d8672c631db | Jeremy Wiebe | Fixed bug in ExtractLinksWac where fst.loadMapping() would choke on filename… | Dec 8 2014
5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014
97550c3727c5 | Jeremy Wiebe | Ooops. *Now* synced with lintool's master… | Dec 8 2014
0379553cc55c | Jeremy Wiebe | Merge branch 'master' of https://github.com/lintool/warcbase | Dec 8 2014
b8915343f5c9 | lintool | Minor refactoring. | Oct 31 2014
0145d74ed35e | lintool | Refactoring to create method that extracts MIME from WARC response records. | Oct 22 2014
d491020c8287 | lintool | Merge branch 'pig' into warc | Oct 19 2014
410cfd81a069 | lintool | WARC-related Hadoop bindings. | Oct 19 2014
f4a249469cbc | lintool | Minor refactoring. | Oct 19 2014
0c52caed50cf | lintool | Pig ArcLoader exports its own ResourceSchema. | Oct 19 2014
f97523f440ff | lintool | Merge branch 'master' into hbase-api-refactoring | Oct 19 2014
3af097ea655d | lintool | Fixed test cases. | Oct 19 2014
4ceefef78cdc | lintool | Pig loader materializes the actual content. | Oct 19 2014
58df7da00880 | lintool | Refactored Pig loaded to use WAC API. | Oct 18 2014
de4267bf28f1 | lintool | Updated documentation. | Oct 13 2014
457a71345d2b | lintool | Merge branch 'master' into warc | Sep 15 2014
2b8e721063fa | lintool | Refactored to eliminate deprecated HBase APIs. | Sep 14 2014
05db518ccd83 | lintool | Added timing info. | Sep 14 2014
85c5b4a2ecc5 | lintool | Added debug output; Fixed deprecated HBase APIs. | Sep 14 2014
159596e9b378 | lintool | Fixed build issues in upgrade to CDH 5.1.2. | Sep 13 2014
e05a7b2f92ee | Jeremy Wiebe | Updated pom.xml and code to use current stable releases of Hadoop and HBase, et… | Sep 3 2014
4798a4314b6e | lintool | Figured out how to extract MIME type and date from WARC. | Aug 30 2014
f3516c7fd7f0 | lintool | Added test cases to try loading WARC records from a stream; back-ported same… | Aug 29 2014
28c5c007f4fd | lintool | Added simple test case. | Aug 28 2014
8e67d49d44b3 | lintool | WARC sample from https://archive.org/details/ExampleArcAndWarcFiles | Aug 28 2014
9faed3817dcf | lintool | Refactored WacMapReduceHBaseWrapperDemo, now takes advantage of… | Aug 23 2014
999fa0af0280 | lintool | Merge branch 'working' into table-wrapper | Aug 23 2014
08530c1cbe52 | lintool | WacArcInputFormat now generates ArcRecordWritables. | Aug 23 2014
98d7150f5eec | lintool | Switched ExtractSiteLinks and InvertAnchorText over to WacArcInputFormat; link… | Aug 23 2014
369ba2731f5b | lintool | Minor refactoring, revised counters. | Aug 23 2014
74b023ed8b16 | lintool | Both the Jwat and Wac versions of ExtractLinks gives the same exact output. | Aug 23 2014
ccee8fd2204a | lintool | Implemented ArcRecordWritable. | Aug 23 2014
846ebb216100 | lintool | Minor refactoring. | Aug 23 2014
c5f4b9efa212 | lintool | Very rough prototype of wrapper that allows interoperability between HBase… | Aug 23 2014
47f3c46c099d | lintool | Added Hadoop bindings for webarchive-commons ARC readers, demo, test cases. | Aug 22 2014
b07376aba3fe | lintool | Added/refactored test cases for JWAT. | Aug 22 2014
4663558b1d1a | lintool | Added @Override annotations to appropriate methods. | Aug 22 2014
9cbd26b2e3bb | lintool | Minor tweaks. | Aug 19 2014
43e952bfdd99 | lintool | Merge branch 'selenium' into working | Aug 19 2014
3c7da3b094fd | lintool | Refactoring of Wayback/Warcbase integration points. | Aug 19 2014
c5b4973aba96 | lintool | Merge branch 'master' into selenium | Aug 17 2014
4090318d2f99 | lintool | Cleaned up scripts. | Aug 17 2014
b02fd402870f | lintool | Cute Selenium browser to conduct a random walk through the archive. | Aug 17 2014
52cf7bb90940 | lintool | Cleaned up programs for manipulating graphs; removed scanning HBase option for… | Aug 17 2014
f7eefd5da751 | lintool | Removed outdated code; cleaned up WarcBrowserServlet. | Aug 16 2014
a57c6b83eaf2 | lintool | Refactoring: created ArcRecordUtils. | Aug 16 2014
d58afca206fc | lintool | Refactoring packge org.warcbase.demo; added Jwat prefix to Hadoop InputFormats… | Aug 16 2014
377a3e3a429f | lintool | Better handling of ARC parse errors. | Aug 15 2014
2d16fd032060 | lintool | Bump max value size up to 10 MB, tweak WAL settings. | Aug 14 2014
bf606c07d453 | lintool | Refactoring UrlUtil and related classes. | Aug 14 2014
b6431bed6f8b | lintool | Minor tweaks. | Aug 14 2014
72c1afbca77c | lintool | Merge branch 'master' into cneud-integration | Aug 14 2014
f29edf052e19 | lintool | Added command-line options. | Aug 14 2014
14fa0f329072 | lintool | More refactoring. | Aug 14 2014
6eeb0ea431f1 | lintool | Refactoring on local UrlMappingBuilder. | Aug 14 2014
5022874b83e5 | lintool | Lightweight refactoring. | Aug 14 2014
0bc5d4876199 | lintool | Uri -> Url classes renaming. | Aug 14 2014
ab64384ff204 | lintool | Light refactoring | Aug 14 2014
bc47be0fea19 | Jeffyrao | resolve conflicts | Aug 13 2014
0623b52cd05c | Jeffyrao | reformat code | Aug 13 2014
83414e51b351 | lintool | Configuration for Wayback/Warcbase integration. | Aug 13 2014
910c2f03e6b1 | lintool | Removed extra command-line argument. | Aug 13 2014
a14cda2217fa | lintool | Commented out LibmagicJnaWrapper functionality because the jar isn't generally… | Aug 12 2014
93f6d42f4ff4 | lintool | Fixed compile and broken test issues. | Aug 12 2014
accd1978862d | lintool | Merge branch 'master' of github.com:cneud/warcbase into cneud-integration | Aug 12 2014
eb4893e7201c | lintool | Merge branch 'cleanup' into wayback-integration | Aug 12 2014
c665a23bf1f3 | lintool | Fixes issue #58: Wayback reads directly from REST API instead of writing and… | Aug 12 2014
ffec5b2caa3d | lintool | Fixed issue #59: Unable to fetch URLs from archive with '?' in them | Aug 12 2014
0f4f541c24d7 | lintool | Fixed issues with fetching URLs with spaces in them. | Aug 12 2014
b5ccb86d330e | lintool | Better handling of errors: when REST API is unavailable, when URL isn't found… | Aug 11 2014
515bab098cff | lintool | Code cleanup for browser code; removed unneeded files and associated web files. | Aug 11 2014
74f3dece6c62 | lintool | Merge branch 'rest-api-bug-fix' of github.com:lintool/warcbase into wayback… | Aug 11 2014
3075ca19d76d | lintool | Converted host/port/table information to bean settings. | Aug 11 2014
37e97073d57c | lintool | Simplified code. | Aug 11 2014
bca5b6944ac7 | lintool | Refactoring; mostly reformatting. | Aug 11 2014
9e8765b4e3f0 | lintool | Minor fix for NPE when capture isn't in HBase. | Aug 11 2014
4f387e8952a7 | lintool | Initial check-in of Warcbase integration points with Open Wayback. | Aug 11 2014
03d5c5365481 | lintool | /*/ query returns MIME type. | Aug 11 2014