Prototype integration of Wayback/Warcbase via REST API on HBase.

Authored by lintool <jimmylin@umd.edu> on Aug 12 2014, 02:42.

Description

Issues #53 and #54.

Merge branch 'wayback-integration'

Event Timeline

lintool <jimmylin@umd.edu> committed R1473:a0a594f92b94: Prototype integration of Wayback/Warcbase via REST API on HBase. (authored by lintool <jimmylin@umd.edu>). Aug 12 2014, 02:42

Merged Changes

Commit | Author | Details | Committed
eb4893e7201c | lintool | Merge branch 'cleanup' into wayback-integration | Aug 12 2014
c665a23bf1f3 | lintool | Fixes issue #58: Wayback reads directly from REST API instead of writing and… | Aug 12 2014
ffec5b2caa3d | lintool | Fixed issue #59: Unable to fetch URLs from archive with '?' in them | Aug 12 2014
0f4f541c24d7 | lintool | Fixed issues with fetching URLs with spaces in them. | Aug 12 2014
b5ccb86d330e | lintool | Better handling of errors: when REST API is unavailable, when URL isn't found… | Aug 11 2014
515bab098cff | lintool | Code cleanup for browser code; removed unneeded files and associated web files. | Aug 11 2014
74f3dece6c62 | lintool | Merge branch 'rest-api-bug-fix' of github.com:lintool/warcbase into wayback… | Aug 11 2014
3075ca19d76d | lintool | Converted host/port/table information to bean settings. | Aug 11 2014
37e97073d57c | lintool | Simplified code. | Aug 11 2014
bca5b6944ac7 | lintool | Refactoring; mostly reformatting. | Aug 11 2014
9e8765b4e3f0 | lintool | Minor fix for NPE when capture isn't in HBase. | Aug 11 2014
4f387e8952a7 | lintool | Initial check-in of Warcbase integration points with Open Wayback. | Aug 11 2014
03d5c5365481 | lintool | /*/ query returns MIME type. | Aug 11 2014
65b6dc5b7138 | lintool | Refactor to conform to /*/ of Wayback to fetch list of available versions. | Aug 10 2014
096878f5f932 | lintool | Switched over to 14 digit dates for URLs to align with Wayback. Further… | Aug 10 2014
7f9764b1a793 | lintool | Cleaned up servlet fetch code. | Aug 10 2014
b685d65eba7b | lintool | Fixed 14 digit date parsing issue (now uses ArchiveUtils); was an issue with… | Aug 10 2014
bc6aab1ffd21 | lintool | Fixed a few minor ingestion issues. | Aug 10 2014
b20ef84e5df9 | lintool | Refactoring; removing WARC ingestion for now. | Aug 10 2014
cfe508a831f0 | lintool | Tweaks to ingest code. | Aug 10 2014
180b57fa5dc1 | lintool | Janky, but seems to work: ingesting and serving up raw ARC records. | Aug 10 2014
8b5a6be9db61 | lintool | Quick and dirty switch over to webarchive-commons API; stores raw ARC records. | Aug 10 2014
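
Several of the merged commits concern 14-digit Wayback dates and URLs containing '?' or spaces. The following is a minimal Python sketch of those two concerns, not the project's actual Java code. The path layout (/<table>/<timestamp>/<url> and /<table>/*/<url>), the function names, and the choice to percent-encode the entire target URL are all assumptions made for illustration. Timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone
from urllib.parse import quote

# Wayback-style timestamps have 14 digits: yyyyMMddHHmmss.
WAYBACK_TS_FORMAT = "%Y%m%d%H%M%S"


def parse_wayback_timestamp(ts: str) -> datetime:
    """Parse a 14-digit timestamp, rejecting anything shorter or non-numeric."""
    if len(ts) != 14 or not (ts.isascii() and ts.isdigit()):
        raise ValueError(f"expected 14 ASCII digits, got {ts!r}")
    # Assumption: timestamps are interpreted as UTC.
    return datetime.strptime(ts, WAYBACK_TS_FORMAT).replace(tzinfo=timezone.utc)


def format_wayback_timestamp(dt: datetime) -> str:
    """Format an aware datetime back into the 14-digit form."""
    return dt.astimezone(timezone.utc).strftime(WAYBACK_TS_FORMAT)


def build_capture_path(table: str, timestamp: str, url: str) -> str:
    """Hypothetical request path for one capture of `url`.

    The whole target URL is percent-encoded so that '?', spaces, and '/'
    inside it cannot be mistaken for parts of the request itself.
    """
    parse_wayback_timestamp(timestamp)  # validate before building the path
    return f"/{quote(table, safe='')}/{timestamp}/{quote(url, safe='')}"


def build_versions_path(table: str, url: str) -> str:
    """Hypothetical request path listing all captures of `url` (the /*/ form)."""
    return f"/{quote(table, safe='')}/*/{quote(url, safe='')}"
```

For example, build_capture_path("mytable", "20140812024200", "http://example.com/a b?x=1") encodes the space as %20 and the '?' as %3F, so the query string of the archived URL stays part of the path segment instead of being parsed as a query on the request.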