History Graph
Commit | Author | Details | Committed
---|---|---|---
c5f4b9efa212 | lintool | Very rough prototype of wrapper that allows interoperability between HBase… | Aug 23 2014
47f3c46c099d | lintool | Added Hadoop bindings for webarchive-commons ARC readers, demo, test cases. | Aug 22 2014
4663558b1d1a | lintool | Added @Override annotations to appropriate methods. | Aug 22 2014
9cbd26b2e3bb | lintool | Minor tweaks. | Aug 19 2014
43e952bfdd99 | lintool | Merge branch 'selenium' into working | Aug 19 2014
3c7da3b094fd | lintool | Refactoring of Wayback/Warcbase integration points. | Aug 19 2014
c5b4973aba96 | lintool | Merge branch 'master' into selenium | Aug 17 2014
4090318d2f99 | lintool | Cleaned up scripts. | Aug 17 2014
b02fd402870f | lintool | Cute Selenium browser to conduct a random walk through the archive. | Aug 17 2014
52cf7bb90940 | lintool | Cleaned up programs for manipulating graphs; removed scanning HBase option for… | Aug 17 2014
f7eefd5da751 | lintool | Removed outdated code; cleaned up WarcBrowserServlet. | Aug 16 2014
a57c6b83eaf2 | lintool | Refactoring: created ArcRecordUtils. | Aug 16 2014
d58afca206fc | lintool | Refactoring packge org.warcbase.demo; added Jwat prefix to Hadoop InputFormats… | Aug 16 2014
377a3e3a429f | lintool | Better handling of ARC parse errors. | Aug 15 2014
2d16fd032060 | lintool | Bump max value size up to 10 MB, tweak WAL settings. | Aug 14 2014
bf606c07d453 | lintool | Refactoring UrlUtil and related classes. | Aug 14 2014
b6431bed6f8b | lintool | Minor tweaks. | Aug 14 2014
72c1afbca77c | lintool | Merge branch 'master' into cneud-integration | Aug 14 2014
f29edf052e19 | lintool | Added command-line options. | Aug 14 2014
14fa0f329072 | lintool | More refactoring. | Aug 14 2014
6eeb0ea431f1 | lintool | Refactoring on local UrlMappingBuilder. | Aug 14 2014
5022874b83e5 | lintool | Lightweight refactoring. | Aug 14 2014
0bc5d4876199 | lintool | Uri -> Url classes renaming. | Aug 14 2014
ab64384ff204 | lintool | Light refactoring | Aug 14 2014
bc47be0fea19 | Jeffyrao | resolve conflicts | Aug 13 2014
0623b52cd05c | Jeffyrao | reformat code | Aug 13 2014
910c2f03e6b1 | lintool | Removed extra command-line argument. | Aug 13 2014
a14cda2217fa | lintool | Commented out LibmagicJnaWrapper functionality because the jar isn't generally… | Aug 12 2014
accd1978862d | lintool | Merge branch 'master' of github.com:cneud/warcbase into cneud-integration | Aug 12 2014
eb4893e7201c | lintool | Merge branch 'cleanup' into wayback-integration | Aug 12 2014
c665a23bf1f3 | lintool | Fixes issue #58: Wayback reads directly from REST API instead of writing and… | Aug 12 2014
ffec5b2caa3d | lintool | Fixed issue #59: Unable to fetch URLs from archive with '?' in them | Aug 12 2014
0f4f541c24d7 | lintool | Fixed issues with fetching URLs with spaces in them. | Aug 12 2014
b5ccb86d330e | lintool | Better handling of errors: when REST API is unavailable, when URL isn't found… | Aug 11 2014
515bab098cff | lintool | Code cleanup for browser code; removed unneeded files and associated web files. | Aug 11 2014
74f3dece6c62 | lintool | Merge branch 'rest-api-bug-fix' of github.com:lintool/warcbase into wayback… | Aug 11 2014
3075ca19d76d | lintool | Converted host/port/table information to bean settings. | Aug 11 2014
37e97073d57c | lintool | Simplified code. | Aug 11 2014
bca5b6944ac7 | lintool | Refactoring; mostly reformatting. | Aug 11 2014
9e8765b4e3f0 | lintool | Minor fix for NPE when capture isn't in HBase. | Aug 11 2014
4f387e8952a7 | lintool | Initial check-in of Warcbase integration points with Open Wayback. | Aug 11 2014
03d5c5365481 | lintool | /*/ query returns MIME type. | Aug 11 2014
65b6dc5b7138 | lintool | Refactor to confirm to /*/ of Wayback to fetch list of available versions. | Aug 10 2014
096878f5f932 | lintool | Switched over to 14 digit dates for URLs to align with Wayback. Further… | Aug 10 2014
7f9764b1a793 | lintool | Cleaned up servlet fetch code. | Aug 10 2014
b685d65eba7b | lintool | Fixed 14 digit date parsing issue (now uses ArchiveUtils); was an issue with… | Aug 10 2014
bc6aab1ffd21 | lintool | Fixed a few minor ingestion issues. | Aug 10 2014
b20ef84e5df9 | lintool | Refactoring; removing WARC ingestion for now. | Aug 10 2014
cfe508a831f0 | lintool | Tweaks to ingest code. | Aug 10 2014
180b57fa5dc1 | lintool | Janky, but seems to work: ingesting and serving up raw ARC records. | Aug 10 2014
8b5a6be9db61 | lintool | Quick and dirty switch over to webarchive-commons API; stores raw ARC records. | Aug 10 2014
65c6e54a948e | Jeffyrao | add UriMappingBuilder Mapreduce version | Aug 4 2014
59aa95aab857 | Jeffyrao | fix the bug of selecting webpage by date ineffective | Jul 25 2014
7175239e0751 | Jeffyrao | add Hadoop/HBase input choice for ExtractLinks and ExtractSiteLinks classes | Jul 22 2014
60a9b96beebe | Milad Gholami | Merging with master. | Jun 26 2014
149ce6cf969f | Milad Gholami | Fixing git history. | Jun 26 2014
3b8484a944a8 | lintool | More work on the admin interface. | Jun 18 2014
f3015cd7ba4b | lintool | issue #50 | Jun 18 2014
10d60bd28c75 | lintool | Fixed broken merge. | Jun 17 2014
6c452cbb6b5d | lintool | Merge branch 'master' into admin | Jun 17 2014
781fe4247b31 | lintool | Fixed issue #48 | Jun 17 2014
5f62c2f6fb9d | lintool | Added comment. | Jun 17 2014
d17dddca8deb | lintool | Minor refactoring. | Jun 17 2014
ee7f7749a30a | lintool | Initial working version of anchor text inversion program: issue #43 | Jun 17 2014
02c26d6d8ea4 | lintool | Started working on issue #46 cleanup of org.warcbase.data.Util | Jun 17 2014
c7a7247d5fa2 | lintool | Appears to have fixed issue #49, starting work on admin tool, issue #45. | Jun 17 2014
a6be4375e0c7 | lintool | ExtractLinks using HBase appears to be working. | Jun 13 2014
21a07efb350e | lintool | Refactored HDFS extractor; HBase extractor still broken. | Jun 13 2014
bbc73ab64808 | lintool | Merge branch 'master' into refactoring | Jun 12 2014
273e5969e943 | lintool | Light refactoring, pushed column family filter into scan. | Jun 12 2014
3283bb8512ef | lintool | Alternative implementation based on iterating over maps... slightly slower. | Jun 12 2014
dbfbcb0b3c7e | lintool | More light refactoring. | Jun 12 2014
34273fdb935a | Jeffyrao | add hbase option for ExtractLinks | Jun 12 2014
cfb89d2d3379 | Jeffyrao | reformat Jinfeng's code | Jun 12 2014
b02f33c9b43a | lintool | Refactoring, code cleanup. | Jun 11 2014
d4c29085ee12 | lintool | Fixed issue #39 | Jun 11 2014
c3a4348e1250 | lintool | Debugged HBase scan parameters so that they don't knock over region servers… | Jun 11 2014
0e61a3094550 | milad621 | Issue 38 fixed. Still need to add other URL Encoding characters. | Jun 6 2014
20ef503dbfe5 | lintool | Moving ExtractLinks and ExtractSiteLinks into analysis.graph package, per Issue… | Jun 5 2014
dc58365a9579 | lintool | Extracts values at different timestamps. | Jun 5 2014
60b827d1c174 | lintool | Refactored getIdRange method signature, add more test cases to UriMapping. | Jun 5 2014
a6b17787ccd1 | lintool | Merge branch 'extract-links' of github.com:Jeffyrao/warcbase into refactoring | Jun 5 2014
7fc1b92d60a7 | Jeffyrao | fix issue 40 that UriMapping prefix search should return empty result when no… | Jun 5 2014
a8953616248c | lintool | Refactoring, added test case (currently broken). | Jun 4 2014
6518836b7ea0 | Jeffyrao | fix issue 32, update ExtractSiteLinks code | Jun 3 2014
a17e3c74954a | Jeffyrao | fix issue 30, add ExtractSiteLinks code | May 29 2014
3f36eff4f429 | Jeffyrao | fix issue 31 | May 27 2014
b4ab2ea0d499 | lintool | Initial MapReduce over HBase demo. | May 26 2014
0598dcd91073 | Jeffyrao | more edits | May 25 2014
cad5eddceb19 | Jeffyrao | remove javacsv dependency, add opencsv dependency | May 25 2014
f0e36b46b872 | Jinfeng Rao | Merge remote-tracking branch 'upstream/master' into extract-links | May 25 2014
7c21e2df9267 | lintool | Fixed robustness and OOM issues when ingesting corrupt ARC files. | May 23 2014
67888b472702 | milad621 | Fixed issue29. | May 23 2014
fcc4acf98a6f | lintool | Simple program to find URL patterns in archives. | May 22 2014
52b39cfecc82 | lintool | Minor refactoring. | May 6 2014
0856f3faddd1 | lintool | Merge branch 'master' of github.com:milad621/warcbase into display_issues | May 6 2014
a4247934797c | milad621 | changed type of nobanner to boolean. | May 5 2014
c5432ac2468c | milad621 | Working on issue17. Added /noframes to the url but still needs to add warcbase… | May 4 2014
8799707e9856 | milad621 | Fixed issue 25 | Apr 24 2014
24d7d8e08807 | Jeffyrao | modified ExtractSiteLinks.java, changed to read csv format of prefix input | Apr 20 2014