R1473/src
master
History Graph
Commit | Author | Details | Committed
---|---|---|---
65b6dc5b7138 | lintool | Refactor to confirm to /*/ of Wayback to fetch list of available versions. | Aug 10 2014
096878f5f932 | lintool | Switched over to 14 digit dates for URLs to align with Wayback. Further… | Aug 10 2014
7f9764b1a793 | lintool | Cleaned up servlet fetch code. | Aug 10 2014
b685d65eba7b | lintool | Fixed 14 digit date parsing issue (now uses ArchiveUtils); was an issue with… | Aug 10 2014
bc6aab1ffd21 | lintool | Fixed a few minor ingestion issues. | Aug 10 2014
b20ef84e5df9 | lintool | Refactoring; removing WARC ingestion for now. | Aug 10 2014
cfe508a831f0 | lintool | Tweaks to ingest code. | Aug 10 2014
180b57fa5dc1 | lintool | Janky, but seems to work: ingesting and serving up raw ARC records. | Aug 10 2014
8b5a6be9db61 | lintool | Quick and dirty switch over to webarchive-commons API; stores raw ARC records. | Aug 10 2014
65c6e54a948e | Jeffyrao | add UriMappingBuilder Mapreduce version | Aug 4 2014
59aa95aab857 | Jeffyrao | fix the bug of selecting webpage by date ineffective | Jul 25 2014
7175239e0751 | Jeffyrao | add Hadoop/HBase input choice for ExtractLinks and ExtractSiteLinks classes | Jul 22 2014
60a9b96beebe | Milad Gholami | Merging with master. | Jun 26 2014
149ce6cf969f | Milad Gholami | Fixing git history. | Jun 26 2014
3b8484a944a8 | lintool | More work on the admin interface. | Jun 18 2014
f3015cd7ba4b | lintool | issue #50 | Jun 18 2014
10d60bd28c75 | lintool | Fixed broken merge. | Jun 17 2014
6c452cbb6b5d | lintool | Merge branch 'master' into admin | Jun 17 2014
781fe4247b31 | lintool | Fixed issue #48 | Jun 17 2014
5f62c2f6fb9d | lintool | Added comment. | Jun 17 2014
d17dddca8deb | lintool | Minor refactoring. | Jun 17 2014
ee7f7749a30a | lintool | Initial working version of anchor text inversion program: issue #43 | Jun 17 2014
02c26d6d8ea4 | lintool | Started working on issue #46 cleanup of org.warcbase.data.Util | Jun 17 2014
c7a7247d5fa2 | lintool | Appears to have fixed issue #49, starting work on admin tool, issue #45. | Jun 17 2014
a6be4375e0c7 | lintool | ExtractLinks using HBase appears to be working. | Jun 13 2014
21a07efb350e | lintool | Refactored HDFS extractor; HBase extractor still broken. | Jun 13 2014
bbc73ab64808 | lintool | Merge branch 'master' into refactoring | Jun 12 2014
273e5969e943 | lintool | Light refactoring, pushed column family filter into scan. | Jun 12 2014
3283bb8512ef | lintool | Alternative implementation based on iterating over maps... slightly slower. | Jun 12 2014
dbfbcb0b3c7e | lintool | More light refactoring. | Jun 12 2014
34273fdb935a | Jeffyrao | add hbase option for ExtractLinks | Jun 12 2014
cfb89d2d3379 | Jeffyrao | reformat Jinfeng's code | Jun 12 2014
b02f33c9b43a | lintool | Refactoring, code cleanup. | Jun 11 2014
d4c29085ee12 | lintool | Fixed issue #39 | Jun 11 2014
c3a4348e1250 | lintool | Debugged HBase scan parameters so that they don't knock over region servers… | Jun 11 2014
0e61a3094550 | milad621 | Issue 38 fixed. Still need to add other URL Encoding characters. | Jun 6 2014
20ef503dbfe5 | lintool | Moving ExtractLinks and ExtractSiteLinks into analysis.graph package, per Issue… | Jun 5 2014
dc58365a9579 | lintool | Extracts values at different timestamps. | Jun 5 2014
60b827d1c174 | lintool | Refactored getIdRange method signature, add more test cases to UriMapping. | Jun 5 2014
a6b17787ccd1 | lintool | Merge branch 'extract-links' of github.com:Jeffyrao/warcbase into refactoring | Jun 5 2014
7fc1b92d60a7 | Jeffyrao | fix issue 40 that UriMapping prefix search should return empty result when no… | Jun 5 2014
a8953616248c | lintool | Refactoring, added test case (currently broken). | Jun 4 2014
6518836b7ea0 | Jeffyrao | fix issue 32, update ExtractSiteLinks code | Jun 3 2014
a17e3c74954a | Jeffyrao | fix issue 30, add ExtractSiteLinks code | May 29 2014
3f36eff4f429 | Jeffyrao | fix issue 31 | May 27 2014
b4ab2ea0d499 | lintool | Initial MapReduce over HBase demo. | May 26 2014
0598dcd91073 | Jeffyrao | more edits | May 25 2014
cad5eddceb19 | Jeffyrao | remove javacsv dependency, add opencsv dependency | May 25 2014
f0e36b46b872 | Jinfeng Rao | Merge remote-tracking branch 'upstream/master' into extract-links | May 25 2014
7c21e2df9267 | lintool | Fixed robustness and OOM issues when ingesting corrupt ARC files. | May 23 2014
67888b472702 | milad621 | Fixed issue29. | May 23 2014
fcc4acf98a6f | lintool | Simple program to find URL patterns in archives. | May 22 2014
52b39cfecc82 | lintool | Minor refactoring. | May 6 2014
0856f3faddd1 | lintool | Merge branch 'master' of github.com:milad621/warcbase into display_issues | May 6 2014
a4247934797c | milad621 | changed type of nobanner to boolean. | May 5 2014
c5432ac2468c | milad621 | Working on issue17. Added /noframes to the url but still needs to add warcbase… | May 4 2014
8799707e9856 | milad621 | Fixed issue 25 | Apr 24 2014
24d7d8e08807 | Jeffyrao | modified ExtractSiteLinks.java, changed to read csv format of prefix input | Apr 20 2014
627743751578 | milad621 | juniversalchardet removed. | Apr 19 2014
0d34860367d9 | milad621 | For issue17 it now shows a page with three frames, each has a seperate search… | Apr 19 2014
8d9ef6d2ac6b | Jeffyrao | add extract site-level links | Apr 16 2014
fd511b6eaf7b | milad621 | Fixed Issue 24. | Apr 6 2014
997a8e84e816 | lintool | Exposed -getPrefix command-line option. | Mar 31 2014
758948b463b9 | lintool | Light refactoring, fixed a few errors. | Mar 31 2014
5227ba65232c | lintool | Merge branch 'extract-links' of github.com:Jeffyrao/warcbase into fst | Mar 31 2014
e3ccb37b2e1d | Jeffyrao | remove TestFSTPrefix class | Mar 31 2014
71c68ecae0fa | Jeffyrao | add prefix search feature to UriMapping class; add test class for UriMapping… | Mar 31 2014
2f1574575438 | lintool | More work on Issue #22: prev/next buttons turns on and off appropriately. | Mar 29 2014
a48c65e498ab | Jeffyrao | add Lucene FST prefix search | Mar 28 2014
6d1747bd14af | milad621 | Fixed issue 18. | Mar 28 2014
88e5d97286e4 | milad621 | Fixed issues 21 and 22. | Mar 28 2014
dac5827fa1e5 | milad621 | Merge branch 'master' of https://github.com/lintool/warcbase | Mar 28 2014
f2f1e34837f3 | milad621 | merged with master | Mar 28 2014
3da8131e077b | lintool | Minor code cleanup. | Mar 27 2014
d965c015a914 | milad621 | Bug with getting close dates fixed. | Mar 26 2014
be5a4d366786 | lintool | Fixed bug with MAX_VERSION blowing up array length. | Mar 25 2014
72b9558b3ca5 | lintool | Merge branch 'issue12' of github.com:milad621/warcbase into multi_version_cell | Mar 25 2014
ef6650cb7bc4 | milad621 | fixing branches | Mar 25 2014
0d43c98ee177 | milad621 | Fixed issue12 and some issues with dates in browser. | Mar 25 2014
6c48ce81ee80 | milad621 | Issue12 fixed | Mar 24 2014
e0d6be1055f4 | milad621 | test push | Mar 24 2014
252eb87988b1 | milad621 | issue12 fixed | Mar 24 2014
24370c4f051f | Jeffyrao | update README file, add steps for getting URLs and extracting links | Mar 21 2014
cb9e68d4f94d | Jeffyrao | add ExtractLinks and PrefixQuery(ExtractSiteLinks) | Mar 20 2014
374b6f2c2173 | lintool | Adds Snappy compression. | Mar 19 2014
232a07ac56c9 | cneud | Merge remote-tracking branch 'lintool/master' | Mar 18 2014
303860c9ae3b | lintool | Reformatting source. | Mar 17 2014
c52474128b38 | lintool | Merge branch 'disable_wal' of github.com:lintool/warcbase into disable_wal | Mar 17 2014
d543a4957916 | lintool | Merge branch 'master' into disable_wal | Mar 17 2014
b430dc7c94f6 | lintool | Merge branch 'jar_upgrade' into disable_wal | Mar 17 2014
8c649e5336af | lintool | Fixed Issue #7 | Mar 17 2014
48ac86cd8479 | lintool | Added log4j properties. | Mar 17 2014
a33f92f340e1 | lintool | Added try/catch block in ingestion code. | Mar 17 2014
53aacb335403 | Jeffyrao | code reformat via eclipse imported project | Mar 17 2014
987f9518ebb8 | lintool | Ignores all files that aren't WARC/ARC files in ingest. | Mar 16 2014
dfb63a169b43 | lintool | Tried disabling WAL to see impact on performance. | Mar 16 2014
1ad985045d9a | Jeffyrao | code reformat | Mar 16 2014
073fd3206185 | lintool | Simplified Maven dependencies; Moved from IA artifacts to openwayback artifacts. | Mar 16 2014
228ca87cd0c3 | lintool | Light refactoring. | Mar 16 2014
19453fc7ac8f | lintool | whitespace | Mar 16 2014