History Graph
Commit | Author | Details | Committed
---|---|---|---
5e316b1bb129 | lintool | Killed RecordUtils, ExtractLinksAndText, JwatArcLoaderTest | Nov 25 2015
2e88c1b19afb | lintool | Slapped Apache License boilerplate -- now we're a *real* open-source project :) | Nov 25 2015
8057c46945d0 | Alice-Z | Add Spark support | Nov 3 2015
d9b553c76ced | lintool | Removed JWAT; removed unneeded files in org.warcbase.analysis; switched… | Jun 7 2015
5d471280e7db | lintool | Add ingestion option to select either Snappy or GZ compression, Issue #110 | Dec 24 2014
e6b21cc557d6 | lintool | Fixed Issue #109: OOM when running UrlMappingBuilder | Dec 23 2014
6acf52f320cf | lintool | Merge branch 'hbase-api-refactoring' into dev | Dec 22 2014
7f6d571424a9 | lintool | Changed compression back from GZ to Snappy. | Dec 21 2014
dbaa917eb8b2 | Jeremy Wiebe | Updated regex for "Content-Type" (RFC2616 sec. 4.2 says HTTP header field names… | Dec 11 2014
eb28dc02a8d2 | Jeremy Wiebe | Added getBodyContent(WARCRecord r) method to WarcRecordUtils | Dec 11 2014
65837e09aa3a | Jeremy Wiebe | Added WARC support to UrlMappingMapReduceBuilder. It can now accept a path… | Dec 9 2014
5590850ab2f0 | Jeremy Wiebe | For building on OS X, changed HBase compression from Snappy to GZ. | Dec 8 2014
0145d74ed35e | lintool | Refactoring to create method that extracts MIME from WARC response records. | Oct 22 2014
410cfd81a069 | lintool | WARC-related Hadoop bindings. | Oct 19 2014
f97523f440ff | lintool | Merge branch 'master' into hbase-api-refactoring | Oct 19 2014
de4267bf28f1 | lintool | Updated documentation. | Oct 13 2014
457a71345d2b | lintool | Merge branch 'master' into warc | Sep 15 2014
2b8e721063fa | lintool | Refactored to eliminate deprecated HBase APIs. | Sep 14 2014
159596e9b378 | lintool | Fixed build issues in upgrade to CDH 5.1.2. | Sep 13 2014
f3516c7fd7f0 | lintool | Added test cases to try loading WARC records from a stream; back-ported same… | Aug 29 2014
28c5c007f4fd | lintool | Added simple test case. | Aug 28 2014
ccee8fd2204a | lintool | Implemented ArcRecordWritable. | Aug 23 2014
a57c6b83eaf2 | lintool | Refactoring: created ArcRecordUtils. | Aug 16 2014
d58afca206fc | lintool | Refactoring packge org.warcbase.demo; added Jwat prefix to Hadoop InputFormats… | Aug 16 2014
2d16fd032060 | lintool | Bump max value size up to 10 MB, tweak WAL settings. | Aug 14 2014
bf606c07d453 | lintool | Refactoring UrlUtil and related classes. | Aug 14 2014
f29edf052e19 | lintool | Added command-line options. | Aug 14 2014
14fa0f329072 | lintool | More refactoring. | Aug 14 2014
6eeb0ea431f1 | lintool | Refactoring on local UrlMappingBuilder. | Aug 14 2014
5022874b83e5 | lintool | Lightweight refactoring. | Aug 14 2014
0bc5d4876199 | lintool | Uri -> Url classes renaming. | Aug 14 2014
ab64384ff204 | lintool | Light refactoring | Aug 14 2014
bc47be0fea19 | Jeffyrao | resolve conflicts | Aug 13 2014
515bab098cff | lintool | Code cleanup for browser code; removed unneeded files and associated web files. | Aug 11 2014
b685d65eba7b | lintool | Fixed 14 digit date parsing issue (now uses ArchiveUtils); was an issue with… | Aug 10 2014
65c6e54a948e | Jeffyrao | add UriMappingBuilder Mapreduce version | Aug 4 2014
60a9b96beebe | Milad Gholami | Merging with master. | Jun 26 2014
149ce6cf969f | Milad Gholami | Fixing git history. | Jun 26 2014
f3015cd7ba4b | lintool | issue #50 | Jun 18 2014
02c26d6d8ea4 | lintool | Started working on issue #46 cleanup of org.warcbase.data.Util | Jun 17 2014
0e61a3094550 | milad621 | Issue 38 fixed. Still need to add other URL Encoding characters. | Jun 6 2014
20ef503dbfe5 | lintool | Moving ExtractLinks and ExtractSiteLinks into analysis.graph package, per Issue… | Jun 5 2014
60b827d1c174 | lintool | Refactored getIdRange method signature, add more test cases to UriMapping. | Jun 5 2014
a6b17787ccd1 | lintool | Merge branch 'extract-links' of github.com:Jeffyrao/warcbase into refactoring | Jun 5 2014
7fc1b92d60a7 | Jeffyrao | fix issue 40 that UriMapping prefix search should return empty result when no… | Jun 5 2014
a8953616248c | lintool | Refactoring, added test case (currently broken). | Jun 4 2014
6518836b7ea0 | Jeffyrao | fix issue 32, update ExtractSiteLinks code | Jun 3 2014
a17e3c74954a | Jeffyrao | fix issue 30, add ExtractSiteLinks code | May 29 2014
3f36eff4f429 | Jeffyrao | fix issue 31 | May 27 2014
0598dcd91073 | Jeffyrao | more edits | May 25 2014
cad5eddceb19 | Jeffyrao | remove javacsv dependency, add opencsv dependency | May 25 2014
67888b472702 | milad621 | Fixed issue29. | May 23 2014
0856f3faddd1 | lintool | Merge branch 'master' of github.com:milad621/warcbase into display_issues | May 6 2014
c5432ac2468c | milad621 | Working on issue17. Added /noframes to the url but still needs to add warcbase… | May 4 2014
24d7d8e08807 | Jeffyrao | modified ExtractSiteLinks.java, changed to read csv format of prefix input | Apr 20 2014
8d9ef6d2ac6b | Jeffyrao | add extract site-level links | Apr 16 2014
997a8e84e816 | lintool | Exposed -getPrefix command-line option. | Mar 31 2014
758948b463b9 | lintool | Light refactoring, fixed a few errors. | Mar 31 2014
5227ba65232c | lintool | Merge branch 'extract-links' of github.com:Jeffyrao/warcbase into fst | Mar 31 2014
e3ccb37b2e1d | Jeffyrao | remove TestFSTPrefix class | Mar 31 2014
71c68ecae0fa | Jeffyrao | add prefix search feature to UriMapping class; add test class for UriMapping… | Mar 31 2014
a48c65e498ab | Jeffyrao | add Lucene FST prefix search | Mar 28 2014
be5a4d366786 | lintool | Fixed bug with MAX_VERSION blowing up array length. | Mar 25 2014
72b9558b3ca5 | lintool | Merge branch 'issue12' of github.com:milad621/warcbase into multi_version_cell | Mar 25 2014
ef6650cb7bc4 | milad621 | fixing branches | Mar 25 2014
0d43c98ee177 | milad621 | Fixed issue12 and some issues with dates in browser. | Mar 25 2014
6c48ce81ee80 | milad621 | Issue12 fixed | Mar 24 2014
e0d6be1055f4 | milad621 | test push | Mar 24 2014
252eb87988b1 | milad621 | issue12 fixed | Mar 24 2014
24370c4f051f | Jeffyrao | update README file, add steps for getting URLs and extracting links | Mar 21 2014
cb9e68d4f94d | Jeffyrao | add ExtractLinks and PrefixQuery(ExtractSiteLinks) | Mar 20 2014
374b6f2c2173 | lintool | Adds Snappy compression. | Mar 19 2014
d543a4957916 | lintool | Merge branch 'master' into disable_wal | Mar 17 2014
b430dc7c94f6 | lintool | Merge branch 'jar_upgrade' into disable_wal | Mar 17 2014
8c649e5336af | lintool | Fixed Issue #7 | Mar 17 2014
53aacb335403 | Jeffyrao | code reformat via eclipse imported project | Mar 17 2014
dfb63a169b43 | lintool | Tried disabling WAL to see impact on performance. | Mar 16 2014
1ad985045d9a | Jeffyrao | code reformat | Mar 16 2014
073fd3206185 | lintool | Simplified Maven dependencies; Moved from IA artifacts to openwayback artifacts. | Mar 16 2014
228ca87cd0c3 | lintool | Light refactoring. | Mar 16 2014
19453fc7ac8f | lintool | whitespace | Mar 16 2014
1e0783ff8a6c | lintool | Merge branch 'new_hbase_structure' of https://github.com/milad621/warcbase into… | Mar 16 2014
75a0820c382e | milad621 | servlet updated with the new hbase structure. | Dec 28 2013
a80e87ca6f74 | milad621 | new hbase structure for ingest files | Dec 11 2013
2b49ec425597 | Jeffyrao | check text/html type and modify Jsoup.parse charset as ISO-8859-1 | Dec 8 2013
2e7e0cb81775 | Jeffyrao | modify UriMappingBuilder to read all files under given directory | Dec 7 2013
8e1cadb7e70e | Jeffyrao | Extract links and Lucene FST for URLs. | Dec 6 2013
944b138ee0d2 | milad621 | started a new branch to extract text from warcbase data and organize the data… | Dec 1 2013
7e76e19f2547 | milad621 | PrintAllUris add to appassembler which will output a urls.html file with all… | Nov 23 2013
bf51f6b75bd8 | milad621 | Code refactoring after pair coding. | Nov 18 2013
64ef175db2fd | milad621 | Some code cleanup in servlet. DetectDuplicates fixed with new hbase table… | Nov 7 2013
bf1b3616e4c0 | milad621 | updated servlet. | Nov 7 2013
76949eacabd5 | milad621 | Added a seperate class to manage HBase connection and addRecord | Nov 6 2013
b1bee7dc3a8e | milad621 | Arc Processing tools added. Not working with hbase yet. | Oct 17 2013
b321544c8a1f | milad621 | url style fixed. Can capture table names from url and home page changed to http… | Oct 9 2013
4b7ec6fab139 | milad621 | URL style changed | Oct 2 2013
127a533cf731 | milad621 | fixed a baseurl problem | Sep 24 2013
24cf7be01bea | milad621 | Some url bugs with senate data fixed. A few code refactoring done | Aug 17 2013