History Graph: master
Commit | Author | Details | Committed
---|---|---|---
5938479cfab4 | ianmilligan1 | minor change to run tests again | Sep 23 2016
205e9c179eba | ianmilligan1 | added https to better play with github pages and others | Aug 5 2016
35e440d9a761 | Ian Milligan/GitHub | Merge pull request #242 from yb1/checksum | Aug 2 2016
7f9de3dbcbf8 | Youngbin Kim | Multiple partitions | Aug 2 2016
fe306339e3af | Ian Milligan/GitHub | Merge pull request #241 from yb1/checksum | Jul 31 2016
71069610d8b4 | Youngbin Kim | Changed output type to rdd | Jul 31 2016
ab72ae4ce231 | lintool | Add UDF for computing MD5 checksum. Issue #211 | Jul 28 2016
3b4ebe297a21 | Youngbin Kim | checksum | Jul 27 2016
16db93460896 | lintool | Skips corrupt WARC records with negative lengths. Issue #234. | Jun 29 2016
165d8f38fffe | lintool | Skips corrupt WARC records with negative lengths. | Jun 29 2016
46b65a8909e2 | ianmilligan1 | minor tweak, adding build instructions for analytics | Jun 28 2016
62b8c4e57800 | lintool | Refactored Warcbase into multiple modules, upgraded to CDH 5.7.1 (w/ Spark 1.6. | Jun 28 2016
7d2f8a4f2119 | lintool | Updated documentation. | Jun 28 2016
6a90426a3d1a | lintool | Fixed Scala matching issue with NER extractor. | Jun 28 2016
19be6edfa8cd | lintool | Upgraded to CDH 5.7.1. | Jun 24 2016
0dfabaeb484b | lintool | Added dependencies for boilerpipe. | Jun 22 2016
e8c43f1b9d24 | lintool | Cleaned up and simplified dependencies, etc. | Jun 17 2016
6e8740a55414 | lintool | Integrating warcbase-hbase artifact. | Jun 16 2016
0db46be7ef7f | lintool | Moved HBase and non-core code out of warcbase-core into warcbase-hbase. | Jun 16 2016
f972206db516 | lintool | Created warcbase-core module. | Jun 16 2016
63c3d5916d84 | lintool | Fixed issue #234 Error handling for broken ARC/WARC files: empty content in ARC… | Jun 15 2016
e4405a1d8bd1 | ianmilligan1 | fixed broken link | Jun 15 2016
c3f9345bf748 | Jeremy Wiebe | Merge pull request #229 from yb1/pagerank (optimization) | May 21 2016
c09d35d8a49e | Youngbin Kim | default values | May 20 2016
b0b4d5df2f77 | Ian Milligan | Merge pull request #228 from lintool/filter-content | May 16 2016
e09289e57a88 | lintool | Cleaned up README. | May 15 2016
0c850642f789 | lintool | Fixed issue #225 More robust tweet parsing | May 15 2016
b5fd283dc492 | Jeremy Wiebe | Added keepContent() and discardContent() methods to RecordRDD | May 13 2016
9ce41861d5f0 | Youngbin Kim | Fixed bug in optimization code | May 11 2016
2f4f6fce86e8 | Youngbin Kim | optimize extracting links for Pagerank | May 9 2016
51c1f57e0a7b | ianmilligan1 | added OMRI funding acknowlegement; annual report season is here | May 5 2016
bf251c0dc714 | Jeremy Wiebe | Fix to eliminate build warning, per issue #223. | May 3 2016
03c93f22c014 | lintool | More robust error handling. | May 2 2016
b7fc45b842f0 | lintool | More robust JSON parsing. | May 1 2016
53c7b911efe6 | ianmilligan1 | removed process.py as per #218 - this is why I should always use branches. | Apr 29 2016
ee783c310c43 | ianmilligan1 | checking in process.py as per #218, on master | Apr 29 2016
72c9ca70de97 | Ian Milligan | Merge pull request #220 from ianmilligan1/master | Apr 22 2016
71a19fbaaa74 | Jeremy Wiebe | Merge pull request #221 from lintool/year-month | Apr 22 2016
eb4c0e313a67 | ianmilligan1 | fixed ArcTest | Apr 21 2016
d66f001374b1 | ianmilligan1 | more tweaks for case sensitive getCrawlDate | Apr 21 2016
b7a81a3afff4 | ianmilligan1 | changing case errors in ExtractCrawlDate, checking with TravisCI | Apr 21 2016
58b4e792a2bc | ianmilligan1 | capitalized function getCrawlDate and getCrawlMonth | Apr 21 2016
9ee270102a93 | ianmilligan1 | new getCrawlMonth function | Apr 20 2016
0b296eb38067 | ianmilligan1 | draft of contributing file | Apr 18 2016
1f6fd06c02a4 | ianmilligan1 | updating README; code block, link to NER viz, prominent documentation link | Apr 10 2016
7489b249d557 | lintool | Added funding. | Apr 5 2016
b048f1347b0b | Jimmy Lin | Merge pull request #205 from ruebot/travis | Apr 5 2016
1f3a4c6166b8 | nruest | Setup TravisCI | Apr 3 2016
ee5262c67b04 | Jeremy Wiebe | Merge pull request #213 from lintool/link-visualization | Mar 31 2016
4aab2330ecee | Jeremy Wiebe | Deleted unnecessary data processing script | Mar 31 2016
666b2572b2f6 | Jeremy Wiebe | Change pure-min CSS location to CDN with HTTP and HTTPS | Mar 30 2016
71e8a03333e1 | ianmilligan1 | minor change to NER URL text in index.html | Mar 30 2016
c2ab2e57d987 | ianmilligan1 | checking in link visualization | Mar 29 2016
03c2d47d0e7c | lintool | Issue #208 ExtractTopLevelDomain UDF misnamed | Mar 29 2016
52eb2f96304d | lintool | downgraded guava to fix borked tests, cf https://issues.apache. | Mar 29 2016
513c69695940 | lintool | Fixed compile bug, converted to .removePrefixWWW() | Mar 29 2016
70e6c3a63ff6 | lintool | Merge branch 'master' into domainlinks | Mar 29 2016
2ffc60cc178e | Jeremy Wiebe | Merge branch 'graphx' | Mar 29 2016
44f3548c05cd | Jeremy Wiebe | Parameterized static vs dynamic PageRank. | Mar 29 2016
fc0d11495cf1 | ianmilligan1 | renaming ExtractTopLevelDomain to ExtractDomain | Mar 29 2016
c9e369fef9ef | Jeremy Wiebe | Use loadArchives | Mar 28 2016
4a4d5e5494bf | ianmilligan1 | updated loadArc to loadArchives in README.md | Mar 28 2016
2b6689ce033e | lintool | Merging in minor tweaks in NER vis | Mar 27 2016
40025b169e93 | lintool | Add UDF for extracting stuff from tweets, Issue #210 | Mar 27 2016
b80c970f2e50 | lintool | Added test cases. | Mar 27 2016
04d105ded330 | lintool | Addressed UDF for extracting image links, Issue #203 | Mar 27 2016
2da8f0abb983 | lintool | Added ExtractImageLinks test cases. | Mar 27 2016
ce43a632b44f | lintool | Merge branch 'master' into images | Mar 27 2016
f45f4d9fb0d3 | lintool | UDFs for extracting at-mentions, hashtags, and URLs from tweets. | Mar 27 2016
6903b98be69e | lintool | Merged in tweet analysis functionality, Issue #204 | Mar 26 2016
c71c4dcc47f8 | lintool | Moved TweetJsonFunctions. | Mar 26 2016
1d3ddbbd57d1 | Jeremy Wiebe | Separated graph generation (apply method) from writer, and fixed calculation of… | Mar 21 2016
19eefb5d7f2c | Jeremy Wiebe | Revert to old CDH (new version produces errors on cluster) | Mar 8 2016
997d2405a84e | Jeremy Wiebe | Basic graph analysis, initial check-in | Mar 5 2016
1c8cd5c78b1f | lintool | Added method to extract text. | Mar 5 2016
93686da4ff23 | lintool | Downgraded json4s, folds in parsing in the loader. | Mar 5 2016
f2abb2b5bbc2 | lintool | Initial pass at processing tweets in Warcbase. | Mar 5 2016
b4e49fb40be5 | lintool | Added UDF for extracting image links. | Mar 5 2016
db5c84770adb | Jeremy Wiebe | Update ExtractLinks.scala | Feb 20 2016
c52fab1a3e7a | Jeremy Wiebe | Added loadArchives(), per request #195 | Feb 18 2016
f69c7900f12e | Jeremy Wiebe | Added tests for generic ARC/WARC classes | Feb 18 2016
77891e7d54ad | Jeremy Wiebe | Added loadArchives and supporting classes (loads WARCs or ARCs). | Feb 17 2016
bf893d3b5da5 | Jeremy Wiebe | Added keepUrlPatterns and discardUrlPatterns as per #197. | Feb 16 2016
8a4c55019413 | Jeremy Wiebe | Added discardUrlPatterns | Feb 16 2016
232397a26bf1 | Jeremy Wiebe | keepUrlPatterns test | Feb 16 2016
bf682d8e8efc | Jeremy Wiebe | Added keepUrlPatterns | Feb 16 2016
a7b0e0b07682 | Jeremy Wiebe | Merge branch 'format' for ExtractDate (#154) and TupleFormatter | Feb 13 2016
4b8f0fb7b482 | Jeremy Wiebe | Use shapeless to flatten tuples of any arity | Feb 13 2016
3fb632ccd905 | Jeremy Wiebe | Removed protocol from URLs pointing to scripts and stylesheets, to avoid mixed… | Feb 4 2016
3323b70e43cb | Jeremy Wiebe | Merge pull request #194 from lintool/remove-prefix-www | Feb 4 2016
16276bc9d3d9 | Jeremy Wiebe | Added keepLanguages filter to RecordRDD as per Issue #190 | Feb 4 2016
30a5e5d8d585 | Jeremy Wiebe | Added keepLanguages RDD filter | Feb 4 2016
1b6d83c4a7fe | Alice-Z | Add RemovePrefixWWW method | Dec 25 2015
4042cac0ad8b | Alice-Z | add method to filter date by component | Dec 25 2015
d1c3f5d49783 | Alice-Z | Cleanup and add comments | Dec 25 2015
1ec6e05d7540 | Alice-Z | Add Date utils to clarify date extraction | Dec 25 2015
c773fd242ff5 | Alice-Z | Add Formatter to un-nest tuples and print in tab-delimited format | Dec 25 2015
e2697e7df766 | ianmilligan1 | URL change to Gephi tutorial | Dec 16 2015
ba2b44c52b35 | lintool | Refactoring API so that I can take individual Spark records in spark-shell and… | Dec 14 2015
0d78b9bb2b5b | Alice-Z | Use SerializableWritable wrapper for serialization | Dec 10 2015