Merge branch 'master' of github.com:cneud/warcbase into cneud-integration
Description
Details
- Committed: lintool <jimmylin@umd.edu>, Aug 12 2014, 19:38
- Pushed: dportabella, Oct 19 2016, 16:29
- Parents:
  - R1473:232a07ac56c9: Merge remote-tracking branch 'lintool/master'
  - R1473:a0a594f92b94: Prototype integration of Wayback/Warcbase via REST API on HBase.
- Branches: Unknown
- Tags
Merged Changes
Commit | Author | Details | Committed
---|---|---|---
232a07ac56c9 | cneud | Merge remote-tracking branch 'lintool/master' | Mar 18 2014
3496355707ea | Clemens Neudecker | Merge pull request #2 from perdalum/pig-integration-file-udf | Jan 6 2014
1324d5dc6edc | pmd | Added short descriptions of the UDF to the README. | Dec 20 2013
db4ebe9f4826 | pmd | Added a null pointer check and a more Pig friendly return value from the UDFs | Dec 19 2013
ba201c27e210 | pmd | Refactored the configuration of the magic lib into the Pig script. | Dec 19 2013
71b90c81859e | pmd | Improved the DetectMimeTypeTika Pig script. | Dec 19 2013
bedac9288080 | pmd | Corrected an error in the DetectMimeTypeMagic Pig script and the corresponding… | Dec 19 2013
5d312a3f80ec | pmd | Removed warnings | Dec 19 2013
b3fdd488fdcb | pmd | Refactored the DetectMimeType into two seperate methods: one for each detection… | Dec 19 2013
d6c0cec7efb4 | pmd | Added a TODO comment | Dec 11 2013
842012e0de5f | pmd | Corrected a comment | Dec 10 2013
e71f20141a36 | pmd | Changed unit test to match the change in ArcLoader that removed the filter for… | Dec 10 2013
10b02ec31780 | pmd | Changed unit test to match the change in ArcLoader that removed the filter for… | Dec 10 2013
b8f63da4d685 | pmd | Use the provided ARC file for the unit test. | Dec 9 2013
25ca4df65a50 | pmd | Improving the unit test of the DetectMimeType by using the two identification… | Dec 9 2013
259f0057f400 | pmd | Add identification engine as a parameter to the DetectMimeType UDF | Dec 9 2013
49f50dd95873 | pmd | Add the magic lib UDF to the Pig script | Dec 9 2013
b69563d53c83 | pmd | Enable the ArcLoader to load all types of files | Dec 9 2013
1a209cf5b04a | pmd | First version of a magic lib UDF | Dec 4 2013
d3228b9f79b0 | pmd | Added .iml files | Dec 4 2013
939473b8d157 | pmd | Added .idea | Dec 4 2013
a1d8f3eed43e | cneud | force maven to use Java 1.7 | Dec 4 2013
030ffa9f9449 | Clemens Neudecker | Merge pull request #1 from perdalum/pig-integration | Dec 3 2013
ccb9ea44f90a | cneud | use tika for mime type detection | Dec 3 2013
c0e996ec2e61 | pmd | Added unit test for the language detection UDF. | Dec 3 2013
fbf902fcd066 | cneud | use tika for language detection | Dec 2 2013
a4282d81d6e9 | lintool | Cleaned up Pig test cases, added JWAT test case. | Dec 2 2013
64daa7dbb395 | lintool | Merge branch 'pig' of https://github.com/graemon/warcbase into pig-integration | Dec 2 2013
eb848477370a | lintool | Added WarcLoader | Dec 2 2013
17bc9616a180 | graemon | added a pig unit test | Dec 2 2013
1c57edb9e225 | lintool | Added ExtractRawText UDF, tweaked ExtractLinks. | Dec 2 2013
8885f5db4ded | graemon | added a pig unit test | Dec 2 2013
1fd881ddf836 | lintool | Loader now materializes actual text, added ExtractLinks UDF. | Dec 2 2013
02317746e1b2 | lintool | Added simple Pig Loader for Arc files, returns (url, time, mime) currently. | Dec 2 2013