Fixed issue #60: Merge in Clemens et al. contributions to Warcbase

Authored by lintool <jimmylin@umd.edu> on Aug 14 2014, 18:10.

Details

Committed
lintool <jimmylin@umd.edu>, Aug 14 2014, 18:10
Pushed
dportabella, Oct 19 2016, 16:29
Parents
R1473:b6431bed6f8b: Minor tweaks.
R1473:5a8dc8ab24e3: Fixed:
Branches
Unknown
Tags
Unknown

Event Timeline

Aug 14 2014, 18:10: lintool <jimmylin@umd.edu> committed R1473:55a7334ae0bb: Fixed issue #60: Merge in Clemens et al. contributions to Warcbase (authored by lintool <jimmylin@umd.edu>).

Merged Changes

Commit | Author | Details | Committed
b6431bed6f8b | lintool | Minor tweaks. | Aug 14 2014
37cc53a7fd66 | lintool | Tweaked settings. | Aug 14 2014
72c1afbca77c | lintool | Merge branch 'master' into cneud-integration | Aug 14 2014
a14cda2217fa | lintool | Commented out LibmagicJnaWrapper functionality because the jar isn't generally… | Aug 12 2014
93f6d42f4ff4 | lintool | Fixed compile and broken test issues. | Aug 12 2014
accd1978862d | lintool | Merge branch 'master' of github.com:cneud/warcbase into cneud-integration | Aug 12 2014
232a07ac56c9 | cneud | Merge remote-tracking branch 'lintool/master' | Mar 18 2014
3496355707ea | Clemens Neudecker | Merge pull request #2 from perdalum/pig-integration-file-udf | Jan 6 2014
1324d5dc6edc | pmd | Added short descriptions of the UDF to the README. | Dec 20 2013
db4ebe9f4826 | pmd | Added a null pointer check and a more Pig friendly return value from the UDFs | Dec 19 2013
ba201c27e210 | pmd | Refactored the configuration of the magic lib into the Pig script. | Dec 19 2013
71b90c81859e | pmd | Improved the DetectMimeTypeTika Pig script. | Dec 19 2013
bedac9288080 | pmd | Corrected an error in the DetectMimeTypeMagic Pig script and the corresponding… | Dec 19 2013
5d312a3f80ec | pmd | Removed warnings | Dec 19 2013
b3fdd488fdcb | pmd | Refactored the DetectMimeType into two seperate methods: one for each detection… | Dec 19 2013
d6c0cec7efb4 | pmd | Added a TODO comment | Dec 11 2013
842012e0de5f | pmd | Corrected a comment | Dec 10 2013
e71f20141a36 | pmd | Changed unit test to match the change in ArcLoader that removed the filter for… | Dec 10 2013
10b02ec31780 | pmd | Changed unit test to match the change in ArcLoader that removed the filter for… | Dec 10 2013
b8f63da4d685 | pmd | Use the provided ARC file for the unit test. | Dec 9 2013
25ca4df65a50 | pmd | Improving the unit test of the DetectMimeType by using the two identification… | Dec 9 2013
259f0057f400 | pmd | Add identification engine as a parameter to the DetectMimeType UDF | Dec 9 2013
49f50dd95873 | pmd | Add the magic lib UDF to the Pig script | Dec 9 2013
b69563d53c83 | pmd | Enable the ArcLoader to load all types of files | Dec 9 2013
1a209cf5b04a | pmd | First version of a magic lib UDF | Dec 4 2013
d3228b9f79b0 | pmd | Added .iml files | Dec 4 2013
939473b8d157 | pmd | Added .idea | Dec 4 2013
a1d8f3eed43e | cneud | force maven to use Java 1.7 | Dec 4 2013
030ffa9f9449 | Clemens Neudecker | Merge pull request #1 from perdalum/pig-integration | Dec 3 2013
ccb9ea44f90a | cneud | use tika for mime type detection | Dec 3 2013
c0e996ec2e61 | pmd | Added unit test for the language detection UDF. | Dec 3 2013
fbf902fcd066 | cneud | use tika for language detection | Dec 2 2013
a4282d81d6e9 | lintool | Cleaned up Pig test cases, added JWAT test case. | Dec 2 2013
64daa7dbb395 | lintool | Merge branch 'pig' of https://github.com/graemon/warcbase into pig-integration | Dec 2 2013
eb848477370a | lintool | Added WarcLoader | Dec 2 2013
17bc9616a180 | graemon | added a pig unit test | Dec 2 2013
1c57edb9e225 | lintool | Added ExtractRawText UDF, tweaked ExtractLinks. | Dec 2 2013
8885f5db4ded | graemon | added a pig unit test | Dec 2 2013
1fd881ddf836 | lintool | Loader now materializes actual text, added ExtractLinks UDF. | Dec 2 2013
02317746e1b2 | lintool | Added simple Pig Loader for Arc files, returns (url, time, mime) currently. | Dec 2 2013