Merge pull request #2 from perdalum/pig-integration-file-udf

Authored by Clemens Neudecker <clemens.neudecker@gmail.com> on Jan 6 2014, 10:28.

Description

Merge pull request #2 from perdalum/pig-integration-file-udf

Pig integration file udf

Event Timeline

Clemens Neudecker <clemens.neudecker@gmail.com> committed R1473:3496355707ea: Merge pull request #2 from perdalum/pig-integration-file-udf (authored by Clemens Neudecker <clemens.neudecker@gmail.com>). Jan 6 2014, 10:28

Merged Changes

Commit | Author | Details | Committed
1324d5dc6edc | pmd | Added short descriptions of the UDF to the README. | Dec 20 2013
db4ebe9f4826 | pmd | Added a null pointer check and a more Pig friendly return value from the UDFs | Dec 19 2013
ba201c27e210 | pmd | Refactored the configuration of the magic lib into the Pig script. | Dec 19 2013
71b90c81859e | pmd | Improved the DetectMimeTypeTika Pig script. | Dec 19 2013
bedac9288080 | pmd | Corrected an error in the DetectMimeTypeMagic Pig script and the corresponding… | Dec 19 2013
5d312a3f80ec | pmd | Removed warnings | Dec 19 2013
b3fdd488fdcb | pmd | Refactored the DetectMimeType into two seperate methods: one for each detection… | Dec 19 2013
d6c0cec7efb4 | pmd | Added a TODO comment | Dec 11 2013
842012e0de5f | pmd | Corrected a comment | Dec 10 2013
e71f20141a36 | pmd | Changed unit test to match the change in ArcLoader that removed the filter for… | Dec 10 2013
10b02ec31780 | pmd | Changed unit test to match the change in ArcLoader that removed the filter for… | Dec 10 2013
b8f63da4d685 | pmd | Use the provided ARC file for the unit test. | Dec 9 2013
25ca4df65a50 | pmd | Improving the unit test of the DetectMimeType by using the two identification… | Dec 9 2013
259f0057f400 | pmd | Add identification engine as a parameter to the DetectMimeType UDF | Dec 9 2013
49f50dd95873 | pmd | Add the magic lib UDF to the Pig script | Dec 9 2013
b69563d53c83 | pmd | Enable the ArcLoader to load all types of files | Dec 9 2013
1a209cf5b04a | pmd | First version of a magic lib UDF | Dec 4 2013
d3228b9f79b0 | pmd | Added .iml files | Dec 4 2013
939473b8d157 | pmd | Added .idea | Dec 4 2013
a1d8f3eed43e | cneud | force maven to use Java 1.7 | Dec 4 2013
030ffa9f9449 | Clemens Neudecker | Merge pull request #1 from perdalum/pig-integration | Dec 3 2013
ccb9ea44f90a | cneud | use tika for mime type detection | Dec 3 2013
c0e996ec2e61 | pmd | Added unit test for the language detection UDF. | Dec 3 2013
fbf902fcd066 | cneud | use tika for language detection | Dec 2 2013
a4282d81d6e9 | lintool | Cleaned up Pig test cases, added JWAT test case. | Dec 2 2013
64daa7dbb395 | lintool | Merge branch 'pig' of https://github.com/graemon/warcbase into pig-integration | Dec 2 2013
eb848477370a | lintool | Added WarcLoader | Dec 2 2013
17bc9616a180 | graemon | added a pig unit test | Dec 2 2013
1c57edb9e225 | lintool | Added ExtractRawText UDF, tweaked ExtractLinks. | Dec 2 2013
8885f5db4ded | graemon | added a pig unit test | Dec 2 2013
1fd881ddf836 | lintool | Loader now materializes actual text, added ExtractLinks UDF. | Dec 2 2013
02317746e1b2 | lintool | Added simple Pig Loader for Arc files, returns (url, time, mime) currently. | Dec 2 2013
7ee8dbb88c3c | lintool | Improved error checking for dates. | Nov 25 2013
22dd0757c01e | lintool | Tweaked browser; added MR programs for simple content analysis. | Nov 23 2013
b5b4e211ea05 | lintool | Hadoop InputFormats for ARC and WARC files + simple demos. | Nov 22 2013