History Graph
Commit | Author | Details | Committed
---|---|---|---
f5d8edf50506 | lintool | Killed all the Pig stuff. | Nov 24 2015
3eb11b04e499 | Alice-Z | Commit clean-up | Nov 8 2015
8057c46945d0 | Alice-Z | Add Spark support | Nov 3 2015
e5821091ecca | Alice-Z | update ArcRecords interface | Oct 21 2015
11dfe152d4cc | Jeremy Wiebe | Added ExtractBoilerpipeText UDF | Jun 30 2015
1c013a7b064f | Jeremy Wiebe | Added Stanford NER UDF | May 26 2015
eec01a41b18e | lintool | Cleanup. | May 24 2015
1c9b25027e73 | lintool | Wraps a try block around everything to catch all errors. | May 24 2015
9acc8ca74c0a | lintool | Merge branch 'master' into extract-pdf-udf | May 24 2015
a3a2cd7853c8 | lintool | tries to fix issues with relative links | May 13 2015
5b10497c2d53 | lintool | Updated UDF to handle relative paths (with source page). Added test case. | May 6 2015
fb95f3ae562e | lintool | UDF for extracting top-level domain from URL. | May 6 2015
5c2540099370 | lintool | Reformatting. | Dec 18 2014
c49c58f76ed3 | rwolniak | ExtractTextFromPDFs updated | Dec 9 2014
cea1f0cae974 | rwolniak | Updated PIG Tika Parser with new code that should work. Awaiting test on Hadoop | Dec 9 2014
8a3fed9342fc | rwolniak | Still testing ExtractTextFromPDFs | Nov 12 2014
10b92d78f6ed | rwolniak | Extract text from PDF UDF. Currently still debugging | Nov 10 2014
3af097ea655d | lintool | Fixed test cases. | Oct 19 2014
b6431bed6f8b | lintool | Minor tweaks. | Aug 14 2014
a14cda2217fa | lintool | Commented out LibmagicJnaWrapper functionality because the jar isn't generally… | Aug 12 2014
db4ebe9f4826 | pmd | Added a null pointer check and a more Pig friendly return value from the UDFs | Dec 19 2013
ba201c27e210 | pmd | Refactored the configuration of the magic lib into the Pig script. | Dec 19 2013
5d312a3f80ec | pmd | Removed warnings | Dec 19 2013
b3fdd488fdcb | pmd | Refactored the DetectMimeType into two seperate methods: one for each detection… | Dec 19 2013
259f0057f400 | pmd | Add identification engine as a parameter to the DetectMimeType UDF | Dec 9 2013
1a209cf5b04a | pmd | First version of a magic lib UDF | Dec 4 2013
ccb9ea44f90a | cneud | use tika for mime type detection | Dec 3 2013
fbf902fcd066 | cneud | use tika for language detection | Dec 2 2013
1c57edb9e225 | lintool | Added ExtractRawText UDF, tweaked ExtractLinks. | Dec 2 2013
1fd881ddf836 | lintool | Loader now materializes actual text, added ExtractLinks UDF. | Dec 2 2013