workflows: update workflow tasks
- Adds a new filtering task for record categories, which allows specifying which categories trigger an action in the holding pen.
- Adds task results to objects in various tasks.
- Updates the sample harvesting workflow to use the new bibclassify API.
- Several other updates and improvements to workflow tasks.
- Updates the category filtering task to allow regular expression syntax when defining which categories to halt, continue or stop the workflow.
- Adds a new data type attribute to workflows, by Jan Aage Lavik.
- Improves exception handling for workflows that run inside a BibSched task.
- Improves the category filter.
- Adds a cache option to the foreach loop, which lets the user choose whether to favor CPU or memory. With the cache, the list to iterate over is computed once and kept, which uses little CPU but a lot of memory and means the list is not dynamic. Without the cache, the list is recomputed on each iteration, which uses much more CPU but keeps only the needed data in memory, and the list is refreshed for each loop.
- Refactors use of extra_data in the engine.
- Changes the foreach task so that obj.data is restored to the value it had before the foreach. Previously, it kept the last value of the foreach.
- Adds the function filtering_oai_pmh_identifier, which filters OAI-PMH identifiers. It is mainly used to avoid processing identifiers that have already been processed.
- Adds features to inspire_filter_category: parameters can now be passed through the extra_data of the engine.
- Fixes some bugs with regex in inspire_filter_category.
- Adds the function update_last_update, which updates the last_update time of a repository. This matters because the user can choose exactly where in the workflow a crash is no longer considered a problem for harvested documents.
- Adds to the container used for BibWorkflowObject exchange the features needed by the switch from pickle to MessagePack in Celery.
- Adds the inspire_filter_custom task, which filters records by examining some fields of their bibfield. It is a generalization of inspire_filter_category.
- Re-enables all the functions in the full_doc_process workflow.
- Adds some control over the number of workflows generated, to avoid saturating memory or hitting the messaging queue limit.
- Adds some protected names for workflows to avoid possible conflicts with new tasks, new user data, etc.
- Adds some functions to control workflow states.
- Fixes PEP8 issues and cleans up code.
- Deletes non-informative logs.
- Updates the different workflows to take the modifications above into account.
Co-authored-by: Jan Aage Lavik <jan.age.lavik@cern.ch>