diff --git a/Makefile b/Makefile
index 22dcc20..e24f38c 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,6 @@
 init:
 	git lfs install
-	pip install -r requirements.txt
-	pip install -e git+https://github.com/lorsbach/enviPath-python#egg=enviPath-python
\ No newline at end of file
+	source activate myenv
+	conda install -r requirements.txt
+	conda install git pip
+	pip install -e git+https://github.com/lorsbach/enviPath-python#egg=enviPath-python@develop
\ No newline at end of file
diff --git a/TP_prediction/README.md b/TP_prediction/README.md
index 05d2e28..cf7a9e4 100644
--- a/TP_prediction/README.md
+++ b/TP_prediction/README.md
@@ -1,55 +1,54 @@
-TP_prediction - Predict Transformation Products (TPs)
----------------------------------------------------
+## TP_prediction - Predict Transformation Products (TPs)
+
 TP_prediction predicts TPs and associated biodegradation pathways using the enviPath pathway prediction engine. To run the script and to save the output of your TP prediction on enviPath, you need a user account on envipath.org.
 ### Input
 Add yor input compounds to ./input/input_structures.tsv. The input format should be the smiles of the compound, followed by its name (or identifier), and separated by a tab.
 ### Mandatory settings
-* USERNAME: Enter here the username of your enviPath account.
-* EP_PACKAGE_ID: Create a new package for your results on envipath.org and enter its URI here.
+* **USERNAME**: Enter here the username of your enviPath account.
+* **EP_PACKAGE_ID**: Create a new package for your results on envipath.org and enter its URI here.
 ### Optional settings
 By default, the search will predict the 50 TPs with the highest probability to be observed according to the relative reasoning model
 You can adapt settings under PATHWAY SEARCH SETTINGS in find_best_TPs.py:
-* EP_MODEL_ID: URI of enviPath relative reasoning model to be used for prediction
-* MAX_TP: Maximum number of TPs to predict per input compound
-* PROBABILITY_THRESHOLD: Lower probability threshold - any value equal to or lower than the threshold will be excluded
-* INCLUDE_0_PROBABILITIES: Set probabilities of 0 to 0.01 to continue having a weighting scheme downstream of the pathway
-* MOIETY: Follow a chemical moiety - only compounds containing this moiety in SMILES will be expanded, e.g., "C(F)(F)F"
-* SORT_TPS_BY_SIZE: To prioritize small compounds in the node queue
-* FOLLOW_LABELED_ATOM: Follow labeled atoms - only compounds containing at least one atom labeled with ATOM_LABEL will
-be expanded
-* ATOM_LABEL: Label used to follow atoms, e.g., 14 for radiolabeled carbon
+* **EP_MODEL_ID**: URI of enviPath relative reasoning model to be used for prediction
+* **MAX_TP**: Maximum number of TPs to predict per input compound
+* **PROBABILITY_THRESHOLD**: Lower probability threshold - any value equal to or lower than the threshold will be excluded
+* **INCLUDE_0_PROBABILITIES**: Set probabilities of 0 to 0.01 to continue having a weighting scheme downstream of the pathway
+* **MOIETY**: Follow a chemical moiety - only compounds containing this moiety in SMILES will be expanded, e.g., "C(F)(F)F"
+* **SORT_TPS_BY_SIZE**: To prioritize small compounds in the node queue
+* **FOLLOW_LABELED_ATOM**: Follow labeled atoms - only compounds containing at least one atom labeled with ATOM_LABEL will be expanded
+* **ATOM_LABEL**: Label used to follow atoms, e.g., 14 for radiolabeled carbon
 Output settings
-* INPUT_FILE_PATH: path to input file
-* OUTPUT_DIRECTORY: path to output directory
-* OUTPUT_FILE_TAG: Name tag to be added to your output files
+* **INPUT_FILE_PATH**: path to input file
+* **OUTPUT_DIRECTORY**: path to output directory
+* **OUTPUT_FILE_TAG**: Name tag to be added to your output files
 ### Run prediction
 To predict pathways for all compounds specified in the input, run:
 ```
 $ python find_best_TPs.py
 ```
 ### Output
 The output file containing all predicted pathways and TP information will be stored in ./output. Each pathway entry starts with '///', followed by the name of the pathway and the link to the pathway entry on envipath.org. Each pathway entry is followed by a tab-separated table containing the following information:
-* SMILES: SMILES of TP
-* name: Name of TP (automatically generated)
-* combined_probability: probability of the node (p_node = p_edge * p_node,parent)
-* rules: List of biotransformation rules used to predict this reaction
-* generation: number of iteration where TP was generated
-* probability: probability of the reaction from the parent to this TP (p_edge)
-* parent: SMILES of parent compound
+* **SMILES**: SMILES of TP
+* **name**: Name of TP (automatically generated)
+* **combined_probability**: probability of the node (p_node = p_edge * p_node,parent)
+* **rules**: List of biotransformation rules used to predict this reaction
+* **generation**: number of iteration where TP was generated
+* **probability**: probability of the reaction from the parent to this TP (p_edge)
+* **parent**: SMILES of parent compound
 The output of the TP prediction can be directly used as input for
 * File_conversion/Prediction_output_to_mass_list/get_mass_list_from_prediction.py
 * Additional_analyses/Analyse_cutooff_thresholds.py
\ No newline at end of file
diff --git a/readme.md b/readme.md
index 22e773e..837ff3a 100644
--- a/readme.md
+++ b/readme.md
@@ -1,14 +1,31 @@
-NICEpath - find pathways in a biochemical network
----------------------------------------------------
-NICEpath is a pathway search tool for biochemical networks.
+# TP_predict - Predict TPs and create suspect lists
+
+This collection of scripts allows the user to reproduce the TP prediction and data analyses presented in the following publication:
+
+Trostel, L. & Coll, C., Fenner, K., Hafner, J. Synergy of predictive and analytical methods advances elucidating biotransformation processes in activated sludge, 2023.
+[insert DOI]
+
+The tools can further be used to perform the same predictions and analyses on a different set of compounds.
+
+## Content
+
+* **TP_prediction**: Script to predict TPs and corresponding biodegradation pathways
+* **File_conversion**: Conversion of prediction output to input for suspect screening tools
+    * Prediction_output_to_mass_list
+    * SMILES_to_mass_and_inclusion_list
+* **Additional_analyses**
+    * Compare_methods
+    * Analyse_cutoff_thresholds
+
+Specific user guidance can be found in the README.md files of the content folders.
+
+## How to
+To fetch the code from the git repository, open a terminal and run:
+```
+$ git clone [insert link]
+```
 To install the dependencies, go to the nicepath directory and run:
 ```
-$ cd /path/to/nicepath
+$ cd TP_predict
 $ make
 ```
-
-To run NICEpath, go to the ./nicepath/ folder and run:
-```
-$ cd nicepath
-$ python main.py INPUT-NAME
-```
\ No newline at end of file
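
The `combined_probability` column documented in TP_prediction/README.md is defined as p_node = p_edge * p_node,parent. The following is a minimal Python sketch of that recursion, not the code used in find_best_TPs.py. It assumes the input compound starts with probability 1, which the README does not state. The function name, the `edges` mapping, and the toy pathway are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy pathway: child -> (parent, p_edge).
# The input compound ("parent") has no entry because it has no parent.
edges: Dict[str, Tuple[str, float]] = {
    "TP1": ("parent", 0.5),
    "TP2": ("TP1", 0.4),
}


def combined_probability(
    node: str,
    edges: Dict[str, Tuple[str, float]],
    root_probability: float = 1.0,  # assumption: the input compound has p = 1
) -> float:
    """Return p_node = p_edge * p_node,parent by walking up to the root."""
    if node not in edges:
        # No parent recorded, so this is the input compound.
        return root_probability
    parent, p_edge = edges[node]
    return p_edge * combined_probability(parent, edges, root_probability)


if __name__ == "__main__":
    for name in ("parent", "TP1", "TP2"):
        print(name, combined_probability(name, edges))
```

Under these assumptions, a second-generation TP reached through edges of 0.5 and 0.4 gets a combined probability of 0.5 * 0.4, so deeper nodes can only keep or lose probability relative to their parent.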