Version 4 vs 14
= First Time Setup =
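== Create environment ==
Load the modules first, so that the environment is built on top of the cluster's Python and TensorFlow:
```
module load python gcc/8.4.0-cuda mvapich2/2.3.4-cuda py-tensorflow
virtualenv --system-site-packages -p python3 env-biop-tf2
```
== Activate environment ==
`source env-biop-tf2/bin/activate`
== Install additional packages ==
TensorFlow itself comes from the `py-tensorflow` module.
```
pip install ipython tensorboard
```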
== Test ==
Start `ipython` and check that TensorFlow imports: `import tensorflow as tf`
= Run on the Cluster =
Request time on the `ptbiop` account:
```
salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE
module load python gcc/8.4.0-cuda mvapich2/2.3.4-cuda py-tensorflow
source env-biop-tf2/bin/activate
```
You can now run your scripts on the node.
NOTE: Nicola will add details on the Jupyter Notebook setup.
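Instead of reading the node name off the `squeue` output by hand, a small shell function can pick it up. This is a minimal sketch: `my_gpu_node` is a hypothetical name, and it assumes you have one running job on a single node (as with `--exclusive` here). With several nodes, Slurm prints a compressed list such as `i[01-02]` instead of a single hostname.

```
# Hypothetical helper (e.g. for ~/.bashrc): print the node of your running job.
# squeue options: -h (no header), -u (only your jobs),
# -t RUNNING (skip pending jobs), -o '%N' (print only the node list).
# If you have several running jobs, the last line listed is used.
my_gpu_node() {
  squeue -h -u "$USER" -t RUNNING -o '%N' | tail -n 1
}
```

Once `salloc` has returned, `ssh "$(my_gpu_node)"` logs you in to the allocated node.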
= Transferring data to and from the cluster =
This is a distilled version of what is available here:
https://scitas-data.epfl.ch/confluence/display/DOC/Data+Transfer+Nodes
== Some background ==
There are two locations where you can place data on the SCITAS clusters:
1. Your `home` directory
2. Your `scratch` directory
The `home` directory is for your configuration files, your virtual environments and anything else permanent that you need to run your work.
The `scratch` directory is volatile, meaning it can go down at any time, but it is meant for large data: your individual projects, datasets and results should go here and be transferred back to your lab share at the end of processing.
Moving data back and forth means connecting to the SCITAS Transfer nodes. Multiple solutions exist, but Oli has opted for WinSCP, as it has a GUI that makes transferring from Windows machines more comfortable.
== First time setup ==
Subscribe to [[ https://groups.epfl.ch/cgi-bin/groups/viewgroup?groupid=S19545 | HPC-DATAMOVERS ]] after you have been granted a SCITAS account.
You **CANNOT** login using your username and password. You need to use public key authentication.
Download and install [[ https://winscp.net/eng/index.php | WinSCP ]]
=== Create a private-public key pair ===
Open WinSCP and create a new site:
```
File protocol: SCP
Host name: fdata1.epfl.ch
User name: YOUR_USERNAME
```
# Click on {nav Advanced > Authentication (Under SSH) > Tools > type=instructions, name=Generate New Key Pair with PuTTYGen}
# Click on {nav Generate} and follow the instructions to add randomness.
# Add a Key comment like "SCITAS Cluster Key"
# Add a passphrase if you want
# Click on {nav Save Private Key} and save it under `C:\Users\username\.ssh` as `scitas_access.ppk`
# (Optional) Click on {nav Save Public Key} and save it wherever you like as `scitas_access.pub` (it is not needed for this protocol)
# Close PuTTYGen and, back in WinSCP, browse to your private key file
# Click on {nav Display Public Key} and copy it to your clipboard
# Open a command prompt and ssh into izar: `ssh izar.epfl.ch -l username`
# Create a `.ssh` folder (no error if it already exists): `mkdir -p .ssh`
# Change the permissions: `chmod 700 .ssh`
# Paste your public key inside: `echo 'PASTE_FROM_CLIPBOARD' >> .ssh/authorized_keys`
# Back in WinSCP, save the site as `SCITAS fdata1`
**Now you should be able to connect from WinSCP**
NOTE: If you added a passphrase, you will need to provide it now. It is not your Gaspar password.
NOTE: Remember: Data should go to `/scratch/username`
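The `.ssh` steps of the key-pair list above can also be wrapped in a shell function that is safe to run more than once: it only appends the key if that exact line is not already there. This is a sketch, and `add_authorized_key` is a hypothetical name. It additionally restricts `authorized_keys` to mode 600, which the list does not do but which is standard practice for OpenSSH.

```
# Hypothetical helper: add a public key to authorized_keys exactly once
# and set the usual permissions (700 on the folder, 600 on the file).
add_authorized_key() {
  local key="$1"
  local ssh_dir="${2:-$HOME/.ssh}"   # defaults to ~/.ssh on the cluster
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
  # grep -q (quiet), -x (whole-line match), -F (fixed string, no regex)
  if ! grep -qxF -- "$key" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
  fi
}
```

On izar, call it with the key copied from {nav Display Public Key}: `add_authorized_key 'PASTE_FROM_CLIPBOARD'`.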
= TensorFlow2 environment setup =
Log in to `izar`: `ssh izar.epfl.ch -l username`
== Create environment called env-biop-tf2 ==
```
module load gcc/8.4.0-cuda mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
virtualenv --system-site-packages -p python3 env-biop-tf2
```
== Activate environment ==
```
source env-biop-tf2/bin/activate
```
== Install what you need ==
```
pip install ipython tensorboard  # plus any other packages you need
```
== Test that you have access to the GPUs ==
```
ipython
import tensorflow as tf
tf.config.list_physical_devices('GPU')  # returns an empty list if TensorFlow sees no GPU
```
= Run on the Cluster =
Request time on the `ptbiop` account:
```
salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE
module load gcc/8.4.0-cuda mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
source env-biop-tf2/bin/activate
```
You can now run your scripts on the node.
NOTE: Nicola will add details on the Jupyter Notebook setup.
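Since the module load and the activation are needed every time you log in to a node, they can be wrapped in one shell function, for example in your `~/.bashrc`. A minimal sketch: `biop_tf2` is a hypothetical name, and it assumes `env-biop-tf2` was created in your home directory (where `ssh` lands you).

```
# Hypothetical helper: load the TF2 modules, then activate env-biop-tf2.
# Stops (returns 1) if the module load fails, e.g. a module name changed.
biop_tf2() {
  module load gcc/8.4.0-cuda mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python || return 1
  # "." is the POSIX spelling of "source"
  . "$HOME/env-biop-tf2/bin/activate"
}
```

After `ssh NAME_OF_ALLOCATED_NODE`, running `biop_tf2` replaces the last two lines of the block above.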
= YAPIC Setup =
NOTE: We need to work with TF 1.15 in YAPIC in order to be able to export YAPIC models to the ModelZoo structure for reuse in Fiji.
== First time setup ==
Do this from a GPU node:
```
salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE
```
Once on the node, install and set up Python with TF 1.15:
```
module load gcc/8.4.0-cuda cuda/10.2.89 cudnn/7.6.5.32-10.2-linux-x64 python
virtualenv -p python3 --system-site-packages yapic-tf-1.15
source yapic-tf-1.15/bin/activate
pip install tensorflow-gpu==1.15 yapic
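# TF 1.15 looks for CUDA 10.0 library names (*.so.10.0): link them
# to the CUDA 10.2 libraries provided by the loaded CUDA module
cd yapic-tf-1.15/lib64
ln -s $CUDA_ROOT/lib64/libcudart.so libcudart.so.10.0
ln -s $CUDA_ROOT/lib64/libcublas.so libcublas.so.10.0
ln -s $CUDA_ROOT/lib64/libcufft.so libcufft.so.10.0
ln -s $CUDA_ROOT/lib64/libcurand.so libcurand.so.10.0
ln -s $CUDA_ROOT/lib64/libcusolver.so libcusolver.so.10.0
ln -s $CUDA_ROOT/lib64/libcusparse.so libcusparse.so.10.0
# Make the links visible to TF; this only lasts for the current shell session
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH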
```
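If TensorFlow 1.15 later complains that it cannot load `libcudart.so.10.0` or a similar library, a quick first check is whether each symlink created above actually resolves. In the sketch below, `check_cuda_links` is a hypothetical helper; it relies on `[ -e ]`, which follows symlinks and is therefore false for a broken link.

```
# Hypothetical check: report each CUDA 10.0 symlink and whether it resolves.
# Returns 0 if all six links point to an existing file, 1 otherwise.
check_cuda_links() {
  local libdir="$1" missing=0 lib
  for lib in cudart cublas cufft curand cusolver cusparse; do
    if [ -e "$libdir/lib$lib.so.10.0" ]; then
      echo "ok      lib$lib.so.10.0 -> $(readlink "$libdir/lib$lib.so.10.0")"
    else
      echo "missing lib$lib.so.10.0"
      missing=1
    fi
  done
  return "$missing"
}
```

Right after the setup, while still in `lib64`, run `check_cuda_links "$PWD"`; later, with the environment active, `check_cuda_links "$VIRTUAL_ENV/lib64"`.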
== Running YAPIC Training ==
Data must be in the form of `.tif` files for this to work. See the [[ https://yapic.github.io/yapic/tutorial.html | YAPIC tutorial ]] for the expected layout of your data folder.
1. Copy your local data to the cluster with `WinSCP`, into `/scratch/izar/YOUR_USERNAME`
2. Request time: `salloc -t 10:0:0 -A ptbiop --exclusive`
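3. Log in to the node: run `squeue -u $USER` to find its name, then `ssh NAME_OF_ALLOCATED_NODE`
4. Load the necessary modules: `module load gcc/8.4.0-cuda cuda/10.2.89 cudnn/7.6.5.32-10.2-linux-x64 python`
5. Start the environment and put the CUDA 10.0 symlinks back on the library path: `source yapic-tf-1.15/bin/activate` then `export LD_LIBRARY_PATH=$VIRTUAL_ENV/lib64:$LD_LIBRARY_PATH`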
6. Start YAPIC:
```
yapic train unet_2d "/scratch/izar/YOUR_USERNAME/YOUR_PROJECT/*.tif" /scratch/izar/YOUR_USERNAME/YOUR_PROJECT/YOUR_ILP.ilp -e 800 --gpu=0
```
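Before spending a 10-hour allocation, it can be worth checking that the project folder contains what `yapic train` is pointed at. This is a sketch under the assumption, taken from the command above, that the `.tif` images and a single `.ilp` file sit directly in `YOUR_PROJECT`; `check_yapic_project` is a hypothetical name, and the match is case-sensitive (`.TIF` is not counted).

```
# Hypothetical pre-flight check for a YAPIC project folder.
# Succeeds if the folder holds at least one .tif and exactly one .ilp file.
check_yapic_project() {
  local dir="$1" n_tif n_ilp
  n_tif=$(find "$dir" -maxdepth 1 -type f -name '*.tif' | wc -l)
  n_ilp=$(find "$dir" -maxdepth 1 -type f -name '*.ilp' | wc -l)
  echo "$n_tif .tif file(s), $n_ilp .ilp file(s) in $dir"
  [ "$n_tif" -gt 0 ] && [ "$n_ilp" -eq 1 ]
}
```

For example: `check_yapic_project /scratch/izar/YOUR_USERNAME/YOUR_PROJECT && echo "ready to train"`.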
== Deploy the model for use in DeepImageJ ==
After training is complete, export the model to the DeepImageJ format. You do not need to do this on a GPU node.
```
yapic deploy model.h5 /scratch/izar/YOUR_USERNAME/YOUR_PROJECT/EXAMPLE.tif MODEL_NAME --skip-predict
```
Using WinSCP, move the folder `MODEL_NAME` to the `Fiji.app\models` folder of your Fiji installation.
You should now be able to access it by running DeepImageJ
= Using The Cluster with Jupyter Notebooks =