SCITAS GPU Clusters Protocols
Transferring data to and from the cluster
This is a distilled version of what is available here:
https://scitas-data.epfl.ch/confluence/display/DOC/Data+Transfer+Nodes
Some background
There are two locations where you can place data on the SCITAS clusters
- Your home directory
- Your scratch directory
The home directory is for your configuration files, your virtual environments, and anything permanent you need to run your jobs.
The scratch directory is volatile, meaning it can go down at any time, but it is meant for large data: your individual projects, datasets and results should go here and be transferred back to your lab share at the end of processing.
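For orientation, once you are logged in to izar you can see both locations from a shell (the scratch path follows the /scratch/izar/YOUR_USERNAME convention used later in this protocol):
# Permanent storage: configuration files and virtual environments
echo $HOME
# Volatile storage: large datasets, projects and intermediate results
ls /scratch/izar/$USER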
Moving data back and forth means connecting to the SCITAS Transfer nodes. Multiple solutions exist, but Oli has opted for WinSCP, as it has a GUI that makes transferring from Windows machines more comfortable.
First time setup
Subscribe to HPC-DATAMOVERS after you have been granted a SCITAS account
You CANNOT log in using your username and password. You need to use public key authentication.
Download and install WinSCP
Create a private-public key pair
Open WinSCP and create a new site:
File protocol: SCP
Host name: fdata1.epfl.ch
User name: your user name
- Click on Advanced → Authentication (Under SSH) → Tools → Generate New Key Pair with PuTTYGen
- Click on Generate and follow the instructions to add randomness.
- Add a Key comment like "SCITAS Cluster Key"
- Add a passphrase if you want
- Click on Save Private Key and save it under C:\Users\username\.ssh as scitas_access.ppk
- (Optional) Click on Save Public Key and save it as scitas_access.pub if you want (it is not needed for this protocol)
- Close PuTTYGen and, back in WinSCP, browse to your private key file
- Click on Display Public Key and copy it to your clipboard
- Open a command prompt and ssh into izar: ssh izar.epfl.ch -l username
- Create a .ssh folder: mkdir .ssh
- Change the permissions: chmod 700 .ssh
- Paste your public key inside: echo 'PASTE_FROM_CLIPBOARD' >> .ssh/authorized_keys
- Back in WinSCP, save the site as SCITAS fdata1
Now you should be able to connect from WinSCP
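If you prefer a terminal to PuTTYGen, the same key setup can be sketched with OpenSSH (an equivalent alternative, not part of the WinSCP workflow above; note that WinSCP itself expects a PuTTY-format .ppk key, so this route is mainly useful if you also want command-line ssh/scp access; replace username with your own user name):
# Generate an OpenSSH key pair under ~/.ssh
ssh-keygen -t rsa -b 4096 -C "SCITAS Cluster Key" -f ~/.ssh/scitas_access
# Append the public key to authorized_keys on izar (you will be asked for your password once)
cat ~/.ssh/scitas_access.pub | ssh username@izar.epfl.ch "mkdir -p .ssh && chmod 700 .ssh && cat >> .ssh/authorized_keys"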
TensorFlow2 environment setup
Login to izar: ssh izar.epfl.ch -l oburri
Create an environment called env-biop-tf2
module load gcc/8.4.0-cuda mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
virtualenv --system-site-packages -p python3 env-biop-tf2
Activate environment
source env-biop-tf2/bin/activate
Install what you need
pip install ipython tensorboard etc...
Test that you have access to the GPUs
ipython
import tensorflow as tf
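If the import succeeds, a quick check with the standard TensorFlow 2 API (a minimal sketch; run it in the same ipython session) should list the GPUs visible on the node:
import tensorflow as tf
# Prints one PhysicalDevice entry per visible GPU; an empty list means no GPU access
print(tf.config.list_physical_devices('GPU'))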
Run on the Cluster
Request time using the ptbiop account
salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE
module load gcc/8.4.0-cuda mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
source env-biop-tf2/bin/activate
Now you can run your scripts on the allocated node
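For longer runs it can be more convenient to submit a batch job instead of keeping an interactive allocation open. Below is a minimal sbatch sketch reusing the account, modules and environment from above; the script name train.py and the single-GPU request are placeholders to adapt:
#!/bin/bash
#SBATCH --time 2:00:00
#SBATCH --account ptbiop
#SBATCH --gres gpu:1

# Same modules and environment as the interactive session
module load gcc/8.4.0-cuda mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
# Assuming the environment was created in your home directory as above
source $HOME/env-biop-tf2/bin/activate

# Placeholder: replace with your own training script
python train.py
Submit the script with sbatch and monitor it with squeue -u $USER.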
YAPIC Setup
First time setup
You should do this from the GPU node
salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE
Then we can install and set up a Python environment with TF 1.15
module load gcc/8.4.0-cuda cuda/10.2.89 cudnn/7.6.5.32-10.2-linux-x64 python
virtualenv -p python3 --system-site-packages yapic-tf-1.15
source yapic-tf-1.15/bin/activate
pip install tensorflow-gpu==1.15 yapic
cd yapic-tf-1.15/lib64
ln -s $CUDA_ROOT/lib64/libcudart.so libcudart.so.10.0
ln -s $CUDA_ROOT/lib64/libcublas.so libcublas.so.10.0
ln -s $CUDA_ROOT/lib64/libcufft.so libcufft.so.10.0
ln -s $CUDA_ROOT/lib64/libcurand.so libcurand.so.10.0
ln -s $CUDA_ROOT/lib64/libcusolver.so libcusolver.so.10.0
ln -s $CUDA_ROOT/lib64/libcusparse.so libcusparse.so.10.0
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
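To check that TF 1.15 picks up the GPU through these symlinks, a quick test from the same shell (so that LD_LIBRARY_PATH is still set) is the standard TF 1.x check below; it should print True:
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
If it prints False, re-check the symlinks and the LD_LIBRARY_PATH export above.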
Running YAPIC Training
Data must be in the form of .tif files for this to work. Follow the YAPIC tutorial to see the shape of your data folder
- Copy your local data to the cluster using WinSCP into /scratch/izar/YOUR_USERNAME
- Request time: salloc -t 10:0:0 -A ptbiop --exclusive
- Login to the node with squeue -u $USER followed by ssh NAME_OF_ALLOCATED_NODE
- Load the necessary modules: module load gcc/8.4.0-cuda cuda/10.2.89 cudnn/7.6.5.32-10.2-linux-x64 python
- Start the environment: source yapic-tf-1.15/bin/activate
- Start YAPIC:
yapic train unet_2d "/scratch/izar/YOUR_USERNAME/YOUR_PROJECT/*.tif" /scratch/izar/YOUR_USERNAME/YOUR_PROJECT/YOUR_ILP.ilp -e 800 --gpu=0
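As a rough illustration of what the command above expects (derived from the paths in the command; the image file names are hypothetical examples, the exact layout should follow the YAPIC tutorial), the project folder on scratch holds the training images next to the Ilastik label project:
/scratch/izar/YOUR_USERNAME/YOUR_PROJECT/
    image_01.tif
    image_02.tif
    YOUR_ILP.ilp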
Deploy the model for use in DeepImageJ
After the training is complete, export the trained model to DeepImageJ format. You do not need to do this on the GPU nodes.
yapic deploy model.h5 /scratch/izar/YOUR_USERNAME/YOUR_PROJECT/EXAMPLE.tif MODEL_NAME --skip-predict
Using WinSCP, move the MODEL_NAME folder back to your Fiji installation under Fiji.app\models
You should now be able to access it by running DeepImageJ
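If you prefer the command line to WinSCP for this step, the same transfer can be sketched with scp through the transfer node (assuming an OpenSSH key as in the command-line sketch earlier; replace username and the path to MODEL_NAME with your own):
# Recursively copy the exported model folder to the local Fiji models folder
scp -r -i ~/.ssh/scitas_access username@fdata1.epfl.ch:/path/to/MODEL_NAME "Fiji.app/models/"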
Using The Cluster with Jupyter Notebooks