
SCITAS GPU Clusters Protocols

Transferring data to and from the cluster

This is a distilled version of what is available here:
https://scitas-data.epfl.ch/confluence/display/DOC/Data+Transfer+Nodes

Some background

There are two locations where you can place data on the SCITAS clusters:

  1. Your home directory
  2. Your scratch directory

The home directory is for your configuration files, your virtual environments and anything else permanent that you need to run your jobs.
The scratch directory is volatile, meaning it can go down at any time. It is meant for large data: your individual projects, datasets and results should go here and be transferred back to your lab share at the end of processing.
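
As a quick orientation, a sketch assuming the /scratch/izar/YOUR_USERNAME path used later on this page:

# permanent items (configuration files, virtual environments) live in your home directory
ls ~
# large, volatile data (datasets, intermediate results) goes to scratch
ls /scratch/izar/YOUR_USERNAME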

Moving data back and forth means connecting to the SCITAS Transfer nodes. Multiple solutions exist, but Oli has opted for WinSCP, as it has a GUI that makes transferring from Windows machines more comfortable.

First time setup

Subscribe to HPC-DATAMOVERS after you have been granted a SCITAS account
You CANNOT log in using your username and password. You need to use public key authentication.
Download and install WinSCP

Create a private-public key pair

Open WinSCP and create a new site:

File protocol: SCP
Host name: fdata1.epfl.ch
User name: **your user name**
  1. Click on Advanced > Authentication (under SSH) > Tools > Generate New Key Pair with PuTTYGen
  2. Click on Generate and follow the instructions to add randomness.
  3. Add a Key comment like "SCITAS Cluster Key"
  4. Add a passphrase if you want
  5. Click on Save Private Key and save it under C:\Users\username\.ssh as scitas_access.ppk
  6. (Optional) Click on Save Public Key and save it as scitas_access.pub somewhere else if you want (it is not strictly needed for this protocol)
  7. Close PuTTYGen and, back in WinSCP, browse to your private key file
  8. Click on Display Public Key and copy it to your clipboard
  9. Open a command prompt and ssh into izar: ssh izar.epfl.ch -l username
  10. Create a .ssh folder: mkdir .ssh
  11. Change the permissions: chmod 700 .ssh
  12. Paste your public key inside: echo 'PASTE_FROM_CLIPBOARD' >> .ssh/authorized_keys (steps 9-12 are recapped as a single session after this list)
  13. Back in WinSCP, save the site as SCITAS fdata1
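
For reference, here are steps 9-12 as a single shell session (a sketch; the final chmod on authorized_keys is standard SSH practice and not part of the original steps):

ssh izar.epfl.ch -l username
mkdir -p .ssh                      # -p: no error if the folder already exists
chmod 700 .ssh
echo 'PASTE_FROM_CLIPBOARD' >> .ssh/authorized_keys
chmod 600 .ssh/authorized_keys     # standard SSH permission; added as a precaution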

Now you should be able to connect from WinSCP

NOTE: If you added a passphrase, you will need to provide it now. It is not your Gaspar password
NOTE: Remember: Data should go to /scratch/username

TensorFlow2 environment setup

Log in to izar: ssh izar.epfl.ch -l oburri

Create an environment called env-biop-tf2

module load gcc/8.4.0-cuda  mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
virtualenv --system-site-packages -p python3 env-biop-tf2

Activate environment

source env-biop-tf2/bin/activate

Install what you need

pip install ipython tensorboard etc...

Test that you have access to the GPUs

ipython
import tensorflow as tf
tf.config.list_physical_devices('GPU')   # lists the visible GPUs

Run on the Cluster

Request time as ptbiop user

salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE
module load gcc/8.4.0-cuda  mvapich2/2.3.4 py-tensorflow/2.3.1-cuda-mpi python
source env-biop-tf2/bin/activate

Now you can run your code on the allocated node.
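
For example (a minimal sketch; train.py is a hypothetical script living in your scratch project folder):

nvidia-smi                                     # check that the node's GPUs are visible
cd /scratch/izar/YOUR_USERNAME/YOUR_PROJECT
python train.py                                # any script using the env-biop-tf2 environment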

NOTE: Nicola will update on the Jupyter Notebook setup

YAPIC Setup

NOTE: We need to work with TF 1.15 in YAPIC in order to be able to export YAPIC models to the ModelZoo structure for reuse in Fiji.

First time setup

You should do this from the GPU node

salloc -t 2:0:0 -A ptbiop --exclusive
squeue -u $USER
ssh NAME_OF_ALLOCATED_NODE

Then here we can install and setup Python with TF 1.15

module load gcc/8.4.0-cuda   cuda/10.2.89   cudnn/7.6.5.32-10.2-linux-x64 python
virtualenv -p python3 --system-site-packages yapic-tf-1.15
source yapic-tf-1.15/bin/activate
pip install tensorflow-gpu==1.15 yapic
# TensorFlow 1.15 looks for the CUDA 10.0 library names, so expose the module's
# CUDA libraries under the *.so.10.0 names it expects
cd yapic-tf-1.15/lib64
ln -s $CUDA_ROOT/lib64/libcudart.so libcudart.so.10.0
ln -s $CUDA_ROOT/lib64/libcublas.so libcublas.so.10.0
ln -s $CUDA_ROOT/lib64/libcufft.so libcufft.so.10.0
ln -s $CUDA_ROOT/lib64/libcurand.so libcurand.so.10.0
ln -s $CUDA_ROOT/lib64/libcusolver.so libcusolver.so.10.0
ln -s $CUDA_ROOT/lib64/libcusparse.so libcusparse.so.10.0
# make the symlinked libraries visible to TensorFlow in this session
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
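
As a quick sanity check (not part of the original protocol), TensorFlow 1.15 should now find the GPU:

python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"   # should print True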

Running YAPIC Training

Data must be in the form of .tif files for this to work. Follow the YAPIC tutorial to see the expected shape of your data folder.
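
For illustration only (hypothetical file names; the YAPIC tutorial is the authoritative reference for the layout):

/scratch/izar/YOUR_USERNAME/YOUR_PROJECT/
    image_01.tif
    image_02.tif
    YOUR_ILP.ilp    # Ilastik project holding the labels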

  1. Copy your local data to the cluster using WinSCP into /scratch/izar/YOUR_USERNAME
  2. Request time: salloc -t 10:0:0 -A ptbiop --exclusive
  3. Login to the node with squeue -u $USER followed by ssh NAME_OF_ALLOCATED_NODE
  4. Load the necessary modules module load gcc/8.4.0-cuda cuda/10.2.89 cudnn/7.6.5.32-10.2-linux-x64 python
  5. Start the environment: source yapic-tf-1.15/bin/activate (if TensorFlow later complains that libcudart.so.10.0 cannot be found, re-export LD_LIBRARY_PATH to include yapic-tf-1.15/lib64 as in the first-time setup)
  6. Start YAPIC:
yapic train unet_2d "/scratch/izar/YOUR_USERNAME/YOUR_PROJECT/*.tif" /scratch/izar/YOUR_USERNAME/YOUR_PROJECT/YOUR_ILP.ilp -e 800 --gpu=0

Deploy the model for use in DeepImageJ

After the training is complete, export the trained model to the DeepImageJ format. You do not need to do this on the GPU nodes.

yapic deploy model.h5 /scratch/izar/YOUR_USERNAME/YOUR_PROJECT/EXAMPLE.tif MODEL_NAME --skip-predict

Using WinSCP, move the MODEL_NAME folder back to your Fiji installation, under Fiji.app\models

You should now be able to access it by running DeepImageJ

Using The Cluster with Jupyter Notebooks
