\renewcommand{\FIGREP}{src/basic_concepts/figures}
\section{Basic concepts}
\label{sec:basic_concepts}
\intersec{fidis}
\begin{frame}
\frametitle{Goal of this section}
\framesubtitle{}
\begin{itemize}
\item Cluster access using \cmd{ssh}
\item Data transfers using \cmd{scp} and/or \cmd{rsync}
\item Software modules on the cluster
\item Code compilation
\item Debugging
\end{itemize}
\end{frame}
\subsection{Connection to the cluster}
\label{sec:cluster_connection}
\begin{frame}[fragile]
\frametitle{Connection to the cluster}
\framesubtitle{}
\begin{itemize}
\item Secure Shell, better known as SSH
\begin{bashcode}
$> man ssh
ssh (SSH client) is a program for logging into a remote machine and
for executing commands on a remote machine. It is intended to
provide secure encrypted communications [...]. X11 connections [...]
can also be forwarded over the secure channel.
\end{bashcode}%$
\item Basic usage
\begin{bashcode}
$> ssh <user>@<host>
\end{bashcode}%$
\cmd{<user>} is your GASPAR username and \cmd{<host>} is one of \cmd{{helvetios,izar,jed}.epfl.ch}
\item Example
\begin{bashcode}
$> ssh jdoe@helvetios.epfl.ch
password: *****
\end{bashcode}%$
\end{itemize}
\end{frame}
\note{
\begin{itemize}
\item Read the manual/documentation
\item Pause to test connection
\end{itemize}
}
\begin{frame}[fragile]
\frametitle{Connection to the cluster}
\framesubtitle{Optional step}
\begin{itemize}
\item An alternative to password authentication is a cryptographic key pair
\begin{bashcode}
$> ssh-keygen -b 4096 -t rsa
[Follow the instructions]
$> ssh-copy-id jdoe@helvetios.epfl.ch
\end{bashcode}
\item \cmd{ssh-keygen} generates a public/private key pair
\item By default, the keys are stored in \cmd{~/.ssh}
\begin{itemize}
\item \cmd{id_rsa.pub} is the public key and can be shared with anyone (\cmd{ssh-copy-id} copies it to the remote host)
\item \cmd{id_rsa} is the private key and must remain \textbf{SECRET}!
\end{itemize}
\end{itemize}
\end{frame}
\note{
\begin{itemize}
\item Be careful not to overwrite existing keys with \cmd{ssh-keygen}
\item By default, \cmd{ssh-copy-id} can copy all the keys loaded in your agent; use \cmd{-i} to select a specific one
\item Pause to test
\end{itemize}
}
\subsection{Data transfer}
\label{sec:data_transfer}
\begin{frame}[fragile]
\frametitle{Data transfer}
\framesubtitle{}
\begin{itemize}
\item We work remotely and need to get our data back
locally
\item There are two main commands:
\begin{bashcode}
$> man scp
scp copies files between hosts on a network. It uses ssh for data transfer, and uses the same authentication and provides the same security as ssh.
\end{bashcode}
\begin{bashcode}
$> man rsync
Rsync is a fast and extraordinarily versatile file copying tool. It can copy
locally, to/from another host over any remote shell, or to/from a remote rsync
daemon. [...] It is famous for its delta-transfer algorithm, which reduces the
amount of data sent over the network by sending only the differences between the
source files and the existing files in the destination. Rsync is widely used for
backups and mirroring and as an improved copy command for everyday use.
\end{bashcode}
\item Both commands share a similar usage pattern: a path on a remote host is written
\cmd{hostname:/path/to/file} (relative paths start from your remote home directory). For example
\begin{bashcode}
$> scp jdoe@helvetios.epfl.ch:src/myCode/file.c src/
\end{bashcode}%$
\end{itemize}
\end{frame}
\subsection{Software modules}
\label{sec:modules}
\begin{frame}
\frametitle{Software modules}
\framesubtitle{A tool to organize your environment}
\begin{itemize}
\item HPC clusters are special because many software packages, often in
several versions, are installed side by side
\item We need a tool to make the software easily available to everyone
\item The main tools used today are \cmd{environment-modules} and \cmd{Lmod}
\item At SCITAS, we chose \cmd{Lmod} (we will see why later)
\end{itemize}
\pause
\vfill
\begin{itemize}
\item These tools package software and its configuration into
\textit{modules}
\item To use a piece of software, you \textit{load} the
corresponding module
\item Examples of (made-up) modules:
\begin{itemize}
\item \cmd{intel-19.0.2}: provides Intel compiler version 19.0.2
\item \cmd{data-analysis}: provides data analysis tools such as
Python with various packages, and Matlab
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Software modules}
\framesubtitle{Quick tutorial}
\begin{itemize}
\item Lmod is called using the \cmd{module} command followed by an action:
\begin{itemize}
\item \cmd{avail}: print a list of available modules
\item \cmd{load/unload <modules>}: load/unload the \cmd{<modules>}
\item \cmd{purge}: unload all modules
\item \cmd{swap <module1> <module2>}: swap \cmd{<module1>} for \cmd{<module2>}
\item \cmd{list}: print a list of currently loaded modules
\item \cmd{spider <module>}: print all possible versions of \cmd{<module>}
\item \cmd{show <module>}: print the configuration of \cmd{<module>}
\item \cmd{save/restore <name>}: save/restore current module collection under \cmd{<name>}
\item \cmd{help}: print help
\end{itemize}
\item Many of those commands are also available in \cmd{environment-modules}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Software modules}
\framesubtitle{Lmod main strength}
\begin{itemize}
\item Lmod supports a hierarchical software stack
\item When you swap a module, Lmod automatically reloads the modules
that depend on it
\item You need to load a compiler, an MPI implementation and a BLAS implementation to
make all the modules of the hierarchy available
\end{itemize}
\end{frame}
\note{
\begin{itemize}
\item Demo time!
\item Module avail to see available modules. Not all are present.
\item Load a compiler, avail, mpi, avail, BLAS, avail
\item Change compiler version
\item Save, purge, restore
\item Spider
\item Introduce ml
\end{itemize}
}
\subsection{Compilation}
\label{sec:compilation}
\begin{frame}
\frametitle{Compilation}
\framesubtitle{0100101110101001010...}
\begin{itemize}
\item A computer only understands ON and OFF states (1 and 0)
\item It would be very inconvenient for us to code in binary
\item We therefore use different levels of abstraction (languages), e.g. C, C++, Fortran
\item We need a translator!
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Compilation}
\framesubtitle{The four compilation steps}
\begin{itemize}
\item A compiler performs the translation in four steps
\begin{description}
\item[Preprocessing] Format source code to make it ready for compilation (remove comments, execute preprocessing directives such as \cxxinline{\#include}, etc.)
\item[Compiling] Translate the source code (C, C++, Fortran, etc.) into assembly, a low-level, CPU-dependent language
\item[Assembly] Translate the assembly into machine code and store it in object files
\item[Linking] Link all the object files (and libraries) into one executable
\end{description}
\item In practice, the first three steps are combined together and simply
called ``compiling''
\end{itemize}
\end{frame}
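\begin{frame}[fragile]
\frametitle{Compilation}
\framesubtitle{Stopping after each step (sketch)}
A minimal sketch, assuming GCC's C++ driver \cmd{g++}: the flags \cmd{-E}, \cmd{-S} and \cmd{-c} stop after preprocessing, compiling and assembling, respectively, so each intermediate file can be inspected. The file name and the function are hypothetical, chosen only to give the preprocessor something to expand.
\scriptsize
\begin{verbatim}
// steps.cpp (hypothetical), inspected one step at a time:
//   g++ -E steps.cpp -o steps.ii   # preprocessing only
//   g++ -S steps.cpp -o steps.s    # ... then compiling to assembly
//   g++ -c steps.cpp -o steps.o    # ... then assembling (object file)
#include <cassert>  // the preprocessor pastes the header's contents here

// Function-like macro: the preprocessor rewrites SQUARE(x) as ((x) * (x))
#define SQUARE(x) ((x) * (x))

int sum_of_squares(const int* v, int n) {
  // assert is a macro too: compiling with -DNDEBUG removes this check
  assert(v != nullptr || n == 0);
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += SQUARE(v[i]);
  return s;
}
\end{verbatim}
\end{frame}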
\begin{frame}[t,fragile]
\frametitle{Compilation}
\framesubtitle{The four compilation steps (visually)}
\hspace{6cm}
\begin{minipage}{0.5\textwidth}
\begin{itemize}
\item<5> In practice, \cmd{gcc} runs all these steps transparently
\begin{bashcode}
$> gcc -c file_1.c
$> gcc -c file_2.c
$> gcc file_1.o file_2.o -lexample -o exec
\end{bashcode}%$
\end{itemize}
\end{minipage}
\onslide<1>\addimage[width=12cm]{\FIGREP/compilation_steps_0.pdf}{2cm}{1cm}
\onslide<2>\addimage[width=12cm]{\FIGREP/compilation_steps_1.pdf}{2cm}{1cm}
\onslide<3>\addimage[width=12cm]{\FIGREP/compilation_steps_2.pdf}{2cm}{1cm}
\onslide<4>\addimage[width=12cm]{\FIGREP/compilation_steps_3.pdf}{2cm}{1cm}
\onslide<5>\addimage[width=12cm]{\FIGREP/compilation_steps_4.pdf}{2cm}{1cm}
\end{frame}
\subsection{Debugging}
\label{sec:debugging}
\begin{frame}
\frametitle{Debugging}
\framesubtitle{A few pieces of advice}
\begin{itemize}
\item Why bother debugging?
\begin{itemize}
\item Studies\footnote{S. McConnell, \textit{Code Complete}} report $\sim$20 bugs per 1000 lines of code in industry codes
\item You don't want to find a bug close to a deadline
\end{itemize}
\item Only optimize correct code
\end{itemize}
\vfill
\begin{itemize}
\item There are different types of bugs:
\begin{description}
\item[Syntax error] A keyword is misspelled, e.g. \cmd{dobule}
instead of \cmd{double}. The code doesn't compile and the compiler
tells you where the error is.
\item[Runtime error] Division by 0 (FPE), out-of-bounds access (segmentation fault), etc. The code
compiles fine, but will (most likely) crash at runtime.
\item[Logical error] A mistake that leads to incorrect or unexpected
behavior, e.g. you want to compute a distance from a velocity and a time,
but you use an acceleration instead.
\end{description}
\item Logical errors are clearly the most dangerous! The compiler doesn't
complain and your code runs. You need to test it!
\end{itemize}
\end{frame}
\note{
\begin{itemize}
\item This is a parallel programming course. Why bother with debugging?
\item After all, it is boring and time consuming. You could use this time to
make your code faster instead!
\item Small test cases
\item Typical ticket: ``it works on my machine but not on the clusters, so there
is a bug on the clusters''
\end{itemize}
}
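\begin{frame}[fragile]
\frametitle{Debugging}
\framesubtitle{Testing for logical errors (sketch)}
The compiler cannot catch a logical error, but a test with known answers can. A minimal sketch, assuming plain \cmd{assert} rather than a unit-testing framework; the function names are hypothetical and the velocity is assumed constant, as in the distance example of the previous slide.
\scriptsize
\begin{verbatim}
#include <cassert>
#include <cmath>

// Distance travelled at constant velocity: d = v * t.
// Using an acceleration instead of v would still compile and run;
// only a comparison with known answers reveals the logical error.
double travel_distance(double velocity, double time) {
  return velocity * time;
}

// Minimal hand-written test (hypothetical; a framework works too)
void test_travel_distance() {
  assert(travel_distance(2.0, 3.0) == 6.0);  // exact in floating point
  assert(travel_distance(5.0, 0.0) == 0.0);
  // non-exact results: compare with a tolerance, not with ==
  assert(std::fabs(travel_distance(0.1, 3.0) - 0.3) < 1e-12);
}
\end{verbatim}
\end{frame}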
\begin{frame}
\frametitle{Debugging}
\framesubtitle{A few pieces of advice}
\begin{itemize}
\item Write tests (unit tests, application tests)!
\item Write tests!
\item Ask the compiler to complain (\cmd{-Wall -Wextra}) and to add debug symbols (\cmd{-g})
\item Use debuggers (gdb, TotalView, Allinea DDT)
\item Use memory checkers (Valgrind, TotalView, Intel Inspector,
\cmd{-fsanitize=address})
\item Avoid debugging with print statements: they can change the program's behavior and hide the bug (Heisenbug)
\end{itemize}
\end{frame}
\note{
\begin{itemize}
\item Test!
\item The compiler produces warnings that point to possible sources of bugs
(uninitialized values, misleading indentation, loss of precision, ...)
\item Heisenbug
\end{itemize}
}
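\begin{frame}[fragile]
\frametitle{Debugging}
\framesubtitle{Letting the compiler complain (sketch)}
A minimal sketch, assuming GCC's \cmd{g++}: the function below compiles without warnings under \cmd{-Wall -Wextra}, and the comments show variants that these warnings would flag. The function is hypothetical, and whether the uninitialized-variable warning fires can depend on the optimization level.
\scriptsize
\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <vector>

// Warning-free with: g++ -g -Wall -Wextra -c mean.cpp
// Variants the warnings would catch (hypothetical, kept as comments):
//   double total;                        -> 'total' used uninitialized
//   for (int i = 0; i < v.size(); ++i)   -> signed/unsigned comparison
double mean(const std::vector<double>& v) {
  assert(!v.empty());                         // avoid dividing by zero
  double total = 0.0;                         // always initialize
  for (std::size_t i = 0; i < v.size(); ++i)  // same signedness as size()
    total += v[i];
  return total / static_cast<double>(v.size());
}
\end{verbatim}
\end{frame}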
\begin{frame}[exercise]
\frametitle{Basic debugging}
\framesubtitle{Write overflow}
\begin{itemize}
\item In the \texttt{\bf debugging} folder, build the executables with \texttt{make}.
\item Execute the \texttt{\bf ./write} executable
\item Run the code with \texttt{\bf gdb}\\
\texttt{\$ gdb ./write}
\item Start the program in gdb with \texttt{run}; it should stop at the line
where the segfault happens.
\item You can print the value of the variables with \texttt{print}\\
\texttt{(gdb) print i}\\
\texttt{(gdb) print data}
\item At this point you should see the bug
\end{itemize}
\end{frame}
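\begin{frame}[fragile]
\frametitle{Basic debugging}
\framesubtitle{Write overflow (sketch)}
The exercise code is not reproduced here. As a hypothetical C++ illustration, a common culprit is an off-by-one loop bound; an \cmd{assert}-based check makes a debug build stop at the bad index instead of silently writing past the end.
\scriptsize
\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical write overflow: with `i <= data.size()` the last
// iteration writes one element past the end, which may crash or
// silently corrupt memory. Under gdb, `run` stops at the crash (if
// any) and `print i` shows the offending index.
void fill_with_index(std::vector<int>& data) {
  for (std::size_t i = 0; i < data.size(); ++i)  // correct: i < size
    data[i] = static_cast<int>(i);
}

// Debug-build bounds check: aborts on a bad index instead of writing
// past the end (the assert vanishes when compiled with -DNDEBUG)
void set_value(std::vector<int>& data, std::size_t i, int value) {
  assert(i < data.size());
  data[i] = value;
}
\end{verbatim}
\end{frame}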
\begin{frame}[exercise]
\frametitle{Basic debugging}
\framesubtitle{Read overflow}
\begin{itemize}
\item Execute the \texttt{\bf ./read} executable
\item It might appear to run fine, but there is a bug.
\item Run the code with \texttt{\bf valgrind}\\
\texttt{\$ valgrind ./read}
\item You can also compile with the address sanitizer (available only with GCC and Clang):\\
\texttt{\$ CXXFLAGS=-fsanitize=address make}\\
The bounds are then checked every time the program runs.
\end{itemize}
\end{frame}
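\begin{frame}[fragile]
\frametitle{Basic debugging}
\framesubtitle{Read overflow (sketch)}
Again a hypothetical C++ sketch, not the exercise's code. A read overflow often goes unnoticed because reading one element too far does not necessarily crash. Note that Valgrind's memcheck tracks heap memory, whereas AddressSanitizer also instruments stack and global arrays.
\scriptsize
\begin{verbatim}
#include <cassert>
#include <cstddef>

// Hypothetical read overflow: with `i <= n` the loop reads data[n],
// one element past the end. This is undefined behaviour, yet the
// program may print a plausible result. Valgrind (heap buffers) or a
// build with -g -fsanitize=address reports the invalid read.
int max_value(const int* data, std::size_t n) {
  assert(data != nullptr && n > 0);
  int best = data[0];
  for (std::size_t i = 1; i < n; ++i)  // correct: i < n
    if (data[i] > best) best = data[i];
  return best;
}
\end{verbatim}
\end{frame}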
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../../phys_743_parallel_programming"
%%% End:
