\item We will focus on the most interesting metrics for HPC
\end{itemize}
\vfill
\pause
\begin{itemize}
\item The first that comes to mind is \textit{time}, e.g. time-to-solution
\item Derived metrics: speedup and efficiency
\end{itemize}
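As a sketch of the usual definitions (the symbols $T_1$, $T_p$ and $p$ are
notation assumed here for illustration): with $T_1$ the time-to-solution on one
process and $T_p$ the time-to-solution on $p$ processes,
\[
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_1}{p \, T_p}
\]
For instance, with made-up timings $T_1 = 100$\,s and $T_8 = 20$\,s, this gives
$S(8) = 5$ and $E(8) = 0.625$.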
\vfill
\pause
\begin{itemize}
\item Scientific codes do computations on floating-point numbers
\item A second metric is the number of \textit{floating-point operations per second}
(\si{\flops})
\end{itemize}
\vfill
\pause
\begin{itemize}
\item Finally, the \textit{memory bandwidth} indicates how much data your code
transfers per unit of time
\end{itemize}
\end{frame}
\note{
\begin{itemize}
\item My code is super fast, it runs in \SI{2.5}{\ns}!
\item It seems fast, but is it? How fast can your hardware go?
\item To really understand how well your code exploits the hardware, we use
the \si{\flops} and memory BW
\item Your hardware has theoretical maximum values for those
\item You can compare the values from your code to the max to see how well
you use the hardware
\end{itemize}
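
One rough way to make this comparison concrete, as a sketch only (all symbols
below are notation assumed for these notes): if the code performs
$N_{\mathrm{flop}}$ floating-point operations and moves $V$ bytes to and from
memory in a time $T$, the fractions of the hardware maxima
$P_{\mathrm{peak}}$ and $B_{\mathrm{peak}}$ it reaches are
\[
\eta_{\mathrm{flops}} = \frac{N_{\mathrm{flop}} / T}{P_{\mathrm{peak}}},
\qquad
\eta_{\mathrm{BW}} = \frac{V / T}{B_{\mathrm{peak}}}
\]
A common estimate of the theoretical peak is
\[
P_{\mathrm{peak}} = N_{\mathrm{cores}} \times f \times N_{\mathrm{flop/cycle}}
\]
where $f$ is the clock frequency and $N_{\mathrm{flop/cycle}}$ is the number of
floating-point operations one core can complete per cycle. For a hypothetical
CPU with 16 cores at 2.5\,GHz and 16 operations per cycle per core, this gives
$16 \times 2.5 \times 10^{9} \times 16 = 640 \times 10^{9}$ operations per
second.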
}
\subsection{Profiling}
\label{sec:profiling}
\begin{frame}
\frametitle{Profiling}
\framesubtitle{A tool to measure various timings}
\begin{itemize}
\item Where is my application spending most of its time?
\begin{itemize}
\item (bad) measure time ``by hand'' using timers and print statements
\item (good) use a tool made for this, e.g. Intel VTune, Score-P,
gprof
\end{itemize}
\end{itemize}
\vfill
\begin{itemize}
\item There are two types of profiling techniques
\begin{itemize}
\item Sampling: the profiler interrupts the code at regular intervals
and records which function is currently executing
\item Code instrumentation: instructions are added at compile time
to trigger measurements
\end{itemize}
\end{itemize}
\vfill
\begin{itemize}
\item In addition to timings, profilers give you a lot more information on
\begin{itemize}
\item Memory usage
\item Hardware counters
\item CPU activity
\item MPI communications
\item etc.
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}[fragile,exercise]
\frametitle{Profiling}
\framesubtitle{Interactive demonstration}
\begin{itemize}
\item For the purpose of this exercise, we will use MiniFE
\begin{itemize}
\item 3D implicit finite-elements on an unstructured mesh
\item C++ mini application
\item \url{https://github.com/Mantevo/miniFE}
\item You don't need to understand what the code does!
\end{itemize}
\item We will use Intel VTune, part of the \href{https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html\#base-kit}{oneAPI Base Toolkit (free)}
\end{itemize}
\vfill
\begin{itemize}
\item Download miniFE
\item Compile the basic version found in \cmd{ref/src}
\item Profile the code using the hotspots analysis
\item Open Intel VTune and select your timings
\item Play around and find the 5 most time-consuming functions