This project proposes a way to increase the performance of a sequential code using classical parallelism techniques. It is based on a Matlab code by \textit{Nicolas Richart} that simulates the evolution of a tsunami. The purpose of this report is to describe the current state of the project and how the sequential code will be parallelized. It also gives the expected speed-up, as well as the strong scaling of the problem.
\section{Description of the Code}
The code is organized as follows. The algorithm computes the evolution of the wave height $H(x,y)$, based on the topography and on the evolution of the wave speed in the $x$ and $y$ directions. To start the computation, the algorithm first reads the initial conditions for the different variables ($H$, $HU$, $HV$, $Zdx$, $Zdy$) from binary files. The duration of the simulation $T_{end}$ is specified by the user. Until this time is reached, the algorithm computes the next variable time step and updates the height and the speeds $H$, $HU$, $HV$. The update of a point $(x,y)$ only needs information from its direct neighbors. When the simulation is over, the height of the wave at the final time step is stored in a file.
\section{Parallelization}
The sequential code will be parallelized using \texttt{MPI}. The reader, the writer, the time step computation, and the update of the height and the speeds $H$, $HU$, $HV$ will all be parallelized.
The parallelization of the reader and writer will make use of the parallel I/O facilities of MPI. Computing the next time step requires finding the maximum over the grid. This can be done in parallel with the following design: since \texttt{C++} stores arrays in row-major order, the grid will be split between processes in blocks of $\frac{N}{P}$ complete rows. Each process will then return the maximum over its subgrid, and the final maximum will be computed at the end on a single process. The parallelization of the update of $H$, $HU$, $HV$ will be approached in a similar manner. The grid will be split as before, and a row of ghost cells will be added before and after each subgrid (except before the first one and after the last one) to allow the stencil computation. Most of the code will therefore be parallelized.
\section{Performance}
Translating the code from \texttt{Matlab} to \texttt{C++} already makes the computation approximately 5 times faster. This performance will be improved further using parallelism.
\begin{center}
\begin{table}[h]
\begin{tabular}{lrr}
Grid size & Matlab [s] & C++ [s] \\
\hline
$2001 \times 2001$ & 1233.068 & 226.741 \\
$4001 \times 4001$ & 10625.835 & 2003.880 \\
\end{tabular}
\caption{Total time in seconds for a run on a 3.1 GHz Intel i5}
\label{Table1}
\end{table}
\end{center}
\section{Theoretical results}
\subsection{Amdahl's Law - Strong scaling}
The speed-up of a parallel program depends mostly on the fraction of the sequential code that can be parallelized. In the case of this problem, most of the code can be parallelized. A first estimate can be deduced from the profiling table [TAB. \ref{Table2}]. Summing up the three main functions that can be parallelized (\texttt{compute\_step}, \texttt{compute\_mu\_and\_set\_dt} and \texttt{imposeTolerances}) indicates that 98.06\% of the code is parallelizable. The speed-up is given by Amdahl's law:
\begin{equation}
S_p = \frac{1}{\alpha + \frac{1-\alpha}{p}}
\end{equation}
where $\alpha$ is the fraction of non-parallelizable code, in this case 1.94\%, and $p$ is the number of cores used to compute the simulation.
\subsection{Expected Speed Up}
Computing the speed-up under strong scaling for this problem gives the curve in [FIG. \ref{SpeedUp}]. This theoretical result will be compared with the measured one once the problem is parallelized.
\onecolumngrid
\newpage
\appendix
\section{Profiling}
\begin{center}
\begin{table}[h]
\begin{tabular}{l*{7}{r}r}
\%& cumulative & self && self & total &\\
time & seconds & seconds & calls & ms/call & ms/call & name \\