% Change "article" to "report" to get rid of page number on title page
% !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
% Here put your info (name, due date, title etc).
% the rest should be left unchanged.
% !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
% Homework Specific Information
\newcommand{\hmwkTitle}{Exercise 8}
\newcommand{\hmwkClass}{Introduction to multiprocessor architecture}
%\newcommand{\hmwkClassInstructor}{Prof. Oleg Yazyev}
\newcommand{\hmwkAuthorName}{Raffaele Ancarola}
% In case you need to adjust margins:
\topmargin=-0.45in %
\evensidemargin=0in %
\oddsidemargin=0in %
\textwidth=6.5in %
\textheight=9.5in %
\headsep=0.25in %
% This is the color used for comments below
% units of measure
% comments
% pgfplots environment
%package setup for graphs
%pgfplot setup
/pgfplots/flexible xticklabels from table/.code n args={3}{%
% layer definition
layers/my layer set/.define layer set={
% you could state styles here which should be moved to
% corresponding layers, but that is not necessary here.
% That is why we don't state anything here
% activate the newly created layer set
set layers=my layer set
% pgfplots add graphics
% set options in this local group (will be lost afterwards):
\pgfqkeys{/pgfplots/plot graphics}{#1}%
% measure the natural size of the graphics:
% compute the required unit vector ratio:
\pgfmathparse{\wd0/(\pgfkeysvalueof{/pgfplots/plot graphics/xmax} - \pgfkeysvalueof{/pgfplots/plot graphics/xmin})}%
\pgfmathparse{\ht0/(\pgfkeysvalueof{/pgfplots/plot graphics/ymax} - \pgfkeysvalueof{/pgfplots/plot graphics/ymin})}%
% configure pgfplots to use it.
% The \xdef expands all macros except those prefixed by '\noexpand'
% and assigns the result to a global macro named '\marshal'.
\noexpand\pgfplotsset{unit vector ratio={\xunit\space \yunit}}%
% use our macro here:
\addplot graphics[#1] {#2};
% For faster processing, load Matlab syntax for listings
\lstset{language=Matlab, % Use MATLAB
frame=single, % Single frame around code
basicstyle=\small\ttfamily, % Use small typewriter font
keywordstyle=[1]\color{Blue}\bfseries, % MATLAB functions bold and blue
keywordstyle=[2]\color{Purple}, % MATLAB function arguments purple
keywordstyle=[3]\color{Blue}\underbar, % User functions underlined and blue
identifierstyle=, % Nothing special about identifiers
% Comments small dark green courier
stringstyle=\color{Purple}, % Strings are purple
showstringspaces=false, % Don't put marks in string spaces
tabsize=3, % 3 spaces per tab
%%% Put standard MATLAB functions not included in the default
%%% language here
%%% Put MATLAB function parameters here
morekeywords=[2]{on, off, interp},
%%% Put user defined functions here
morekeywords=[3]{FindESS, homework_example},
morecomment=[l][\color{Blue}]{...}, % Line continuation (...) like blue comment
numbers=left, % Line numbers on left
firstnumber=1, % Line numbers start with line 1
numberstyle=\tiny\color{Blue}, % Line numbers are blue
stepnumber=1 % Line numbers go in steps of 1
% Setup the header and footer
\pagestyle{fancy} %
\lhead{\hmwkAuthorName} %
%\chead{\hmwkClass\ (\hmwkClassInstructor\ \hmwkClassTime): \hmwkTitle} %
\rhead{\hmwkClass\ : \hmwkTitle} %
%\rhead{\firstxmark} %
\lfoot{\lastxmark} %
\cfoot{} %
\rfoot{Page\ \thepage\ of\ \protect\pageref{LastPage}} %
\renewcommand\headrulewidth{0.4pt} %
\renewcommand\footrulewidth{0.4pt} %
% This is used to trace down (pin point) problems
% in latexing a document:
% Some tools
\newcommand{\enterProblemHeader}[1]{\nobreak\extramarks{#1}{#1 continued on next page\ldots}\nobreak%
\nobreak\extramarks{#1 (continued)}{#1 continued on next page\ldots}\nobreak}%
\newcommand{\exitProblemHeader}[1]{\nobreak\extramarks{#1 (continued)}{#1 continued on next page\ldots}\nobreak%
% We put the blank space above in order to make sure this
% \marginpar gets correctly placed.
\newenvironment{homeworkProblem}[1][Problem \arabic{homeworkProblemCounter}]%
{% We put this space here to make sure we're not connected to the above.
% Otherwise the changetext can do funny things to the other margin
\enterProblemHeader{\homeworkProblemName\ [\homeworkSectionName]}}%
% We put the blank space above in order to make sure this margin
% change doesn't happen too soon (otherwise \sectionAnswer's can
% get ugly about their \marginpar placement).
{% We put this space here to make sure we're disconnected from the previous
% passage
% We put the blank space above in order to make sure this
% \marginpar gets correctly placed.
%%% I think \captionwidth (commented out below) can go away
%% Edits the caption width
% \dimen0=\columnwidth \advance\dimen0 by-#1\relax
% \divide\dimen0 by2
% \advance\leftskip by\dimen0
% \advance\rightskip by\dimen0
% Includes a figure
% The first parameter is the label, which is also the name of the figure
% with or without the extension (e.g., .eps, .fig, .png, .gif, etc.)
% IF NO EXTENSION IS GIVEN, LaTeX will look for the most appropriate one.
% This means that if a DVI (or PS) is being produced, it will look for
% an eps. If a PDF is being produced, it will look for nearly anything
% else (gif, jpg, png, et cetera). Because of this, when I generate figures
% I typically generate an eps and a png to allow me the most flexibility
% when rendering my document.
% The second parameter is the width of the figure normalized to column width
% (e.g. 0.5 for half a column, 0.75 for 75% of the column)
% The third parameter is the caption.
% Requires \usepackage{graphicx}
%%% I think \captionwidth (see above) can go away as long as
%%% \centering is above
% Includes a MATLAB script.
% The first parameter is the label, which also is the name of the script
% without the .m.
% The second parameter is the optional caption.
% Make title
%\title{\vspace{2in}\textmd{\textbf{\hmwkClass:\ \hmwkTitle\ifthenelse{\equal{\hmwkSubTitle}{}}{}{\\\hmwkSubTitle}}}\\\normalsize\vspace{0.1in}\small{Due\ on\ \hmwkDueDate}\\\vspace{0.1in}\large{\textit{\hmwkClassInstructor\ \hmwkClassTime}}\vspace{3in}}
%\title{\vspace{2in}\textmd{\textbf{\hmwkClass:\ \hmwkTitle\ifthenelse{\equal{\hmwkSubTitle}{}}{}{\\\hmwkSubTitle}}}\\\normalsize\vspace{0.1in}\small{Due\ on\ \hmwkDueDate}\\\vspace{0.1in}\large{\textit{ \hmwkClassTime}}\vspace{3in}}
\title{\textmd{\textbf{\hmwkClass:\ \hmwkTitle\ifthenelse{\equal{\hmwkSubTitle}{}}{}{\\\hmwkSubTitle}}}\\\normalsize\vspace{0.1in}\small{Due\ on\ \hmwkDueDate}\\\vspace{0.1in}\large{\textit{ \hmwkClassTime}}}
% Uncomment the \tableofcontents and \newpage lines to get a Contents page
% Uncomment the \setcounter line as well if you do NOT want subsections
% listed in Contents
% When problems are long, it may be desirable to put a \newpage or a
% \clearpage before each homeworkProblem environment
GPUs and CPUs are both processing units
and share the same basic structure: both require an instruction cache,
a set of registers, a memory and one or more ALUs.
However, they differ in how these components are organized, because each design targets a different kind of parallelism:
CPUs parallelize the instruction flow, GPUs the data processing.
Superscalar CPUs are designed to parallelize instructions, with one thread per core;
each thread has its own execution context, that is an instruction flow, an assigned stack and a program counter.
Such a system handles branching flexibly and is well suited to parallelizing independent tasks.
In particular, different threads execute different instructions, but each core issues from a single thread, so issue slots stay empty whenever that thread lacks independent instructions and the pipeline accumulates horizontal waste.
In a GPU streaming multiprocessor, execution is similar to that of a vector processor:
many ALUs, supported by memory and registers, execute a single instruction flow.
Compared to superscalar CPUs, many more threads share the cores and the memory, and
all of them execute the same task.
Thread divergence arises when conditional branching makes
threads belonging to the same warp take different execution paths. Because a warp shares a single instruction flow,
only one path can be executed at a time, and only the threads that took that path are active.
When one thread branches away from the main instruction flow, all the other threads of the warp must stop and wait
for its path to rejoin the main one.
Such a thread is called a diverging thread.
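The cost of divergence can be illustrated with a minimal Python sketch of a lockstep warp. It is a toy model, not a description of any specific GPU: the warp width of 32, the function name \texttt{warp\_cycles} and the cycle counts of the two paths are assumptions chosen for illustration. A warp whose threads disagree on a branch pays for both paths, one after the other.

```python
WARP_SIZE = 32  # assumed warp width (threads that share one instruction flow)


def warp_cycles(takes_branch, then_cost, else_cost):
    """Cycles one warp spends on an if/else under a simple lockstep model.

    takes_branch: one boolean per thread of the warp.
    then_cost, else_cost: hypothetical cycle counts of the two paths.
    """
    if len(takes_branch) != WARP_SIZE:
        raise ValueError("expected one predicate per thread of the warp")
    cycles = 0
    if any(takes_branch):      # at least one thread runs the 'then' path
        cycles += then_cost
    if not all(takes_branch):  # at least one thread runs the 'else' path
        cycles += else_cost
    return cycles


uniform = [True] * WARP_SIZE                # every thread takes the branch
one_diverging = [True] * 31 + [False]       # a single diverging thread

print(warp_cycles(uniform, then_cost=10, else_cost=40))        # 10
print(warp_cycles(one_diverging, then_cost=10, else_cost=40))  # 50
```

In this model a single diverging thread is enough to make the whole warp pay for both paths, which is why branch conditions that are uniform across a warp are preferable.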
There are different types of memory on a GPU.
The main one is global memory: it is common to all processing units and therefore allows blocks
to communicate with each other. Part of this hardware is also used for local memory, that is
a dedicated buffer private to each thread. As it sits at the highest level of the hierarchy, its latency is higher than that of the memories described next.
Each block owns a shared memory: all threads running in the block have access to it, and its latency
is much smaller (around $5$ ns).
Each streaming multiprocessor owns an L1 cache, and an L2 cache is shared by all of them; both generally cache local or global memory.
That leaves registers, constants and textures.
Registers are the fastest memory on the GPU and hold the variables declared in the GPU kernel.
Constant and texture memories are additional dedicated spaces: separate global memories with their own caches.
They were introduced in order to reduce traffic on global memory.
The first thing a programmer should consider is using shared memory as much as possible,
unless bank conflicts occur: conflicting accesses are serialized, and the added latency can exceed that of a global memory access.
It is therefore important to check which banks the threads access and to keep a one-to-one mapping between threads and banks.
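The one-to-one rule can be checked with a small Python sketch that counts how many serialized passes a warp-wide access needs. The layout is an assumption made for illustration ($32$ banks of $4$-byte words, with threads that read the same word served together), and the helper names \texttt{conflict\_degree} and \texttt{strided\_access} are hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumed number of shared-memory banks
WORD_BYTES = 4   # assumed bank width in bytes


def conflict_degree(byte_addresses):
    """Serialized passes needed to serve one warp-wide shared-memory access.

    Threads reading the same word are assumed to be served together,
    so only distinct words that fall in the same bank count as a conflict.
    """
    words_per_bank = defaultdict(set)
    for address in byte_addresses:
        word = address // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


def strided_access(stride_words, threads=32):
    """Byte addresses touched when thread i reads word i * stride_words."""
    return [i * stride_words * WORD_BYTES for i in range(threads)]


# Prints a degree of 1, 2, 32 and 1 for the four strides.
for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(strided_access(stride)))
```

Under these assumptions a stride of $32$ words sends every thread to the same bank, while padding the stride to $33$ words restores the one-to-one mapping.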
In principle, further gains are obtained if the GPU ALUs are always and equally busy: the more evenly
the work is balanced, the sooner the last thread finishes, which also shortens the wait at the final thread synchronization.
So the GPU must be fully occupied at all times for maximum performance.
In order not to saturate the faster memories with traffic, it is good practice to minimize resource accesses and to use kernel-defined variables when possible.
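How fully a streaming multiprocessor can be occupied depends on the resources each block requests. The Python sketch below is a simplified residency model under stated assumptions: the function name \texttt{occupancy} and all per-multiprocessor limits (thread slots, block slots, register file size, shared memory size) are hypothetical defaults, and allocation granularity is ignored.

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads=2048, max_blocks=32,
              regs_per_sm=65536, smem_per_sm=48 * 1024):
    """Fraction of a multiprocessor's thread slots that can be resident.

    All default limits are illustrative assumptions, not the values of a
    particular GPU. Each resource bounds the number of resident blocks,
    and the tightest bound wins.
    """
    limits = [
        max_blocks,                                           # block slots
        max_threads // threads_per_block,                     # thread slots
        regs_per_sm // (regs_per_thread * threads_per_block)  # register file
    ]
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)          # shared memory
    resident_blocks = min(limits)
    return resident_blocks * threads_per_block / max_threads


print(occupancy(256, regs_per_thread=32, smem_per_block=0))          # 1.0
print(occupancy(256, regs_per_thread=64, smem_per_block=0))          # 0.5
print(occupancy(256, regs_per_thread=32, smem_per_block=16 * 1024))  # 0.375
```

In this model, doubling the registers per thread or reserving a large shared buffer per block reduces the number of resident threads, so the fast memories have to be budgeted rather than used without limit.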

