Instruction-level parallelism (ILP) is defined as the simultaneous execution of multiple instructions of a program
belonging to the same task.
Thread-level parallelism (TLP) is the technique of distributing tasks across different processing units. A task is generally a portion of code that can
be executed by a process or a thread.
The two differ mainly in how the execution is managed: while ILP takes an input and processes it
as different subsets of instructions synchronously, TLP assigns the execution to different threads.
Notice that TLP, unlike ILP, can apply
parallelism across different clocks, so the execution of a task is
asynchronous.
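As a minimal sketch of the distinction (function and variable names are illustrative), the C fragment below runs two independent tasks on separate threads (TLP), while inside each task the two arithmetic statements are independent, so the hardware is free to overlap them (ILP):
\begin{verbatim}
#include <pthread.h>
#include <stdio.h>

/* Each task is a portion of code run by its own thread (TLP). */
/* Inside the task the two statements are independent, so the  */
/* hardware may issue them in parallel (ILP).                  */
static void *task(void *arg) {
    int base = *(int *)arg;
    int a = base * 2;   /* independent of the next statement */
    int b = base + 7;   /* can be executed at the same time  */
    printf("task %d: a=%d b=%d\n", base, a, b);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int x = 1, y = 2;
    pthread_create(&t1, NULL, task, &x);  /* asynchronous start */
    pthread_create(&t2, NULL, task, &y);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
\end{verbatim}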
There are two basic approaches to exploiting ILP and TLP:
\begin{itemize}
\item Dynamic: let the hardware manage the parallelism over the executed instructions.
\item Static: rely on software control over the executed instructions.
\end{itemize}
These two approaches are complementary and can be combined in order to obtain better performance.
A consequence of parallelism over instructions (hence mainly of ILP) is that three kinds of dependence arise: data dependence, name dependence and control dependence,
written Ddep, Ndep and Cdep respectively in the following statements.
Let "i" and "j" be single instructions; then:
\paragraph{Ddep}
"i" Ddep "j" if "i" produces a result that may be used by "j" and,
if "k" is another instruction, "j" Ddep "k" Ddep "i".
\paragraph{Ndep}
"i" Ndep "j" if "i" is antidependent or outputdependent on "j".
Generally, "i" is antidependent on "j" if "j" aptempts to write a memory location that "i" is reading.
On the other hand, "i" and "j" are outputdependent if both write on the same register.
In that case the depence can be avoided by thinking of a program structure which doesn't embed it.
The risk of this dependence is a "data hazard", or rather a violation of the program order, meaning that some registers
can be read and written at the same time from "i" and "j". The only case where there's no hazard is a simultaneous reading
of the same register.
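As a sketch (names are illustrative), the first function below contains both kinds of name dependence, and the renamed version removes them:
\begin{verbatim}
int before(int x, int y) {
    int a = x + 1;   /* i: reads x                                  */
    x = y * 2;       /* j: writes x, so "i" is antidependent on "j" */
    x = y + 3;       /* writes x again: output dependent with "j"   */
    return a + x;
}

int after(int x, int y) {  /* renaming removes both name dependences */
    int a = x + 1;
    int x1 = y * 2;  /* new name: no conflict with the read of x */
    int x2 = y + 3;  /* new name: no longer overwrites x1        */
    return a + x2;   /* x1 is dead; kept only to mirror "before" */
}
\end{verbatim}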
\paragraph{Cdep}
Control dependence can be explained by imagining a tree of instructions that represents the program's execution flow,
in which the flow diverges at branches.
"i" Cdep "j" if there exists a branch edge ("j" -> "l") such that "i" post-dominates "l" and "i" does not post-dominate "j"
(unless "i" equals "j").
"i" is said to post-dominate "j" if "i" appears on every path of the tree from "j" to the end of the execution.
\end{homeworkSection}
\begin{homeworkSection}{(3)}
NUMA (Non-Uniform Memory Access) is a specific design philosophy for systems with multiple processing units.
In NUMA, individual processors work in parallel, sharing local memory in order to improve performance.
The main reason to prefer such a configuration is the gain in efficiency given by the reduction of the distance (in terms of units)
that a digital signal has to cover in order to pass from a memory to a processor. This is achieved by placing intermediate shared memories
and thereby avoiding many unnecessary uses of the bus. For example, when the units are connected to a shared storage cache, a symmetric multiprocessing system can be implemented.
Sharing memory is also useful to avoid storing duplicate values when several units want to read the same data.
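As a hedged sketch of how this locality can be exploited in practice on Linux, the fragment below uses the libnuma library (an assumption: it must be installed, and the program linked with -lnuma) to allocate a buffer on a chosen node, so that a processor near that node accesses the data locally:
\begin{verbatim}
#include <numa.h>    /* Linux libnuma; link with -lnuma */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() == -1) {   /* is NUMA supported here? */
        fprintf(stderr, "NUMA not available on this system\n");
        return EXIT_FAILURE;
    }
    size_t size = 1 << 20;          /* 1 MiB buffer */
    /* Allocate the buffer on node 0: a processor close to node 0  */
    /* then reads it from local memory instead of crossing the bus. */
    char *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL)
        return EXIT_FAILURE;
    buf[0] = 1;                     /* touch the page so it is placed */
    numa_free(buf, size);
    return EXIT_SUCCESS;
}
\end{verbatim}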