\chapter[Signal Processing and Vector Spaces]{Signal Processing and \\ Vector Space}
\label{ch:vector_space}\label{ch:vs}
In the previous chapter we already started to classify signals according to their mathematical properties by discriminating, for instance, between finite- and infinite-length entities. As we progress in the book, we will encounter several more types of signal, both in discrete and in continuous time but, rather than trying to create a complex taxonomy, in this chapter we will show how the concept of \textit{vector space} provides an ideal framework to describe and analyze signals irrespective of the details inherent to their type or category.
Pedagogically speaking, today's standard approach to vector space is purely axiomatic: given a set of objects, called vectors, and a set of numbers, called scalars, a vector space is defined by a set of rules that stipulate how vectors can be resized and combined. The approach, while elegant and concise, regrettably hides the historical sequence of brilliant intuitions that, over the course of centuries, produced the formalism currently in use. After a long pragmatic separation between the Greek ideal of synthetic geometry and the bag of tricks that was algebra, in 1637 Ren\'e Descartes arranged a remarkably fruitful union: by tackling geometric questions with algebraic methods, his \textit{analytic geometry} managed to provide an answer to problems that had long defied the classic ruler-and-compass methods (such as, famously, the trisection of the angle). Yet, it would take almost two more centuries before another crucial step in the formalization of vector space was put in place; this happened around the year 1800, when at least three different mathematicians\footnote{Carl Friedrich Gauss and Caspar Wessel in 1799, and Jean-Robert Argand in 1806.} independently developed a geometric interpretation of complex numbers (entities de facto in use since the 16th century) that was consistent with their algebraic properties. In a way, the ideas behind the concept of the complex plane contain all the fundamental ingredients that define a vector space, from the notions of norm and direction to the roles of the real and imaginary units as elements of a two-dimensional canonical basis. And yet, the extension of these results to the next logical level, that is, to three dimensions, proved a difficult nut to crack and created a fork in the road to modern vector theory. While Hamilton and his followers continued along the algebraic path and developed the theory of quaternions (with the endorsement of leading physicists of the era such as Maxwell), it was Gibbs (and later Heaviside) who formalized the properties of vector space as we use them today, namely as a more straightforward generalization of Cartesian geometry. Finally, \textit{linear algebra} emerged as a new branch of mathematics devoted to the study of vector space, by now extended to encompass not only tuples of arbitrary or even infinite dimension but also more abstract entities such as functions.
Most of the signal processing theory in this book can be conveniently described using vector space formalism. While this is by no means the only possible approach, it offers several advantages, the first of which is the ability to leverage simple geometric intuition to understand sophisticated abstract concepts. As an example, take the notion of \textit{orthogonality:}\/ the basic properties of right angles were clearly apparent even to unschooled masons or carpenters well before their Euclidean formalization; the concept was then mathematically extended to generic vectors so that we can now talk about, say, orthogonal functions and, even if the details of how to verify such orthogonality are not spelled out, we immediately understand that two such functions must be fundamentally different in some respect. Interestingly, the concept is so powerful that it has even seeped into common language, as in the sentence ``our political views are orthogonal to each other.'' So, in the next chapter, when we introduce the orthogonality of discrete-time harmonic sinusoids, we will be able to start the discussion with an intuitive understanding of the subject already in place for free.
Finally, and this as well is no small advantage, vector notation closely mirrors the data structures used by numerical analysis packages and programming languages so that signal processing algorithms expressed in vector notation translate to working code with minimal effort.
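As a simple illustration (a minimal sketch in Python with NumPy, chosen here purely for convenience and not otherwise assumed in this book), the vector operations used throughout this chapter map directly onto one-line statements:
\begin{verbatim}
import numpy as np

x = np.array([4.0, 2.0])              # a vector in R^2
y = np.array([1.0, 3.0])              # another vector in R^2

s = x + y                             # vector addition
z = 1.5 * x                           # scalar multiplication
ip = np.dot(x, y)                     # inner product <x, y> = 10.0
norm_x = np.linalg.norm(x)            # norm ||x|| = sqrt(20)
\end{verbatim}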
%The first advantage of this approach is compactness: the entire linear algebra toolbox becomes available and this greatly simplifies the formalism used in mathematical proofs while fostering a good geometry-based intuition with respect to the underlying principles. Secondly, vector notation provides \textit{general} theoretical results by hiding all unnecessary details (such as, for instance, the actual length of finite-length signals).
%formalizing the axioms in use today. Crucially, the latter ensure that any axiom-abiding vector space retains the structure and interpretability of the familiar Euclidean space; concepts such as distance or orthogonality, for instance, which are immediately understandable in three-dimensional space, conserve their meaning in all spaces and help us understand the properties of highly abstract vectors even when intuitive or graphical visualizations are no longer possible.
%Basic arithmetic, back at the beginning of human history, was borne of practical necessities that involved counting and measuring and, to this day, arithmetic is taught to children in a deductive manner that mirrors its early inception: apples are gathered in baskets and pies are sliced and eaten. \textit{Algebra}, i.e. the set of recipes that are used to find missing quantities in an equation, followed closely in the development of the mathematical mind and, although formalized only in the 16th century, practical recipes had started accumulating from the very beginning. Once it was clear that arithmetic operations had an inner structure of their own that did not require a physical embodiment, apples and pies became simply $x$'s and $y$'s. In the $17$th century, algebra and geometry indissolubly married: Descartes's original idea of translating geometric constructs into algebraic form spurred a new successful line of attack for a number of problems which had long defied the classic methods using ruler and compass (such as, famously, the trisection of the angle). The Cartesian plane was born and, with it, the concept of two- and three-dimensional \textit{vectors}. In two and three dimensions, that is, in Euclidean space, vectors have the same intuitive power of arithmetical apples and pies since we can rather successfully conjure up a physical representation of their nature. Again, however, it was soon realized that operations on Euclidean vectors can be defined for any dimensionality of the underlying space, leading to the concept of \textit{linear algebra}; and, although we cannot visualize $N$-dimensional space for $N>3$, we can use our low-dimensional intuition to extend to any dimension the meaning (and therefore the understanding) of concepts such as distance and orthogonality.
%Today, of course, vector spaces are introduced almost exclusively in an inductive, operational way: if a certain set of properties is satisfied by the vectors and by an operation defined on them, we have a vector space. Nothing is said about the vectors themselves, least of all how the vectors should ``look like;'' indeed, common vector spaces can have an infinite number of dimensions and contain esoteric objects such as functions. The power of the formalism, nevertheless, is that we can always use Euclidean geometry as a helpful crutch
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Euclidean Geometry and Vector Space}
Euclidean geometry is a straightforward idealization of the physical space that we experience with our senses and therefore the concepts of length, angle, translation or scaling require no formal explanation. In this section we will show how Euclidean space can be mapped to a vector space of real-valued tuples and, consequently, how the properties of these vectors can be described in terms of standard geometrical constructs. The idea is to associate the key ingredients of vector space to intuitive geometrical concepts before generalizing them to the more abstract vector spaces that we will use later.
\subsection{The Euclidean Plane}
\itempar{The vectors.} Consider a vector $\mathbf{x}$ defined as a pair of real numbers:
\begin{equation}
\mathbf{x} = \begin{bmatrix}
x_0 \\ x_1
\end{bmatrix}, \quad x_0, x_1 \in \mathbb{R};
\end{equation}
we call the set of all such vectors $\mathbb{R}^2$.
Consider now the Euclidean plane, equipped with a Cartesian reference system. Each point on the plane is uniquely identified by a pair of real-valued coordinates on the two reference axes, and therefore we can associate any point on the plane with a vector in $\mathbb{R}^2$. The standard graphical representation of such a vector is an arrow that starts at the origin (i.e. the point with coordinates\footnote{
The superscript $T$ indicates transposition so that $[ 0 \;\; 0]^T$ is a column vector.
}
$[ 0 \;\; 0]^T$) and reaches the point with coordinates $[ x_0 \;\; x_1]^T$, as in Figure~\ref{fig:vs:vector_representation_r2}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-4,-1)(8,4)
\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=gray,griddots=10](0,0)(-4,-1)(8,4)
\psaxes[labelFontSize=\scriptstyle]{->}(0,0)(-4,-1)(8,4)
\psline[linecolor=ColorOne]{->}(4,2)\uput[-45](4,2){$\mathbf{x} = [ 4 \;\; 2]^T$}
\end{pspicture}
% \vskip-5mm
\caption{Graphical representation of a vector in $\mathbb{R}^2$.}\label{fig:vs:vector_representation_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Define now the \textit{sum} of two vectors in $\mathbb{R}^2$ like so:
\begin{equation}
\mathbf{x + y} =
\begin{bmatrix} x_0 \\ x_1 \end{bmatrix} + \begin{bmatrix} y_0 \\ y_1 \end{bmatrix} =
\begin{bmatrix} x_0 + y_0 \\ x_1 + y_1 \end{bmatrix};
\end{equation}
although we could have invented a different ``recipe'' for addition, this particular one makes a lot of sense since it corresponds to a \textit{translation} of $\mathbf{y}$ by $\mathbf{x}$: geometrically, we place $\mathbf{y}$ at the end of $\mathbf{x}$ and we connect the origin to the endpoint of the joined vectors as in Figure~\ref{fig:vs:vector_addition_r2}. This is the classic parallelogram rule for vector addition.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-4,-1)(8,4)
\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=gray,griddots=10](0,0)(-4,-1)(8,4)
\psaxes[labelFontSize=\scriptstyle]{->}(0,0)(-4,-1)(8,4)
\psline[linecolor=ColorTwo]{->}(5,1)\uput[-45](5,1){$\mathbf{x}$}
\psline[linecolor=ColorThree]{->}(2,2)\uput[45](2,2){$\mathbf{y}$}
\psline[linecolor=ColorLightest,linestyle=dashed]{->}(5,1)(7,3)
\psline[linecolor=ColorLightest,linestyle=dashed](2,2)(7,3)
\psline[linecolor=ColorOne]{->}(7,3)\uput[45](7,3){$\mathbf{x+y}$}
\end{pspicture}
\caption{Vector addition in $\mathbb{R}^2$}\label{fig:vs:vector_addition_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Similarly, define multiplication by a scalar $\alpha \in \mathbb{R}$ as
\begin{equation}
\alpha\mathbf{x} =
\begin{bmatrix} \alpha x_0 \\ \alpha x_1 \end{bmatrix};
\end{equation}
again, this makes geometrical sense since it corresponds to changing the ``size'' of the vector, as in Figure~\ref{fig:vs:scalar_multiplication_r2}. Given these definitions for sum and scalar multiplication, it is easy to verify that they possess all the nice properties we expect them to have, including distributivity, associativity and so on. We will formalize all this later. Multiple vectors can be joined together in \textit{linear combinations} of the form:
\begin{equation}
\mathbf{y} = \sum_{n=0}^{N} \alpha_n \mathbf{x}_n
\end{equation}
where the subscript, when applied to a boldface symbol, indicates the ordinal number of an item in a given set of vectors.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-4,-1)(8,4)
\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=gray,griddots=10](0,0)(-4,-1)(8,4)
\psaxes[labelFontSize=\scriptstyle]{->}(0,0)(-4,-1)(8,4)
\psline[linecolor=ColorOne]{->}(6,3)\uput[-45](6,3){$1.5\,\mathbf{x}$}
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(4,2)\uput[-45](4,2){$\mathbf{x}$}
\end{pspicture}
\caption{Scalar multiplication in $\mathbb{R}^2$}\label{fig:vs:scalar_multiplication_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\itempar{Norm, similarity and distance.} We said that scalar multiplication changes the ``size'' of a vector but we haven't yet defined a way to quantify that size. Once again, by looking at the geometric interpretation of $\mathbb{R}^2$, the natural choice is to use the length of the arrow that represents the vector; this quantity is commonly called the \textit{norm} and is denoted by the symbol $\|\cdot\|$. From Figure~\ref{fig:vs:vector_representation_r2}, using Pythagoras' theorem, it's easy to see that
\begin{equation} \label{eq:vs:norm_r2}
\|\mathbf{x}\| = \sqrt{x_0^2 + x_1^2}
\end{equation}
Consider now the problem of estimating the degree of \textit{similarity} between two vectors; if we start with vectors of equal norm, the only thing left to compare is their directions or, in other words, the angle that they form. If the angle is small, the vectors will point in nearly the same direction and be almost interchangeable; at the other extreme, if they form a right angle, they will be maximally dissimilar in the sense that the second vector shares no directional component with the first. The \textit{inner product} of two vectors in $\mathbb{R}^2$ is a function from $\mathbb{R}^2 \times \mathbb{R}^2$ to $\mathbb{R}$ and it is defined as:
\begin{equation}\label{eq:vs:inner_product_r2}
\langle \mathbf{x, y} \rangle = x_0 y_0 + x_1 y_1.
\end{equation}
Although the definition is not instantly intuitive, the following properties will help: first of all, again by invoking Pythagoras, we can see that
\begin{equation}
\langle \mathbf{x, x} \rangle = \|\mathbf{x}\|^2
\end{equation}
namely, a vector's self inner product is equal to the square of the norm as defined in~(\ref{eq:vs:norm_r2}). Secondly, using basic trigonometry, we can rewrite the inner product as
\begin{equation}
\langle \mathbf{x, y} \rangle = \|\mathbf{x}\|\,\|\mathbf{y}\|\,\cos\theta
\end{equation}
where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$ as in Figure~\ref{fig:vs:inner_product_r2}. The inner product as defined above, therefore, is a qualitative measure of similarity between vectors. Vectors forming a right angle, that is, vectors that are maximally dissimilar, will have an inner product of zero. Such vectors are called \textit{orthogonal}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-4,-1)(8,4)
\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=gray,griddots=10](0,0)(-4,-1)(8,4)
\psaxes[labelFontSize=\scriptstyle]{->}(0,0)(-4,-1)(8,4)
\psline[linecolor=ColorTwo]{->}(6,2)\uput[-45](6,2){$\mathbf{x}$}
\psline[linecolor=ColorThree]{->}(2.4,3.2)\uput[-45](2.4,3.2){$\mathbf{y}$}
\psarc[linecolor=ColorOne]{->}{3}{18.435}{53.13}
\uput{3.2}[33]{0}(0,0){$\theta$}
\end{pspicture}
\caption{Angle between vectors in $\mathbb{R}^2$.}\label{fig:vs:inner_product_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Finally, for a quantitative measure of how far apart two vectors are, we can use the norm of their difference to compute the \textit{distance} between vectors as
\begin{equation}
d(\mathbf{x, y}) = \|\mathbf{x - y}\|;
\end{equation}
by using the definition of the norm in terms of the inner product, we have
\begin{align*}
\|\mathbf{x - y}\|^2 &= \langle \mathbf{x - y, x - y} \rangle \\
&= \langle \mathbf{x, x} \rangle + \langle \mathbf{y, y} \rangle - 2\langle \mathbf{x, y} \rangle \\
&= \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\langle \mathbf{x, y} \rangle
\end{align*}
which, when $\mathbf{x}$ and $\mathbf{y}$ are orthogonal, simplifies to Pythagoras' theorem:
\begin{equation}
\|\mathbf{x - y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 \qquad \mbox{for $\mathbf{x \perp y}$}
\end{equation}
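The quantities just introduced are equally direct to compute numerically; the following sketch (again Python/NumPy, for illustration only) uses the two vectors of Figure~\ref{fig:vs:inner_product_r2}:
\begin{verbatim}
import numpy as np

x = np.array([6.0, 2.0])
y = np.array([2.4, 3.2])

norm_x = np.linalg.norm(x)                    # sqrt(x0^2 + x1^2)
norm_y = np.linalg.norm(y)
ip = np.dot(x, y)                             # x0*y0 + x1*y1 = 20.8
theta = np.arccos(ip / (norm_x * norm_y))     # angle between x and y, about 0.61 rad
dist = np.linalg.norm(x - y)                  # distance d(x, y)
\end{verbatim}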
\itempar{Bases and projections.} Consider the two vectors
\begin{equation}
\boldsymbol{\delta}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \boldsymbol{\delta}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix};
\end{equation}
obviously, for \textit{any} vector in $\mathbb{R}^2$ we can write:
\begin{equation}\label{eq:vs:basis_expansion_r2}
\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = x_0\,\boldsymbol{\delta}_0 + x_1\,\boldsymbol{\delta}_1.
\end{equation}
The fact that any vector in $\mathbb{R}^2$ can be expressed as a linear combination of $\boldsymbol{\delta}_0$ and $\boldsymbol{\delta}_1$ makes the set $\{\boldsymbol{\delta}_0, \boldsymbol{\delta}_1\}$ a \textit{basis} for $\mathbb{R}^2$. This basis is also particularly good since its elements have unit norm and are orthogonal, that is, they form an \textit{orthonormal} basis; if we look at the way we have been drawing vectors on the plane, as exemplified in Figure~\ref{fig:vs:bases_r2}-(a), we can see that the Cartesian reference grid uses $\boldsymbol{\delta}_0$ and $\boldsymbol{\delta}_1$ as the units on the horizontal and vertical axis, respectively. For this particular basis, the coefficients in the expansion~(\ref{eq:vs:basis_expansion_r2}) coincide with the elements of the vector; this unique fact gives $\{\boldsymbol{\delta}_0, \boldsymbol{\delta}_1\}$ the prestigious title of \textit{canonical basis}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t]
\center\small
\psset{unit=8mm}
\begin{tabular}{cc}
\begin{pspicture}(-3,-2)(4,3)
\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=ColorLight,griddots=10](0,0)(-3,-2)(4,3)
\psaxes[labels=none,ticks=none]{->}(0,0)(-3,-2)(4,3)
\psline[linecolor=ColorOne]{->}(3,2)
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(1,0)\uput[-45](1,0){$\boldsymbol{\delta}_0$}
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(0,1)\uput[135](0,1){$\boldsymbol{\delta}_1$}
\end{pspicture}
\hspace{1em}
&
\hspace{1em}
\begin{pspicture}(-3,-2)(4,3)
\psclip{\psframe[linestyle=none](-3,-2)(4,3)}
\rput{17.6}{\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=ColorLight,griddots=10](0,0)(-5,-5)(5,5)}
\endpsclip
\psaxes[labels=none,ticks=none]{->}(0,0)(-3,-2)(4,3)
\psline[linecolor=ColorOne]{->}(3,2)
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(0.95,0.3)\uput[0](0.95,0.3){$\mathbf{r}_0$}
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(-0.3,0.95)\uput[135](0,1){$\mathbf{r}_1$}
\end{pspicture}
\\ (a) & (b) \\
\begin{pspicture}(-3,-2)(4,3)
\psclip{\psframe[linestyle=none](-3,-2)(4,3)}
\psTilt{44}{\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=ColorLight,griddots=10](0,0)(-5,-5)(5,5)}
\endpsclip
\psaxes[labels=none,ticks=none]{->}(0,0)(-3,-2)(4,3)
\psline[linecolor=ColorOne]{->}(3,2)
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(1,0)\uput[-45](1,0){$\mathbf{s}_0$}
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(1,1)\uput[90](1,1){$\mathbf{s}_1$}
\end{pspicture}
\hspace{1em}
&
\hspace{1em}
\begin{pspicture}(-3,-2)(4,3)
\psaxes[labels=none,ticks=none]{->}(0,0)(-3,-2)(4,3)
\psline[linecolor=ColorOne]{->}(3,2)
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(1,0)\uput[-45](1,0){$\mathbf{k}_0$}
\psline[linecolor=ColorTwo,linewidth=2pt]{->}(-1,0)\uput[-135](-1,0){$\mathbf{k}_1$}
\end{pspicture}
\\ (c) & (d)
\end{tabular}
\caption{(a) canonical basis in $\mathbb{R}^2$; (b) rotated orthonormal basis; (c) non-orthogonal basis; (d) not a basis.}\label{fig:vs:bases_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
There are, however, infinitely many other bases for $\mathbb{R}^2$. For instance, any rotation of the canonical basis by an angle $\theta$ yields another orthonormal basis, with elements
\begin{equation}
\mathbf{r}_0 = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad
\mathbf{r}_1 = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}
\end{equation}
and the grid they induce on $\mathbb{R}^2$ is shown in Figure~\ref{fig:vs:bases_r2}-(b). Note that a basis need not be orthonormal; for instance the vectors
\begin{equation}
\mathbf{s}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
\mathbf{s}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\end{equation}
form a valid basis for $\mathbb{R}^2$ and they induce the non-rectangular grid shown in Figure~\ref{fig:vs:bases_r2}-(c). On the other hand, the vectors
\begin{equation}
\mathbf{k}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
\mathbf{k}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}
\end{equation}
(Figure~\ref{fig:vs:bases_r2}-(d)) do not form a basis for $\mathbb{R}^2$ since any linear combination will necessarily yield a point on the horizontal axis (the second element of the resulting vector will always be zero). Stated differently, the two vectors are not \textit{linearly independent}.
Consider a vector $\mathbf{x}$ expressed in the canonical basis and pick a new basis $\{\mathbf{w}_0, \mathbf{w}_1\}$; how can we find the coefficients that express $\mathbf{x}$ in the new basis? In the Euclidean plane (and in Euclidean space in general), we can always expand the equation
\begin{equation}\label{eq:vs:change_of_basis_r2}
\alpha_0\,\mathbf{w}_0 + \alpha_1\,\mathbf{w}_1 = \mathbf{x}
\end{equation}
by writing out the vector elements explicitly and solving for $\alpha_0$ and $\alpha_1$:
\begin{equation} \label{eq:vs:matrixcob}
\begin{bmatrix} w_{0,0} & w_{0,1} \\ w_{1,0} & w_{1,1} \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}
\end{equation}
(we use the notation $w_{n,m}$ to indicate the $n$-th element of vector $\mathbf{w}_m$). For instance, looking at Figure~\ref{fig:vs:bases_r2}-(c) and given $\mathbf{x} = [3 \;\; 2]^T$ we have
\begin{equation}
\begin{bmatrix} s_{0,0} & s_{0,1} \\ s_{1,0} & s_{1,1} \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} =
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
\end{equation}
so that
\begin{equation}
\mathbf{x} = 1\,\mathbf{s}_0 + 2\,\mathbf{s}_1.
\end{equation}
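For illustration, the same computation can be carried out numerically (a sketch in Python/NumPy) by collecting the basis vectors as the columns of a matrix and solving the resulting linear system:
\begin{verbatim}
import numpy as np

# basis vectors s0 = [1, 0]^T and s1 = [1, 1]^T as the columns of a matrix
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
x = np.array([3.0, 2.0])

alpha = np.linalg.solve(S, x)      # [1., 2.], i.e. x = 1*s0 + 2*s1
\end{verbatim}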
If the new basis is orthonormal, however, we know that
\begin{equation}
\langle \mathbf{w}_h, \mathbf{w}_k \rangle = \delta[h-k];
\end{equation}
therefore, if we take the left\footnote{
We use the left inner product because it mirrors the order of the operands when the change of basis is expressed as a matrix-vector multiplication as in~(\ref{eq:vs:matrixcob}). The order is of course irrelevant when we operate over the field of reals, since in that case the inner product is commutative; over the field of complex numbers, however, the inner product is commutative only up to conjugation (see definition~(\ref{eq:vs:inner_prod_comm}) later) and so the order is crucial.\label{page:vs:order}}
inner product of both sides in~(\ref{eq:vs:change_of_basis_r2}) with $\mathbf{w}_0$ and $\mathbf{w}_1$ in turn, we can obtain directly:
\begin{align}
\alpha_0 &= \langle \mathbf{w}_0, \mathbf{x} \rangle \\
\alpha_1 &= \langle \mathbf{w}_1, \mathbf{x} \rangle
\end{align}
For instance, looking at Figure~\ref{fig:vs:bases_r2}-(b), where the grid is rotated by an angle $\theta \approx 17.6^\circ$, we have
\begin{align}
\alpha_0 &= \langle \mathbf{r}_0, \mathbf{x} \rangle = 3\cos\theta + 2\sin\theta \approx 3.46 \\
\alpha_1 &= \langle \mathbf{r}_1, \mathbf{x} \rangle = -3\sin\theta + 2\cos\theta \approx 1.
\end{align}
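Numerically, the same coefficients are obtained with two inner products (a quick check in Python/NumPy, using the rotation angle of the figure):
\begin{verbatim}
import numpy as np

theta = np.deg2rad(17.6)                         # rotation angle of the figure
r0 = np.array([np.cos(theta),  np.sin(theta)])
r1 = np.array([-np.sin(theta), np.cos(theta)])
x  = np.array([3.0, 2.0])

alpha_0 = np.dot(r0, x)                          # approx 3.46
alpha_1 = np.dot(r1, x)                          # approx 1.00
\end{verbatim}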
Bases play a fundamental role in signal processing since, as we will see in the next Chapter, the Fourier Transform can be interpreted as a change of basis in an appropriate vector space. Intuitively, a change of basis is a change of perspective: we're looking at the same data but from a different reference system and this change of viewpoint can highlight properties of a signal that are difficult to see in the original dataset.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-3,0)(8,4)
%\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=gray,griddots=10](0,0)(-4,-1)(8,4)
%\psaxes[labelFontSize=\scriptstyle]{->}(0,0)(-4,-1)(8,4)
\psline[linecolor=ColorTwo]{->}(6,2)\uput[-45](6,2){$\mathbf{x}$}
\psline[linecolor=ColorThree]{->}(2.4,3.2)\uput[120](2.4,3.2){$\mathbf{y}$}
\psarc[linecolor=ColorLightest]{->}{2}{18.435}{53.13}\uput{2.2}[33]{0}(0,0){$\theta$}
\psline[linecolor=ColorLightest,linestyle=dashed]{->}(3.12,1.04)(2.4,3.2)
\uput[45](3,2){\color{ColorLight}$\mathbf{y-y_x}$}
\psline[linecolor=ColorOne]{->}(3.12,1.04)\uput[-45](3.12,1.04){$\mathbf{y_x}$}
\end{pspicture}
\caption{Orthogonal projection in $\mathbb{R}^2$.}\label{fig:vs:orthogonal_projection_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Let's now look once again at the definition of the inner product in~(\ref{eq:vs:inner_product_r2}). Consider Figure~\ref{fig:vs:orthogonal_projection_r2} and assume $\|\mathbf{x}\| = 1$; the vector
\begin{equation} \label{eq:vs:orthProj}
\mathbf{y_x} = \langle \mathbf{x, y} \rangle\, \mathbf{x} = (\|\mathbf{y}\|\cos\theta)\, \mathbf{x}
\end{equation}
is called the \textit{orthogonal projection} of $\mathbf{y}$ over $\mathbf{x}$ and it represents the best approximation of $\mathbf{y}$ using a scaled version of $\mathbf{x}$. The approximation is optimal in the sense that it achieves the minimum distance (or, in other words, the minimum Mean Square Error) between the original vector and any approximation based on $\mathbf{x}$:
\begin{equation}
\arg\min_\alpha \| \mathbf{y} - \alpha \mathbf{x} \| = \langle \mathbf{x, y} \rangle ;
\end{equation}
this is illustrated graphically in Figure~\ref{fig:vs:min_dist_r2} and can be easily derived analytically by solving $\partial\| \mathbf{y} - \alpha \mathbf{x} \|^2/\partial\alpha = 0$. Additionally, the approximation error $\mathbf{y - y_x}$ is orthogonal to the approximation itself
\begin{equation}\label{eq:vs:orthogonality_of_projection}
\langle \mathbf{y - y_x, y_x} \rangle = 0
\end{equation}
meaning that all the information of $\mathbf{y}$ that was expressible by $\mathbf{x}$ has been captured by the approximation. Please note that if $\|\mathbf{x}\| \neq 1$, the orthogonal projection simply includes a normalization term:
\begin{equation}\label{eq:vs:orthogonal_projection_normalized}
\mathbf{y}_{\mathbf{x}} = \frac{\langle \mathbf{x, y} \rangle}{\langle \mathbf{x, x} \rangle } \, \mathbf{x}.
\end{equation}
Although these results are nearly self-evident in $\mathbb{R}^2$, they will be particularly interesting when generalized to spaces of higher dimensionality.
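Here is a quick numerical check of~(\ref{eq:vs:orthogonal_projection_normalized}) and of the orthogonality of the approximation error, using the vectors of Figure~\ref{fig:vs:orthogonal_projection_r2} (Python/NumPy, for illustration only):
\begin{verbatim}
import numpy as np

x = np.array([6.0, 2.0])
y = np.array([2.4, 3.2])

# orthogonal projection of y over x, with the normalization term
y_x = (np.dot(x, y) / np.dot(x, x)) * x          # [3.12, 1.04], as in the figure
err = y - y_x

print(np.dot(err, y_x))                          # approx 0: error orthogonal to projection
\end{verbatim}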
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-3,0)(8,6)
%\psgrid[subgriddiv=1,gridlabels=0pt,gridcolor=gray,griddots=10](0,0)(-4,-1)(8,4)
%\psaxes[labelFontSize=\scriptstyle]{->}(0,0)(-4,-1)(8,4)
\psline[linecolor=ColorTwo]{->}(6,2)\uput[-45](6,2){$\mathbf{x}$}
\psline[linecolor=ColorThree]{->}(2.4,3.2)\uput[120](2.4,3.2){$\mathbf{y}$}
\psline[linecolor=ColorLightest,linestyle=dashed]{->}(3.12,1.04)(2.4,3.2)
\psline[linecolor=ColorOne]{->}(3.12,1.04)\uput[-45](3.12,1.04){$\mathbf{y_x}$}
\pscircle[linecolor=ColorFour,linewidth=0.5pt](2.4,3.2){0.6}
\pscircle[linecolor=ColorFour,linewidth=0.5pt](2.4,3.2){1.2}
\pscircle[linecolor=ColorFour,linewidth=0.5pt](2.4,3.2){1.7}
\pscircle[linecolor=ColorFour,linewidth=0.5pt](2.4,3.2){2.25}
\end{pspicture}
\caption{The orthogonal projection minimizes the distance between original and approximation.}\label{fig:vs:min_dist_r2}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Euclidean Space}
\label{sec:vs:approx}
One dimension up from the plane is the three-dimensional Euclidean space, which can be described in terms of vectors in $\mathbb{R}^3$ of the form
\begin{equation}
\mathbf{x} = \begin{bmatrix}
x_0 \\ x_1 \\ x_2
\end{bmatrix}, \quad x_0, x_1, x_2 \in \mathbb{R}.
\end{equation}
Addition, scalar multiplication and inner product all retain their definitions (by adjusting for one more component) and their geometrical meaning. Euclidean space, however, allows us to illustrate the concept of subspace approximation better than the plane.
\itempar{Subspaces and Least Squares approximation.} A vector subspace is a subset of the original vector space closed under addition and scalar multiplication; consider for example a plane in 3D space: if we sum (or rescale) vectors lying on the plane we remain on the plane and therefore we have a proper subspace. Subspaces have their own bases which, of course, can be orthonormal. For instance, Figure~\ref{fig:vs:plane_in_3d} illustrates the planar subspace spanned by the first two 3D canonical vectors
\[
\boldsymbol{\delta}_0 = \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \quad \mbox{and} \quad
\boldsymbol{\delta}_1 = \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} .
\]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=3cm}
\begin{pspicture}(-1.5,-.8)(1,1)
\psset{Alpha=40,linewidth=1pt}
\pstThreeDCoor[linecolor=black,linewidth=1pt,%
xMax=1.5,yMax=1.5,zMax=1,%
xMin=-.2,yMin=-.2,zMin=-.2,%
nameX=$\boldsymbol{\delta}_0$,
nameY=$\boldsymbol{\delta}_1$,
nameZ=$\boldsymbol{\delta}_2$]
\pstThreeDLine[linecolor=ColorOne]{->}(0,0,0)(1.5,0,0)
\pstThreeDLine[linecolor=ColorOne]{->}(0,0,0)(0,1.5,0)
\pstThreeDLine[linecolor=ColorTwo,linewidth=1.8pt]{->}(0,0,0)(1,.3,0)
\pstThreeDLine[linecolor=ColorThree,linewidth=1.8pt]{->}(0,0,0)(1,.7,0)
\pstThreeDLine[linecolor=ColorFour,linewidth=1.8pt]{->}(1,.3,0)(1,.7,0)
\pstThreeDPut(1.25,.45,0){$\mathbf{x+y}$}
\end{pspicture}
\caption{Three-dimensional Euclidean space and the planar subspace spanned by the first two canonical basis vectors; the subspace is closed under addition and scalar multiplication.}\label{fig:vs:plane_in_3d}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The existence of orthonormal bases for subspaces provides a very elegant answer to the problem of approximating an arbitrary vector with elements of a subspace. While the problem may appear quite abstract at first, it is actually at the core of many signal processing applications: modern image or video compression algorithms, for example, proceed by approximating a signal living in a high-dimensional space with a set of basis vectors from a subspace of much lower dimensionality.
Consider a vector space $V$ and a subspace $W \subseteq V$; assume $\{ \mathbf{w}_k \}_{k =0,1,\ldots, K-1}$ is an orthonormal basis for $W$. The \textit{projection theorem} states that the vector
\begin{equation}\label{eq:vs:projection_theorem}
\hat{\mathbf{x}} = \sum_{k=0}^{K-1} \langle \mathbf{w}_k, \mathbf{x} \rangle \, \mathbf{w}_k
\end{equation}
is the ``best'' approximation of $\mathbf{x} \in V$ over $W$. To see why, let's first unpack the above Equation; the sum is a linear combination of orthogonal projections of $\mathbf{x}$ onto all the basis vectors for the subspace. We already know from Equation~(\ref{eq:vs:orthogonality_of_projection}) that each orthogonal projection is the best approximation of $\mathbf{x}$ using a single unit norm basis vector. To show that (\ref{eq:vs:projection_theorem}) is the best global approximation in $W$ we need to show that the error $\mathbf{x - \hat{x}}$ is orthogonal to the approximation $\mathbf{\hat{x}}$:
\begin{equation}\label{eq:vs:orthogonality_of_approximation}
\langle \mathbf{x} - \hat{\mathbf{x}}, \, \hat{\mathbf{x}} \rangle = 0;
\end{equation}
this means that no further portion of the error could be accounted for by elements of the subspace and we've done the best we could. To prove~(\ref{eq:vs:orthogonality_of_approximation}) just substitute for~(\ref{eq:vs:projection_theorem}), use the distributivity of the inner product and remember that the basis $\{ \mathbf{w}_k \}$ is orthonormal. If the error is orthogonal to the subspace, the approximation minimizes the (square) norm of the error:
\begin{equation}
\arg\min_{\mathbf{y} \in W} \| \mathbf{x} - \mathbf{y} \| = \hat{\mathbf{x}};
\end{equation}
the approximation is therefore optimal in the least squares sense. An example of subspace approximation in Euclidean Space is given in Figure~\ref{fig:vs:projection_theorem}.
Note that the approximation provided by a set of orthonormal vectors can be \emph{successively refined}. Assume you have an approximation using the first $m$ basis vectors:
\begin{equation}
\hat{\mathbf{x}}_m = \sum_{k=0}^{m-1} \langle \mathbf{w}_k, \mathbf{x} \rangle \, \mathbf{w}_k
\end{equation}
the optimal approximation using one more basis vector is then simply
\begin{equation}
\hat{\mathbf{x}}_{m+1} = \hat{\mathbf{x}}_m + \langle \mathbf{w}_m, \mathbf{x} \rangle \, \mathbf{w}_m
\end{equation}
While this may seem obvious, it is actually a small miracle, since it does not hold for more general, non-orthonormal bases; this property is exploited, for example, in algorithms for variable-bit-rate transmission of audio and video.
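As an illustration (a minimal Python/NumPy sketch using the vector of Figure~\ref{fig:vs:projection_theorem}), the projection theorem and its successive-refinement property look like this in code:
\begin{verbatim}
import numpy as np

# orthonormal basis for the subspace W: the plane spanned by delta_0 and delta_1 in R^3
w0 = np.array([1.0, 0.0, 0.0])
w1 = np.array([0.0, 1.0, 0.0])
x  = np.array([1.0, 0.5, 2.0])                   # the vector of the figure

x_hat_1 = np.dot(w0, x) * w0                     # approximation using one basis vector
x_hat   = x_hat_1 + np.dot(w1, x) * w1           # successive refinement: best approx in W

err = x - x_hat
print(np.dot(err, x_hat))                        # 0: the error is orthogonal to x_hat
\end{verbatim}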
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=3cm}
\begin{pspicture}(-1.5,-.8)(1.5,1.8)
\psset{Alpha=40,linewidth=1pt}
\pstThreeDCoor[linecolor=black,linewidth=1pt,%
xMax=1.5,yMax=1.5,zMax=1.5,%
xMin=-.2,yMin=-.2,zMin=-.2,%
nameX=$\boldsymbol{\delta}_0$,
nameY=$\boldsymbol{\delta}_1$,
nameZ=$\boldsymbol{\delta}_2$]
\pstThreeDLine[linecolor=ColorTwo,linewidth=1.8pt]{->}(0,0,0)(1,.5,2)\pstThreeDPut(1,.5,2.15){$\mathbf{x}$}
\pstThreeDLine[linecolor=ColorOne]{->}(0,0,0)(1.5,0,0)
\pstThreeDLine[linecolor=ColorOne]{->}(0,0,0)(0,1.5,0)
\pstThreeDLine[linestyle=dashed,linewidth=0.5pt](0,.5,0)(1,.5,0)
\pstThreeDLine[linestyle=dashed,linewidth=0.5pt](1,.5,0)(1,0,0)
\pstThreeDLine[linestyle=dashed,linewidth=0.5pt](1,.5,0)(1,.5,2)
\pstThreeDLine[linecolor=ColorFour,linewidth=1.8pt]{->}(0,0,0)(1,.5,0)
\pstThreeDPut(1.15,.65,0){$\hat{\mathbf{x}}$}
\end{pspicture}
\caption{Least-squares approximation of a 3D vector onto the subspace spanned by the first two canonical basis vectors.}\label{fig:vs:projection_theorem}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Higher Dimensions}
Formally, it's quite easy to increase the number of dimensions beyond three: just consider the set of all tuples of $N$ real numbers
\begin{equation}
\mathbf{x} = \begin{bmatrix}
x_0 \\ x_1 \\ \vdots \\ x_{N-1}
\end{bmatrix}.
\end{equation}
By extending the Euclidean definitions of sum, scalar multiplication and inner product in a straightforward manner we obtain a vector space called $\mathbb{R}^N$. While formally equivalent to Euclidean space, $\mathbb{R}^N$ does not correspond to any physical entity that we can visualize; yet, the intuition gained in 2 and 3 dimensions can help us understand the properties of higher-dimensional spaces that defy graphical representation.
Similarly, we can define vectors as tuples of complex numbers; the resulting vector space, $\mathbb{C}^N$, cannot be represented graphically for $N$ greater than one, but properties like norm, orthogonality and distance remain intuitive by analogy to Euclidean space. In general, no matter how abstract the vector space, we can always try to resort to the geometrical interpretation of $\mathbb{R}^2$ or $\mathbb{R}^3$ to get an intuitive grasp on the space's properties.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Vector Space: a Formal Definition}
\subsection{The Axioms}
Following are the axiomatic definitions for a vector space equipped with an inner product. The stuff is dry, no question, but it simply encodes at an abstract level the very reasonable properties we have seen for Euclidean space\footnote{
Although the analogy is a bit tenuous, readers with more than a passing acquaintance with object-oriented programming could be excused for drawing a parallel between the concept of vector space and that of an interface class: internal variables and implementations in derived classes can be arbitrary as long as they conform with the interface.}.
\itempar{Vector Space.} A vector space $H(V, S)$ is defined in terms of $V$, a set of objects called \textit{vectors}, and $S$, a set of numbers called \textit{scalars}; note that we will always assume $S=\mathbb{C}$. For the vector space to have any use, we need to be able to resize and combine vectors together and this is achieved via two operations called, respectively, \textit{addition} and \textit{scalar multiplication}. For $H(V, S)$ to be a vector space, these two operations, while arbitrarily definable, must fulfill the following set of axioms:
For any $\mathbf{x, y, z} \in V$ and any $\alpha, \beta \in S$:
\begin{itemize}
\item operations don't leave the vector space:
\begin{align}
&\mathbf{x + y} \in V \\
&\alpha\mathbf{x} \in V
\end{align}
\item addition is commutative:
\begin{equation}
\mathbf{x + y = y + x}
\end{equation}
\item addition is associative:
\begin{equation}
\mathbf{(x + y) + z = x + (y + z) }
\end{equation}
\item scalar multiplication is distributive:
\begin{align}
&\alpha \mathbf{(x + y)} = \alpha \mathbf{x} + \alpha \mathbf{y} \\
&(\alpha + \beta) \mathbf{x} = \alpha \mathbf{x} + \beta \mathbf{x} \\
&\alpha(\beta \mathbf{x}) = (\alpha\beta) \mathbf{x}
\end{align}
\item there exists a null vector $\mathbf{0} \in V$ that is the additive identity:
\begin{equation}
\mathbf{x + 0 = 0 + x = x}
\end{equation}
\item for every $\mathbf{x} \in V$ there exists an additive inverse $\mathbf{-x} \in V$:
\begin{equation}
\mathbf{x + (-x) = (-x) + x = 0}
\end{equation}
\item there exists a scalar $1 \in S$ that is the scalar multiplication identity:
\begin{equation}
1 \cdot \mathbf{x} = \mathbf{x} \cdot 1 = \mathbf{x}
\end{equation}
\end{itemize}
Note that no restrictions are set on what's ``inside'' a vector or on the internal mechanics of the vector operations; this is what gives the vector space paradigm its great power of generalization.
\itempar{Inner Product Space.} In order to compare vectors in a space, we need some function that takes two vectors and returns some measure of their similarity. The usual choice is to equip the space with an \textit{inner product} (or \textit{dot product}), namely, a function defined on $V\times V$ that returns a scalar; remember that our set of scalars is $\mathbb{C}$. Again, the inner product can be defined arbitrarily, as long as it satisfies the following properties:
For any $\mathbf{x, y, z} \in V$ and $\alpha \in \mathbb{C}$:
\begin{itemize}
\item the inner product is distributive:
\begin{equation}
\langle \mathbf{x + y, z} \rangle = \langle \mathbf{x, z} \rangle + \langle \mathbf{y, z} \rangle
\end{equation}
\item the inner product is commutative, with complex conjugation:
\begin{equation}\label{eq:vs:inner_prod_comm}
\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle^*
\end{equation}
\item the inner product scales with respect to scalar multiplication\footnote{
See footnote on page~\pageref{page:vs:order} as for the reason why we conjugate the scalar associated to the left operand.}:
\begin{align}
\langle \alpha\mathbf{x}, \mathbf{y} \rangle &= \alpha^* \langle \mathbf{x}, \mathbf{y} \rangle \\
\langle \mathbf{x}, \alpha\mathbf{y} \rangle &= \alpha \langle \mathbf{ x}, \mathbf{y} \rangle
\end{align}
\item the self-product is real-valued and nonnegative:
\begin{equation}
\langle \mathbf{ x},\mathbf{ x}\rangle \geq 0
\end{equation}
\item the self-product is zero only for the null vector:
\begin{equation}
\langle \mathbf{ x},\mathbf{ x} \rangle = 0 \; \Leftrightarrow \; \mathbf{x = 0}.
\end{equation}
\end{itemize}
\itempar{Hilbert Space.} The axioms for addition and scalar multiplication require that the result of these operations remain in the original vector space; as a consequence, all linear combinations of a \textit{finite} number of vectors from $V$ will live in $V$ as well. However, that is not sufficient to guarantee that the limit of a convergent sequence of vectors in $V$ is also in $V$. Closure of a vector space under convergent limiting operations is an independent property called \textit{completeness} and an inner-product space that is complete is called a \textit{Hilbert space}. Completeness is a rather technical property and one that is often difficult to prove rigorously\footnote{In this book, we will really need completeness only once in Chapter~\ref{ch:sampling} when we prove the sampling theorem, and we will tacitly assume the property without proof.}. While there are no elementary examples of a non-complete vector space, to gain some intuition think of the set of rational numbers $\mathbb{Q}$; this set is not complete because there are many sequences in $\mathbb{Q}$ that converge to irrational numbers. Consider, for instance, the sequence
\[
s_k = \sum_{n=1}^{k}\frac{1}{n^2};
\]
while it's easy to see that $s_k \in\mathbb{Q}$ for all values of $k$, it can be shown that
\[
\lim_{k\rightarrow\infty} s_k = \frac{\pi^2}{6} \; \not\in\mathbb{Q}.
\]
The set of real numbers $\mathbb{R}$, on the other hand, is complete.
\subsection{Norms and Distances}
The properties of the inner-product induce a topology on the vector space, namely, once we have an inner product, we can start to compare vectors and define qualitative and quantitative measures of similarity. The most important concept is orthogonality
\[
\langle \mathbf{x, y} \rangle = 0,
\]
which encodes the notion of maximal difference between vectors. The \textit{norm}, defined in terms of the self inner product
\begin{equation} \label{eq:vs:norm}
\|\mathbf{x}\|^2 = \langle \mathbf{x, x} \rangle,
\end{equation}
is used to quantitatively describe the size of a vector. The norm also extends the concept of Euclidean distance to arbitrary vector spaces as
\begin{equation}
d(\mathbf{x, y}) = \|\mathbf{x - y}\|;
\end{equation}
this distance measure is also known as the Mean Square Error between vectors.
\itempar{Triangle Inequality} To quickly show the power of the abstract definitions above, let's formally prove the \textit{triangle inequality} in a generic inner product space $H(V,\mathbb{C})$. The statement is simple: for any two vectors $\mathbf{x,y} \in V$ we always have
\begin{equation}
\|\mathbf{x + y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|
\end{equation}
where the norm is the natural norm induced by the inner product as in~(\ref{eq:vs:norm}). On the Euclidean plane, the above relation is obvious since it states that, for any triangle, the sum of the lengths of two sides can't be smaller than the length of the third side. The result is however valid for all inner-product spaces and, to prove the inequality in the general case, we first need an intermediate result, called the \textbf{Cauchy-Schwarz inequality}:
\begin{equation}
| \langle \mathbf{x, y} \rangle | \leq \|\mathbf{x}\|\, \|\mathbf{y}\|
\end{equation}
To prove this, assume $\mathbf{x, y \neq 0}$ (otherwise the result is obvious) and consider the orthogonal projection (\ref{eq:vs:orthProj}) of $\mathbf{y}$ onto $\mathbf{x}$
\[
\mathbf{y}_{\mathbf{x}} = \frac{\langle \mathbf{x, y} \rangle}{\langle \mathbf{x, x} \rangle } \, \mathbf{x};
\]
call $\mathbf{d = y - y_x}$ the approximation error and remember that the error is orthogonal to $\mathbf{y_x}$; you can look at a graphical representation of the vectors involved in the proof in Figure~\ref{fig:vs:cauchy_schwarz}. Since $\mathbf{y = y_x + d}$ we can write
\begin{align}\displaystyle
\| \mathbf{y} \|^2 &= \langle \mathbf{y_x + d, y_x + d} \rangle \\
&= \| \mathbf{y_x} \|^2 + \| \mathbf{d} \|^2 + \langle \mathbf{y_x, d} \rangle + \langle \mathbf{d, y_x} \rangle \\
&= \| \mathbf{y_x} \|^2 + \| \mathbf{d} \|^2 \\
&\geq \| \mathbf{y_x} \|^2 \\
&= \frac{|\langle \mathbf{x,y} \rangle|^2}{\| \mathbf{x} \|^2}
\end{align}
where the cross inner products have vanished because of the orthogonality of the error. Getting back to the triangle inequality, we have
\begin{align}\displaystyle
\|\mathbf{x + y}\|^2 &= \langle \mathbf{x + y, x + y} \rangle \\
&= \langle \mathbf{x, x} \rangle + \langle \mathbf{x, y} \rangle + \langle \mathbf{y, x} \rangle + \langle \mathbf{y, y} \rangle \\
&= \| \mathbf{x} \|^2 + 2\Re\{\langle \mathbf{x,y} \rangle \} + \| \mathbf{y} \|^2 \\
&\leq \| \mathbf{x} \|^2 + 2|\langle \mathbf{x,y} \rangle | + \| \mathbf{y} \|^2 \\
&\leq \| \mathbf{x} \|^2 + 2\| \mathbf{x} \|\| \mathbf{y} \| + \| \mathbf{y} \|^2 \\
&= (\| \mathbf{x} \|+ \| \mathbf{y} \|)^2
\end{align}
where we have used~(\ref{eq:vs:inner_prod_comm}), the fact that $|z| \geq \Re\{z\}$ for all $z \in \mathbb{C}$ and, of course, the Cauchy-Schwarz inequality. Please note that nowhere in the proof did we make use of the internal structure of the vectors; we only used the axioms.
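Both inequalities are easy to verify numerically for arbitrary vectors; the following sketch (Python/NumPy, with randomly generated complex vectors) uses \texttt{np.vdot}, which conjugates its first argument and therefore matches the inner product convention used in this chapter:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
y = rng.standard_normal(8) + 1j * rng.standard_normal(8)

ip = np.vdot(x, y)                               # <x, y>: first argument is conjugated
nx, ny = np.linalg.norm(x), np.linalg.norm(y)

print(abs(ip) <= nx * ny)                        # True: Cauchy-Schwarz
print(np.linalg.norm(x + y) <= nx + ny)          # True: triangle inequality
\end{verbatim}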
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center\small
\psset{unit=8mm}
\begin{pspicture}(-3,0)(8,4)
\psline[linecolor=ColorTwo]{->}(6,2)\uput[-45](6,2){$\mathbf{x}$}
\psline[linecolor=ColorThree]{->}(2.4,3.2)\uput[120](2.4,3.2){$\mathbf{y}$}
\psline[linecolor=ColorFour]{->}(3.12,1.04)(2.4,3.2)
\uput[45](3,2){$\mathbf{d}$}
\psline[linecolor=ColorOne]{->}(3.12,1.04)\uput[-45](3.12,1.04){$\mathbf{y_x}$}
\end{pspicture}
\caption{Graphical representation of the vectors used in the proof of the Cauchy-Schwarz inequality.}\label{fig:vs:cauchy_schwarz}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\itempar{Other Norms} We should mention in passing that, although the norm induced by the inner product is the natural norm for a Hilbert space, we are free to define alternative quantitative measures of a vector's size. In Euclidean space, for instance, the so-called $p$-norm is defined as
\begin{equation}
\|\mathbf{x}\|_p = \left(\sum_n |x_n|^p \right)^{1/p}.
\end{equation}
Notable values include
\begin{itemize}
\itemsep-1ex
\item the standard norm for $p=2$
\item the so-called ``Manhattan'' norm for $p=1$
\item the supremum norm for $p=\infty$:
\begin{equation}
\|\mathbf{x}\|_\infty = \max_n |x_n|
\end{equation}
\end{itemize}
The latter, in particular, will reappear in a major role when we study FIR filter design. Norms not derived from the inner product, in order to be valid norms, must be \textit{designed} to satisfy the triangle inequality.
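Numerically (a short Python/NumPy illustration), the three norms above are readily available:
\begin{verbatim}
import numpy as np

x = np.array([3.0, -4.0, 1.0])

n2   = np.linalg.norm(x, 2)                      # standard norm: sqrt(26)
n1   = np.linalg.norm(x, 1)                      # "Manhattan" norm: |3| + |-4| + |1| = 8
ninf = np.linalg.norm(x, np.inf)                 # supremum norm: max |x_n| = 4
\end{verbatim}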
\subsection{Bases and Subspaces} \label{sec:vs:bases}
Given a vector space $H(V,\mathbb{C})$, a subset of $N$ vectors $S = \{\mathbf{s}_k\}_k$ is said to be linearly independent if the equation
\[
\sum_{k=0}^{N-1}\beta_k\mathbf{s}_k = \mathbf{0}
\]
only admits the solution $\beta_k = 0$ for $k = 0, 1, \ldots, N-1$. Linear independence states that the subset $S$ contains no redundancies, that is, none of its members can be expressed as a linear combination of the others. Now, suppose we can determine a subset of $N$ vectors $W = \{\mathbf{w}_k\}_k \subseteq H$ so that any element $\mathbf{x} \in H$ can be written as a linear combination of elements of $W$:
\[
\mathbf{x} = \sum_{k=0}^{N-1}\alpha_k\mathbf{w}_k;
\]
in this case we say that $W$ is a \textit{spanning set} for $H$. If the set is also linearly independent (i.e., it contains no redundancy) then we call $W$ a \textit{basis} for $H$. If $W$ is a basis, the coefficients $\alpha_k$ are unique, as a consequence of linear independence. The laborious process of finding the analysis coefficients $\alpha_k$ for an arbitrary vector $\mathbf{x}$ is greatly simplified if the basis is orthonormal:
\[
\langle \mathbf{w}_k, \mathbf{w}_h \rangle = \begin{cases}
1 & \mbox{for $h = k$} \\
0 & \mbox{for $h \neq k$}
\end{cases};
\]
in this case, it is simply
\[
\alpha_k = \langle \mathbf{w}_k, \mathbf{x} \rangle.
\]
If a basis for $H$ contains $N$ elements, it is relatively straightforward (at least for finite $N$) to show that any other basis will also contain exactly $N$ vectors; $N$ is therefore called the \textit{dimensionality} of $H$. As a consequence, for finite-dimensional vector spaces, any set $S = \{\mathbf{s}_k\}_k$ of $N$ linearly-independent vectors will constitute a basis; indeed, if $S$ did not span $H$, there would exist at least one nonzero element $\mathbf{x} \in H$ such that
\[
\mathbf{x} \neq \sum_{k=0}^{N-1}\alpha_k\mathbf{s}_k \quad \mbox{for all $\alpha_k$},
\]
that is, $S' = \{\mathbf{s}_0,\ldots,\mathbf{s}_{N-1}, \mathbf{x}\}$ would be a set of $N+1$ linearly independent vectors. Now, if $S'$ spanned $H$, we would have $\dim(H) = N+1$, which is a contradiction; otherwise we could repeat the procedure and augment $S'$ until we obtained a basis but then, a fortiori, the dimensionality of $H$ would be larger than $N$, again a contradiction.
Finally, mutually orthogonal vectors are inherently linearly independent; pick a set of $N$ nonzero orthogonal vectors $\{\mathbf{w}_k\}_k$ and assume we can write
\[
\sum_{k=0}^{N-1} c_k\mathbf{w}_k = \mathbf{0}
\]
with at least one nonzero coefficient $c_i$. By the properties of the inner product, since $\langle \mathbf{0}, \mathbf{x} \rangle = 0$ for any $\mathbf{x}$, we have
\[
0 = \langle \mathbf{0}, \mathbf{w}_i \rangle = \langle \sum_{k=0}^{N-1} c_k\mathbf{w}_k, \mathbf{w}_i \rangle = c_i^* \|\mathbf{w}_i\|^2;
\]
since $\mathbf{w}_i \neq \mathbf{0}$, this forces $c_i = 0$, which contradicts the assumption, and so the vectors are indeed linearly independent. As a consequence, if we can find a set of $N$ orthogonal vectors in a space of dimension $N$, we automatically have found a basis.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Vector Spaces in Signal Processing}
After an informal introduction using Euclidean space and a precise axiomatic definition, we are now ready to see how vector spaces are going to be used for signal processing in the rest of the book.
\subsection{Discrete-time Signals}
\label{sec:vs:dt_space}
As we have seen in the previous chapter, in signal processing it is common practice to use the expression $x[n]$ to indicate a discrete-time signal in its entirety; this is an abuse of notation, since the term $x[n]$ in fact pinpoints the value of the signal $x[\cdot]$ at a specific index value $n$. Vector notation, on the other hand, provides us with a much more compact and precise symbolism: we can use the symbol $\mathbf{x}$ to indicate an entire signal and the expression $x[n]$ to indicate its $n$-th value.
\itempar{Finite-length signals} Length-$N$ signals are simply ordered tuples of $N$ complex numbers and therefore they belong to $\mathbb{C}^N$:
\begin{equation}
\mathbf{x} = \begin{bmatrix}
x[0] \\ x[1] \\ \vdots \\ x[N-1]
\end{bmatrix}
\end{equation}
Addition and scalar multiplication are defined in the intuitive, straightforward way. The inner product is defined as
\begin{equation}\label{eq:vs:inner_product_rn}
\langle \mathbf{x, y} \rangle = \sum_{n=0}^{N-1} x^*[n] \,y[n].
\end{equation}
We will soon see that, in signal processing, a \textit{filter} works by implementing an inner product; indeed, by filtering a data set, we are evaluating the local degree of similarity of the input with a prototype ``shape'', known as the filter's impulse response. Similarly, many communication systems work by comparing the incoming waveform to a set of predefined ``pulses'' that encode different symbols; the measure of similarity that is used for the selection of the closest pulse is also an inner product.
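When translating~(\ref{eq:vs:inner_product_rn}) into code, the conjugation of the first operand must not be forgotten; in NumPy, for instance, \texttt{np.vdot} conjugates its first argument whereas \texttt{np.dot} does not (a small illustrative sketch):
\begin{verbatim}
import numpy as np

x = np.array([1 + 1j, 2.0, -1j])
y = np.array([0.5, 1j, 3.0])

ip = np.vdot(x, y)    # sum of x*[n] y[n]; np.vdot conjugates its first argument
# note: np.dot(x, y) would NOT conjugate and does not implement the inner product above
\end{verbatim}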
The canonical basis for $\mathbb{C}^N$ is also intuitive:
\begin{equation}
\boldsymbol{\delta}_0 = \begin{bmatrix}1 \\ 0 \\ \vdots \\ 0\end{bmatrix}, \quad
\boldsymbol{\delta}_1 = \begin{bmatrix}0 \\ 1 \\ \vdots \\ 0\end{bmatrix}, \quad \ldots \quad
\boldsymbol{\delta}_{N-1} = \begin{bmatrix}0 \\ 0 \\ \vdots \\ 1\end{bmatrix} .
\end{equation}
Another very important basis for $\mathbb{C}^N$ is the Fourier basis $\{\mathbf{w}_k\}$ with
\begin{equation}
w_k[n] = e^{j\frac{2\pi}{N}nk};
\end{equation}
we will study the Fourier basis in detail in the next chapter.
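A quick numerical check (Python/NumPy, for illustration) shows that the Fourier vectors are mutually orthogonal, each with squared norm equal to $N$:
\begin{verbatim}
import numpy as np

N = 8
n = np.arange(N)
# row k of W is the Fourier basis vector w_k, with w_k[n] = exp(j 2 pi n k / N)
W = np.exp(2j * np.pi * np.outer(np.arange(N), n) / N)

G = np.conj(W) @ W.T                     # Gram matrix of all inner products <w_h, w_k>
print(np.allclose(G, N * np.eye(N)))     # True: orthogonal vectors, each with norm sqrt(N)
\end{verbatim}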
Given a basis $\{\mathbf{s}_k\}$, we know that we can express any vector in $\mathbb{C}^N$ as a linear combination of basis elements:
\begin{equation}\label{eq:vs:synthesis}
\mathbf{x} = \sum_{k=0}^{N-1} \alpha_k \mathbf{s}_k;
\end{equation}
if the basis is orthonormal, the coefficients $\alpha_k$ can be easily computed using the \textit{analysis formula}
\begin{equation}\label{eq:vs:analysis}
\alpha_k = \langle \mathbf{s}_k, \mathbf{x} \rangle.
\end{equation}
In $\mathbb{C}^N$ a particularly nice thing happens: the set of $N$ coefficients $\alpha_k$ is itself a member of $\mathbb{C}^N$ and this opens up the full linear algebra toolbox for us to use. For example, remembering the definition of inner product in~(\ref{eq:vs:inner_product_rn}), we can build an $N\times N$ matrix $\mathbf{M}$ by stacking the conjugates of the orthonormal basis vectors as rows\footnote{The notation $\mathbf{x}^H$ denotes the \textit{Hermitian transpose}, i.e. a vector or a matrix whose original elements have been transposed \textit{and} conjugated.}
\begin{equation}
\def\arraystretch{1.7}
\mathbf{M} = \left[ \begin{array}{c}
\mathbf{s}_0^H \\ \hline
\mathbf{s}_1^H \\ \hline
\vdots \\ \hline
\mathbf{s}_{N-1}^H
\end{array} \right]
\end{equation}
and, with this, obtain all the coefficients $\alpha_k$ in one go using the matrix-vector product:
\begin{equation}
\mbox{\boldmath{$\alpha$}} = \mathbf{Mx}.
\end{equation}
Note that $\mathbf{M}^H$ is a matrix whose columns are simply the basis vectors side by side; we can therefore write
\begin{equation}
\mathbf{x} = \mathbf{M}\mbox{\boldmath{$\alpha$}}
\end{equation}
since
\begin{equation}
\mathbf{MM}^H = \mathbf{M}^H\mathbf{M} = \mathbf{I}.
\end{equation}
The possibility of applying the constructs of linear algebra to signal manipulation further allows us to cast all linear operations on signals as matrix-vector multiplications. Although we will not really pursue this approach in the rest of the book, some examples are in the exercise section for Chapter~\ref{ch:lti}.
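As a sketch of this matrix formalism (Python/NumPy, using the Fourier basis of the previous paragraph scaled by $1/\sqrt{N}$ so that it is orthonormal):
\begin{verbatim}
import numpy as np

N = 8
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
# rows of M: conjugated Fourier basis vectors, normalized by 1/sqrt(N) to be orthonormal
M = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)

x = np.random.default_rng(1).standard_normal(N)     # an arbitrary length-N signal

alpha = M @ x                                        # analysis: alpha_k = <s_k, x>
x_rec = M.conj().T @ alpha                           # synthesis: x = M^H alpha

print(np.allclose(x_rec, x))                         # True
print(np.allclose(M @ M.conj().T, np.eye(N)))        # True: M M^H = I
\end{verbatim}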
\itempar{Periodic Signals} Periodic signals of period $N$ are completely described by $N$ consecutive samples and therefore they are equivalent to finite-length signals of length $N$; they also live in $\mathbb{C}^N$ and, when the periodicity is to be made explicit, we will sometimes use the notation $\tilde{\mathbb{C}}^N$.
\itempar{Infinite-length signals} From a purely formal point of view, we can let $N$ in $\mathbb{C}^N$ grow to infinity and, by keeping the same definitions for sum, scalar multiplication and inner product, we obtain the vector space $\mathbb{C}^\infty$ (sometimes also denoted by $\mathbb{C}^{\mathbb{Z}}$ since, in signal processing, we usually use both positive and negative indices). Because the resulting space is now infinite-dimensional, we need to exercise much caution; for example, it is immediately clear that the inner product can diverge even for elementary signals such as a constant signal whose samples are all equal to one; indeed
\begin{equation}
\langle \mathbf{x, x} \rangle = \sum_{n=-\infty}^{\infty} x^*[n] x[n] = \sum_{n=-\infty}^{\infty} 1 = \infty.
\end{equation}
To avoid this type of problem we can consider instead the set of infinite-length vectors whose Euclidean norm is finite, that is, the space of square-summable sequences; the space is called $\ell_2(\mathbb{Z})$, with
\begin{equation}
\mathbf{x} \in \ell_2(\mathbb{Z}) \iff \|\mathbf{x}\|^2 = \sum_{n \in\mathbb{Z}} \big|x[n]\big|^2 < \infty,
\end{equation}
and is also known as the space of \textit{finite-energy} signals, given the definition of energy in~(\ref{eq:dts:energy}). In this space the inner product is always well defined, as can be shown by using the Cauchy-Schwarz inequality. Be forewarned, however, that there are a lot of interesting signals that are not in $\ell_2(\mathbb{Z})$ and we will see how to deal with them while trying to preserve the vector space formalism.
In the case of infinite dimensions, the concept of basis also becomes tricky since it is not necessarily safe to allow for linear combinations of vectors with an infinite number of terms in the sum. In $\ell_2(\mathbb{Z})$, however, all is well: the canonical basis for the space is the extension to infinite dimensions of the canonical basis for $\mathbb{C}^N$, that is, an infinite set of vectors in which only a single element at a time is equal to one.
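As a quick numerical illustration of the square-summability requirement (a sketch of ours, with an arbitrary decay factor), the partial energy sums of the all-ones signal grow without bound, whereas those of a geometrically decaying signal converge, so only the latter belongs to $\ell_2(\mathbb{Z})$:
\begin{verbatim}
# Sketch: partial energy sums for an all-ones signal vs. a decaying signal
# x[n] = 0.9^n for n >= 0 (the decay factor 0.9 is an arbitrary choice).
import numpy as np

for N in (10, 100, 1000, 10000):
    n = np.arange(N)
    print(N,
          np.sum(np.ones(N) ** 2),             # grows without bound
          round(np.sum((0.9 ** n) ** 2), 4))   # converges to 1/(1 - 0.81)
\end{verbatim}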
\subsection{Function Spaces}
\label{sec:vs:funcs}
Function vector spaces, as the name implies, are spaces in which the vectors are functions; since we know from calculus how to combine, rescale and measure the difference between functions, we can re-cast those operations in such a way that they comply with the axioms of a vector space. Here we are really performing a task of profound abstraction because our vectors are no longer tuples of numbers but sophisticated mathematical entities.
In this book we will only be interested in two specific function spaces, the space of square-integrable functions over an interval and the space of bandlimited functions on the real line; we will wait until Chapter~\ref{ch:is} to introduce the latter, but the former is easily described: consider an interval $I$ and the set of complex-valued functions of a real variable $\mathbf{x} = x(t)$ such that
\begin{equation}
\int_I |x(t)|^2 \de t < \infty.
\end{equation}
The space of all such functions is denoted by $L_2(I)$; addition and scalar multiplication are defined in the intuitive manner:
\begin{align}
\mathbf{x+y} &= x(t) + y(t) \\
\alpha\mathbf{x} &= \alpha x(t)
\end{align}
while the inner product is
\begin{equation}\label{eq:vs:inner_product_function}
\langle \mathbf{x, y} \rangle = \int_I x^*(t)\,y(t) \de t.
\end{equation}
Note that the norm induced by the inner product is
\begin{equation}
\|\mathbf{x}\|^2 = \langle \mathbf{x, x} \rangle = \int_I |x(t)|^2 \de t
\end{equation}
so that all elements of $L_2(I)$ have a well-defined norm.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center
\begin{dspPlot}[xticks=1,xout=true,yticks=2,sidegap=0]{-1, 1}{-1.25, 1.25}
\dspText(.2,-.8){$\mathbf{x} = \sin(\pi t)$, antisymmetric}
\dspText(-.6,.8){$\mathbf{y} = 1-|t|$, symmetric}
\pscustom[fillstyle=solid,fillcolor=green!30,linestyle=none]{%
\dspFunc{x 180 mul sin x abs 1 exch sub mul dup 0 ge {} {pop 0} ifelse}}
\pscustom[fillstyle=solid,fillcolor=red!30,linestyle=none]{%
\dspFunc{x 180 mul sin x abs 1 exch sub mul dup 0 ge {pop 0} {} ifelse}}
\dspFunc[linecolor=ColorOne]{x 180 mul sin}
\dspFunc[linecolor=ColorTwo]{x abs 1 exch sub }
\end{dspPlot}
\caption{Orthogonality of symmetric and antisymmetric functions. The shaded areas show the integration area in the inner product.}\label{fig:vs:orthogonal_functions}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The concept of orthogonality carries over, with the same meaning, to function spaces. Consider for instance the two vectors
\begin{align}
\mathbf{x} &= \sin(\pi t)\\
\mathbf{y} &= 1 - |t|
\end{align}
over $I = [-1, 1]$, as shown in Figure~\ref{fig:vs:orthogonal_functions}. Their inner product is the integral
\[
\int_{-1}^{1} (1 - |t|)\sin(\pi t) \de t
\]
which, graphically, is the sum of the positive (green) areas and the negative (red) areas in the picture; since this sum is zero, the functions are orthogonal. This makes sense since the first function is odd and the second is even: clearly no symmetric shape can be captured by an antisymmetric curve. Similarly, two sinusoids whose frequencies are different multiples of a fundamental frequency are also orthogonal, as shown in Figure~\ref{fig:vs:orthogonal_sinusoids}; the orthogonality of harmonically-related sinusoids is one of the key concepts in signal processing.
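Both graphical arguments are easy to corroborate numerically; the following minimal Python sketch (ours, not part of the derivation) evaluates the two inner products with standard quadrature:
\begin{verbatim}
# Sketch: numerical check of the two orthogonality examples over [-1, 1].
import numpy as np
from scipy.integrate import quad

ip1, _ = quad(lambda t: (1 - abs(t)) * np.sin(np.pi * t), -1, 1)
ip2, _ = quad(lambda t: np.sin(4 * np.pi * t) * np.sin(5 * np.pi * t), -1, 1)
print(ip1, ip2)   # both values are zero up to numerical precision
\end{verbatim}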
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center
\begin{dspPlot}[yticks=1,xticks=1,sidegap=0,xout=true]{-1, 1}{-1.2, 1.2}
\pscustom[fillstyle=solid,fillcolor=green!30,linestyle=none]{%
\dspFunc{x 180 mul 4 mul sin x 180 mul 5 mul sin mul dup 0 ge {} {pop 0} ifelse}}
\pscustom[fillstyle=solid,fillcolor=red!30,linestyle=none]{%
\dspFunc{x 180 mul 4 mul sin x 180 mul 5 mul sin mul dup 0 ge {pop 0} {} ifelse}}
\dspFunc[linecolor=ColorOne]{x 180 mul 4 mul sin}
\dspFunc[linecolor=ColorTwo]{x 180 mul 5 mul sin}
\end{dspPlot}
\caption{Orthogonality of sinusoids $\mathbf{x} = \sin(4\pi t)$ and $\mathbf{y} = \sin(5\pi t)$ over $[-1, 1]$; the sinusoids' frequencies are multiples of the fundamental frequency for the interval. The sum of the shaded areas yields the value of the integral in the inner product, which is zero.}\label{fig:vs:orthogonal_sinusoids}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Function vector spaces possess bases as well, obviously. For instance, an orthonormal basis for $L_2 \bigl([-\pi, \pi] \bigr)$ is the well-known Fourier basis
\begin{equation}
\{ (1/\sqrt{2\pi}) \,e^{jnt} \}_{n \in \mathbb{Z}}.
\end{equation}
The number of vectors is understandably infinite but, perhaps surprisingly, they form a \textit{countable} set. While it is easy to verify that the vectors are orthogonal, proving that they indeed span $L_2 \bigl([-\pi, \pi] \bigr)$ is a much harder task --- unfortunately, this is very much the rule for all non-trivial, infinite-dimensional basis sets, as we will have to admit again for the space of bandlimited functions in Chapter~\ref{ch:sampling}.
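While completeness is beyond our scope, the orthonormality of the Fourier basis is easy to check numerically for a few index pairs, as in this small sketch (ours, using quadrature on the real and imaginary parts):
\begin{verbatim}
# Sketch: orthonormality of e_n(t) = exp(j*n*t)/sqrt(2*pi) over [-pi, pi].
import numpy as np
from scipy.integrate import quad

def fourier_ip(n, m):
    f = lambda t: np.exp(-1j * n * t) * np.exp(1j * m * t) / (2 * np.pi)
    re, _ = quad(lambda t: f(t).real, -np.pi, np.pi)
    im, _ = quad(lambda t: f(t).imag, -np.pi, np.pi)
    return re + 1j * im

print(abs(fourier_ip(3, 3)))   # ~1: each basis vector has unit norm
print(abs(fourier_ip(3, 5)))   # ~0: distinct basis vectors are orthogonal
\end{verbatim}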
\begincoda
\section{Appendix: Approximation via Legendre Polynomials}
In this section we will illustrate how the concept of subspace projection leads to optimal polynomial approximation (in the least-squares sense) for functions over an interval. This example will give us the chance to discuss polynomial vector spaces, orthonormalization procedures and optimal approximation via subspace projection in a concrete case study. While this is not a direct signal processing application, approximation by projection is routinely used in compression algorithms and the concepts illustrated here showcase the flexibility of the vector space approach.
Let's start by considering $P_N(I)$, the vector space of all polynomials of degree up to $N-1$ over the interval $I$; a generic element of $P_N(I)$ is of the form
\begin{equation}
\mathbf{a} = a_0 + a_1 t + \ldots + a_{N-1}t^{N-1}, \quad a_k\in\mathbb{R}, t\in I.
\end{equation}
Addition and scalar multiplication are defined according to standard polynomial algebra and, since $P_N(I) \subset L_2(I)$, we will use the standard definition of the inner product as in~(\ref{eq:vs:inner_product_function}). A natural basis for $P_N(I)$ is the family
\begin{equation}
\{1, t, t^2, \ldots, t^{N-1}\}
\end{equation}
since every vector is a linear combination of powers of $t$. The natural basis, however, is not orthonormal so before we can address the issue of subspace approximation we need to rectify that.
\itempar{Gram-Schmidt Orthonormalization} A basis for a vector (sub)space can be transformed into an orthonormal basis via the Gram-Schmidt iteration. Given a basis $\{\mathbf{s}_k\}$, the procedure progressively ``whittles'' away from each basis vector the component that lies in the span of the previously processed elements. In detail, the orthonormal basis $\{\mathbf{u}_k\}$ is built one vector at a time by first converting $\mathbf{s}_m$ into $\mathbf{p}_m$, a vector orthogonal to the set of the previously-computed $\{\mathbf{u}_0, \ldots, \mathbf{u}_{m-1}\}$, and then by normalizing $\mathbf{p}_m$ and adding it to the pool:
\begin{align}
\mathbf{p}_m &= \mathbf{s}_m - \sum_{k=0}^{m-1} \langle \mathbf{u}_k, \mathbf{s}_m \rangle \, \mathbf{u}_k \\
\mathbf{u}_m &= \mathbf{p}_m/ \|\mathbf{p}_m\|.
\end{align}
A graphical example of the orthonormalization procedure for two vectors in $\mathbb{R}^2$ is shown in Figure~\ref{fig:vs:gram_schmidt}.
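The iteration translates directly into code; the following is a minimal Python sketch (ours, not the book's material) that orthonormalizes a set of linearly independent vectors following the two update equations above:
\begin{verbatim}
# Sketch: Gram-Schmidt orthonormalization of the rows of S.
import numpy as np

def gram_schmidt(S):
    U = []
    for s in np.asarray(S, dtype=complex):
        # subtract the projections of s onto the already-computed u_k
        p = s - sum(np.vdot(u, s) * u for u in U)
        U.append(p / np.linalg.norm(p))
    return np.array(U)

S = [[10.0, 2.0], [5.0, 4.0]]      # two linearly independent vectors in R^2
U = gram_schmidt(S)
print(np.allclose(U @ U.conj().T, np.eye(2)))   # True: orthonormal rows
\end{verbatim}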
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t]
\center\small
\psset{unit=5mm}
\begin{tabular}{cc}
\begin{pspicture}(0,0)(10,5)
\psline{->}(5,4)\rput[bl]{0}(5,4){~$\mathbf{s}_1$}
\psline{->}(10,2)\rput[tl]{0}(10,2){~$\mathbf{s}_0$}
\end{pspicture}
\hspace{1em}
&
\hspace{1em}
\begin{pspicture}(0,0)(10,5)
\psline[linecolor=lightgray]{->}(5,4)\rput[bl]{0}(5,4){~$\mathbf{s}_1$}
\psline[linecolor=lightgray]{->}(10,2)\rput[tl]{0}(10,2){~$\mathbf{s}_0$}
\psline[linecolor=red]{->}(10,2)
\rput[bl]{0}(10,2){~$\mathbf{p}_0^{~}$}
\end{pspicture}
\\ (a) & (b) \\
\begin{pspicture}(0,0)(10,5)
\psline[linecolor=lightgray]{->}(5,4)\rput[bl]{0}(5,4){~$\mathbf{s}_1$}
\psline[linecolor=lightgray]{->}(10,2)\rput[tl]{0}(10,2){~$\mathbf{s}_0$}
\psline[linecolor=blue]{->}(3.9223, 0.7844)
\rput[tl]{0}(3.9223, 0.7){\color{blue}~$\mathbf{u}_0$}
\end{pspicture}
\hspace{1em}
&
\hspace{1em}
\begin{pspicture}(0,0)(10,5)
\psline[linecolor=lightgray]{->}(5,4)\rput[bl]{0}(5,4){~$\mathbf{s}_1$}
\psline[linecolor=lightgray]{->}(10,2)\rput[tl]{0}(10,2){~$\mathbf{s}_0$}
\psline[linecolor=blue]{->}(3.9223, 0.7844)
\rput[tl]{0}(3.9223, 0.7){\color{blue}~$\mathbf{u}_0$}
\psline[linecolor=gray,linestyle=dashed](5.57,1.11)(5,4)
\psline[linecolor=red!50]{->}(5.57,1.11)
\rput[tl]{0}(5.57,1.11){\color{red!50}$\langle \mathbf{u}_0, \mathbf{s}_1 \rangle \mathbf{u}_0$}
\end{pspicture}
\\ (c) & (d) \\
\begin{pspicture}(0,0)(10,5)
\psline[linecolor=lightgray]{->}(5,4)\rput[bl]{0}(5,4){~$\mathbf{s}_1$}
\psline[linecolor=lightgray]{->}(10,2)\rput[tl]{0}(10,2){~$\mathbf{s}_0$}
\psline[linecolor=blue]{->}(3.9223, 0.7844)
\rput[tl]{0}(3.9223, 0.7){\color{blue}~$\mathbf{u}_0$}
\psline[linecolor=red!50]{->}(5,4)(-0.57,2.89)
\psline[linecolor=red]{->}(-0.57,2.89)
\rput[bl]{0}(-0.57,3){~$\mathbf{p}_1$}
\psline[linecolor=gray,linestyle=dashed](5.57,1.11)(5,4)
\end{pspicture}
\hspace{1em}
&
\hspace{1em}
\begin{pspicture}(0,0)(10,5)
\psline[linecolor=lightgray]{->}(5,4)\rput[bl]{0}(5,4){~$\mathbf{s}_1$}
\psline[linecolor=lightgray]{->}(10,2)\rput[tl]{0}(10,2){~$\mathbf{s}_0$}
\psline[linecolor=blue]{->}(3.9223, 0.7844)
\rput[tl]{0}(3.9223, 0.7){\color{blue}~$\mathbf{u}_0$}
\psline[linecolor=blue]{->}(-0.7740, 3.9244)
\rput[br]{0}(-0.7740, 3.9244){\color{blue}~$\mathbf{u}_1$}
\end{pspicture}
\\ (e) & (f)
\end{tabular}
\caption{The Gram-Schmidt algorithm applied to a basis for $\mathbb{R}^2$.}\label{fig:vs:gram_schmidt}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\itempar{Legendre Polynomials} Let's apply the Gram-Schmidt algorithm to the natural basis for $P_N([-1,1])$:
\begin{itemize}
\item $m=0$: $\mathbf{s}_0 = 1$
\begin{itemize}
\item $\mathbf{p}_0 = \mathbf{s}_0 = 1$
\item $\|\mathbf{p}_0\|^2 = 2$
\item $\mathbf{u}_0 = \mathbf{p}_0/\|\mathbf{p}_0\| = \sqrt{1/2}$
\end{itemize}
\item $m=1$: $\mathbf{s}_1 = t$
\begin{itemize}
\item $\langle \mathbf{s}_1, \mathbf{u}_0 \rangle = \int_{-1}^{1}t/\sqrt{2} = 0$
\item $\mathbf{p}_1 = \mathbf{s}_1 = t$
\item $\|\mathbf{p}_1\|^2 = 2/3$
\item $\mathbf{u}_1 = \sqrt{3/2}\,t$
\end{itemize}
\item $m=2$: $\mathbf{s}_2 = t^2$
\begin{itemize}
\item $\langle \mathbf{s}_2, \mathbf{u}_0 \rangle = \int_{-1}^{1}t^2/\sqrt{2} = \sqrt{2}/3$
\item $\langle \mathbf{s}_2, \mathbf{u}_1 \rangle = \sqrt{3/2}\int_{-1}^{1}t^3 = 0$
\item $\mathbf{p}_2 = \mathbf{s}_2 - (\sqrt{2}/3)\,\mathbf{u}_0 = t^2 - 1/3$
\item $\|\mathbf{p}_2\|^2 = 8/45$
\item $\mathbf{u}_2 = \sqrt{5/8}(3t^2-1)$
\end{itemize}
\item $m=3$: $\mathbf{s}_3 = t^3$
\begin{itemize}
\item $\ldots$
\end{itemize}
\end{itemize}
The resulting family of orthonormal basis vectors is known (up to normalization) as the Legendre polynomials; the basis for $P_6([-1,1])$ is plotted in Figure~\ref{fig:vs:legendre_basis}.
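The hand computation above can be double-checked symbolically; the following sketch (ours, assuming the sympy package is available) runs the same iteration on the monomials and compares the result with $\mathbf{u}_2 = \sqrt{5/8}(3t^2-1)$:
\begin{verbatim}
# Sketch: symbolic Gram-Schmidt on {1, t, t^2, t^3} over [-1, 1].
import sympy as sp

t = sp.symbols('t')
ip = lambda f, g: sp.integrate(f * g, (t, -1, 1))   # real-valued inner product

U = []
for s in [sp.Integer(1), t, t**2, t**3]:
    p = s - sum(ip(u, s) * u for u in U)
    U.append(sp.simplify(p / sp.sqrt(ip(p, p))))

# compare with the hand-computed u_2 = sqrt(5/8) * (3t^2 - 1)
print(sp.simplify(U[2] - sp.sqrt(sp.Rational(5, 8)) * (3 * t**2 - 1)))   # 0
\end{verbatim}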
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center
\begin{dspPlot}[width=0.9\textwidth,yticks=1,xticks=1,height=5cm,sidegap=0,xout=true]{-1, 1}{-3, 2.3}
\begin{dspClip}
\dspFunc[linecolor=black]{ 1 2 div sqrt 1 mul}%
\dspFunc[linecolor=red]{ 3 2 div sqrt x mul}%
\dspFunc[linecolor=yellow]{5 2 div sqrt 3 x x mul mul 1 sub mul 2 div}%
\dspFunc[linecolor=green]{ 7 2 div sqrt 5 x x x mul mul mul -3 x mul add mul 2 div}%
\dspFunc[linecolor=blue]{ 9 2 div sqrt 35 x x x x mul mul mul mul -30 x x mul mul 3 add add mul 8 div}%
\dspFunc[linecolor=cyan]{ 11 2 div sqrt 63 x x x x x mul mul mul mul mul -70 x x x mul mul mul 15 x mul add add mul 8 div}%
\end{dspClip}
\dspCustomTicks{-1 -1 0 0 1 1}%
\dspLegend(-.5,-1.5){black $\mathbf{u}_0$ red $\mathbf{u}_1$ yellow $\mathbf{u}_2$}
\dspLegend(0, -1.5){green $\mathbf{u}_3$ blue $\mathbf{u}_4$ cyan $\mathbf{u}_5$}
\end{dspPlot}
\caption{The Legendre basis for $P_6([-1,1])$.}\label{fig:vs:legendre_basis}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\itempar{Subspace Approximation} Now that we have an orthonormal polynomial basis for the interval, let's try to approximate a function via subspace projection. We will pick an element of $L_2([-1,1])$
\[
\mathbf{x} = \sin t
\]
and project it onto $P_3([-1,1])$ using the basis we computed before. We will then compare the result to another well-known polynomial approximation of the sine, namely its Taylor series expansion.
The expansion coefficients over the Legendre basis are readily computed:
\begin{align}
\alpha_0 &= \langle \mathbf{x}, \mathbf{u}_0 \rangle = \langle \sin t, \sqrt{1/2} \rangle = 0 \\
\alpha_1 &= \langle \mathbf{x}, \mathbf{u}_1 \rangle = \langle \sin t, \sqrt{3/2} \, t \rangle \approx 0.7377 \\
\alpha_2 &= \langle \mathbf{x}, \mathbf{u}_2 \rangle = \langle \sin t, \sqrt{5/8}(3t^2 - 1) \rangle = 0;
\end{align}
note that we need to explicitly compute just the value for $\alpha_1$ since, in the other two cases, the inner products are between an even and an odd function, which we know to be orthogonal. In the end, the least-squares approximation to $\sin t$ over $P_3([-1,1])$ is:
\begin{equation}
\hat{\mathbf{x}} = \alpha_1\, \mathbf{u}_1 \approx 0.9035\,t.
\end{equation}
By comparison, the second-degree Taylor approximation of the sine around zero is the function $\mathbf{y} = t$. Although $\mathbf{y}$ and $\hat{\mathbf{x}}$ seem very similar, let's look at the approximation error over the interval, shown in Figure~\ref{fig:vs:legendre_vs_taylor}: while the Taylor approximation is more precise around the origin, the Legendre approximation minimizes the \textit{global} error, that is, the mean square error over the entire interval.
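The claim is easy to verify numerically; the sketch below (ours, not from the text) recomputes $\alpha_1$ by quadrature and compares the integrated squared error of the two approximations over the interval:
\begin{verbatim}
# Sketch: alpha_1 by quadrature and global error of projection vs. Taylor.
import numpy as np
from scipy.integrate import quad

alpha1, _ = quad(lambda t: np.sin(t) * np.sqrt(3 / 2) * t, -1, 1)
a = alpha1 * np.sqrt(3 / 2)            # slope of the projection, ~0.9035

# integrated squared error over [-1, 1] (proportional to the MSE)
err_proj, _ = quad(lambda t: (np.sin(t) - a * t) ** 2, -1, 1)
err_taylor, _ = quad(lambda t: (np.sin(t) - t) ** 2, -1, 1)
print(round(a, 4), err_proj < err_taylor)   # 0.9035 True
\end{verbatim}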
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t!]
\center
\begin{dspPlot}[yticks=1,xticks=1,sidegap=0,xout=true]{-1, 1}{0, 0.22}
\dspFunc[linecolor=red]{x 3.14 div 180 mul sin x sub abs}
\dspFunc[linecolor=green]{x 3.14 div 180 mul sin x 0.9035 mul sub abs}
\dspLegend(-0.5,0.2){red {$|\sin t - t|$} green {$|\sin t - 0.9035t|$}}
\end{dspPlot}
\caption{Approximation error using the optimal Legendre projection vs.\ the Taylor series.}\label{fig:vs:legendre_vs_taylor}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endcoda
