\section{Spectral Representation of Random Signals}
Obtaining a meaningful frequency-domain representation of a stochastic process is perhaps a bit less straightforward than expected. From a practical point of view, given a finite set of samples from a random process, we can obviously always compute a DFT; the question, however, is whether the result is indicative at all of the \textit{general} spectral properties of the underlying process. Let's illustrate the problem with a simple example using the coin-tossing machine described in~(\ref{eq:rn:rademacher}).
Figure~\ref{fig:rn:dft1} shows three 32-point data sets generated by the system, together with the magnitude of their DFT. Each realization is clearly different and so are the associated transforms, so that it is hard to discern a pattern in the spectral plots. We could try to increase the length of the data sets, as in Figure~\ref{fig:rn:dft2}, but the result is the same: no clear information emerges from the (now longer) DFTs.
Given the random nature of the signal, we may be tempted to take the expected value of the transform (in this case by taking ensemble averages, since we can easily generate realizations); however both the DFT and the expectation are linear operators, and the expected value of the signal is zero for all $n$, so we have:
This is intuitively dissatisfying since we can see that the signal ``moves'' up and down in time and therefore the averaged spectrum should show some energy content. Since energy is related to the squared norm of the data, let's try an energy-based approach instead: given $N$ points of a realization of the process, its \textit{periodogram} is defined as the squared magnitude of the DFT, normalized by the length of the data set:
the normalization factor will keep the result independent of the number of data points. If we now average the periodogram over $L$ realizations of the process, we can see from Figure~\ref{fig:rn:sqdftavg} that, as $L$ grows, $\expt{P_N[k]} \approx1$; we can interpret this by saying that this particular random process contains ``energy'' at all frequencies in equal amounts. Intuitively this is actually quite reasonable since a realization of this process could very well turn out as a sequence of all identical values (which would have energy only at $k =0$), or as a sequence of strictly alternating values (which would have energy only at $k = N/2$), or as any other combination of values in between.
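As a quick numerical counterpart to Figure~\ref{fig:rn:sqdftavg} (the snippet below is only an illustrative sketch, assuming the coin-tossing machine is modeled as a sequence of independent, equiprobable $\pm 1$ values; all variable names are our own), we can average the periodogram over many realizations and watch it flatten out around one:
\begin{verbatim}
import numpy as np

# Average the length-N periodogram |DFT|^2 / N over L realizations of
# the +/-1 coin-tossing process; the average settles around 1.
rng = np.random.default_rng(0)
N, L = 32, 10000
P_avg = np.zeros(N)
for _ in range(L):
    x = rng.choice([-1.0, 1.0], size=N)        # one realization
    P_avg += np.abs(np.fft.fft(x)) ** 2 / N    # its periodogram
P_avg /= L
print(P_avg.min(), P_avg.max())                # both close to 1
\end{verbatim}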
In the remainder of this section we will formalize this intuition and derive the expression for the preferred spectral representation of a general (wide-sense) stationary random process.
In Section~\ref{ErgPowSec} we discussed the difference between \textit{energy} and \emph{power} signals. An energy signal $\mathbf{x}$ is a square-summable sequence and its total energy is given by its squared norm; an energy signal has a well-defined Fourier transform and, because of Parseval's theorem, its norm in $\ell_2(\mathbb{Z})$ is equal to its norm in $L_2([-\pi,\pi])$:
From this, the squared magnitude of the DTFT can be interpreted as the \textit{energy spectral density} of the signal, expressed in units of energy over units of frequency; in practical terms, if we were to filter an energy sequence with an ideal bandpass filter, the total energy in the output would be proportional to the integral of $|X(e^{j\omega})|^2$ over the support of the filter's passband. Interestingly, the energy spectral density is related to $\mathbf{r_x}$, the deterministic autocorrelation of an energy signal, which is the convolution of the signal with a time-reversed version of itself:
\[
r_x[k]=\sum_{n=-\infty}^{\infty} x[n]x[n+k].
\]
It is easy to show that
\[
|X(e^{j\omega})|^2=\DTFT{\mathbf{r_x}}
\]
and this link between autocorrelation and energy distribution will reappear in slightly different form when we define a suitable spectral representation for random signals.
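For a real-valued energy signal, the claim can be verified directly by expanding the DTFT of the autocorrelation and factoring the resulting double sum (with the change of variable $m = n + k$):
\begin{align*}
\sum_{k=-\infty}^{\infty} r_x[k]\, e^{-j\omega k}
  &= \sum_{k=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} x[n]\, x[n+k]\, e^{-j\omega k} \\
  &= \Bigl(\sum_{n=-\infty}^{\infty} x[n]\, e^{j\omega n}\Bigr)
     \Bigl(\sum_{m=-\infty}^{\infty} x[m]\, e^{-j\omega m}\Bigr)
   = X^{*}(e^{j\omega})\, X(e^{j\omega}) = \bigl|X(e^{j\omega})\bigr|^{2},
\end{align*}
where the last step uses the fact that, for a real signal, $\sum_{n} x[n]\, e^{j\omega n} = X(e^{-j\omega}) = X^{*}(e^{j\omega})$.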
Signals that are not square-summable are generally problematic (think for instance of the exponentially growing sequences generated by unstable LTI systems); one exception is the class of \textit{power signals}, for which the amount of energy per unit of time remains finite; in that case the limit
exists and it represents the signal's average power.\footnote{
%
In the case of $M$-periodic sequences the power is simply \textit{defined} as $(1/M)\sum_{n=0}^{M-1}|x[n]|^2$.}
%
In Section~\ref{sec:fa:dirac} we have shown how the DTFT formalism can be extended to successfully handle power signals via the use of Dirac delta functionals (indeed, the presence of a Dirac in a spectrum is \textit{the} tell-tale sign of infinite energy). But the concept of energy spectral density cannot be applied directly to power signals: intuitively this makes sense since the energy is infinite and, from a mathematical point of view, this is consistent with the fact that we simply cannot take the square of a Dirac delta. The solution is to move to a spectral representation that describes the signal's \textit{power} distribution in frequency instead; to do so, we can start with the formula for a truncated DTFT over $2M+1$ points around the origin:
which always exists and whose squared magnitude is the spectral energy distribution of the signal for the time interval $[-M, M]$. The \textit{power spectral density} (PSD) is defined as
and its physical dimension is expressed in units of energy over units of time over units of frequency. Obviously, the PSD is a $2\pi$-periodic, real, and non-negative function of $\omega$ and it is equal to zero for energy signals; in physical terms, if we put a power sequence through an ideal bandpass filter, the total power available at the output is going to be proportional to the integral of the PSD over the support of the filter's passband.
Looking back at the power signals we have encountered so far, it can be shown for instance (see also Exercise~\ref{ex:rn:psd1}) that the PSD of the constant signal $\mathbf{x} : x[n]=\alpha$ is
\begin{equation}
P(e^{j\omega}) = \alpha^2\tilde{\delta}(\omega)
\end{equation}
whereas for the scaled unit step $\alpha\mathbf{u}$ we have
where $\tilde{\mathbf{X}}$ is the DFS of $\tilde{\mathbf{x}}$; this shows that, as we would expect, the power of a periodic signal is concentrated over the multiples of the fundamental frequency.
\caption{Computing the value of $c_{2M}(k)$ in~(\ref{eq:rn:ck}) as the number of ways a segment of length $k$ can fit into the $[-M, M]$ interval (here, $k=2$ and $M =2$).}\label{fig:rn:segfit}
Wide-sense stationary processes behave in expectation just like power signals; indeed (assuming without loss of generality a zero-mean, real-valued process), we have that
so that, as $M$ goes to infinity, the energy grows unbounded but the average power remains finite as long as the variance is finite. The truncated DTFT in~(\ref{eq:rn:truncDTFT}), when applied to the process, yields a random variable, parameterized by $\omega$, whose expected squared magnitude is:
In order to reduce the double summation to a single summation, set $k = n-m$; as $n$ and $m$ vary from $-M$ to $M$, $k$ will range from $-2M$ to $2M$ so that we can write
where $c_{2M}(k)$ is the number of times $n-m = k$ when $n$ and $m$ range over all values in the $[-M, M]$ interval. One easy way to find $c_{2M}(k)$ is to realize that it represents the number of distinct ways that a segment of length $|k|$ can fit into an interval of length $2M$ (see Figure~\ref{fig:rn:segfit}); after some simple geometrical considerations, this number turns out to be
\begin{equation}\label{eq:rn:ck}
c_{2M}(k) = 2M + 1 - |k|.
\end{equation}
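The counting argument is simple enough that it can also be double-checked by brute force; the following small sketch (the variable names are purely illustrative) enumerates all pairs $(n, m)$ for a given $M$:
\begin{verbatim}
# Brute-force check of c_{2M}(k) = 2M + 1 - |k|: for every lag k, count
# the pairs (n, m) with n, m in [-M, M] and n - m = k.
M = 4
counts = {}
for n in range(-M, M + 1):
    for m in range(-M, M + 1):
        k = n - m
        counts[k] = counts.get(k, 0) + 1
assert all(counts[k] == 2 * M + 1 - abs(k) for k in counts)
\end{verbatim}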
To obtain the power spectral density we need to normalize the expected squared magnitude by the length $2M+1$ of the data window, and then take the limit as the size of the interval grows to infinity:
Since $\bigl| w_M(k) r_x [k]\, e^{-j\omega k} \bigr| \leq\bigl| r_x[k]\bigr|$, if the autocorrelation is absolutely summable then the sum~(\ref{eq:rn:PSDSum}) converges uniformly to a continuous function of $\omega$ and we can therefore move the limiting operation inside the sum. Now the key observation is that the weighting term $w_M(k)$, considered as a function of $k$, converges pointwise to the constant one as $M$ grows large, as shown graphically in Figure~\ref{fig:rn:weighFun},
\[
\lim_{M \rightarrow\infty} w_M(k)=1
\]
so that in the end we obtain \index{power spectral density|mie}
This fundamental result states that the power spectral density of a WSS random process is the \emph{discrete-time Fourier transform of the autocorrelation of the process}.
An important (if very specific) class of WSS random processes is that for which the underlying random variables are all zero-mean and statistically uncorrelated (that is, $\expt{x[n]x[m]} =\expt{x[n]}\expt{x[m]}$ for $n \neq m$). In this case the autocorrelation simplifies to
\begin{equation}\label{eq:rn:whitenoisecorr}
\mathbf{r}_x = \sigma^2_x \,\boldsymbol{\delta}
\end{equation}
where $\sigma^2_x$ is the variance (or the average power) of the process. A process with these properties is called \textit{white noise} and it is a very useful model for additive noise sources. While not strictly necessary, it is common to assume that the samples in a white noise process are identically distributed; depending on the underlying pdf, we can therefore have Gaussian white noise, uniform white noise, or any other flavor of white noise.
Using~(\ref{eq:rn:whitenoisecorr}), we can see that the PSD of white noise is simply
\begin{equation}\label{eq:rn:whitenoisepsd}
P_x(e^{j\omega}) = \sigma^2_x,
\end{equation}
that is, the power of a white noise process is uniformly distributed in frequency; this is in fact the reason behind the name: in optics, white light possesses approximately equal power over the visible frequency band.
Note that the coin-tossing signal in~(\ref{eq:rn:rademacher}) is indeed a white noise process since the samples are statistically independent (which implies they are uncorrelated) and identically distributed with zero mean. The experimental results described at the beginning of this section are thus consistent with the power spectral density in~(\ref{eq:rn:whitenoisepsd}).
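A complementary, time-domain check of the same fact (again only a sketch, using a long realization of the $\pm 1$ process) is to estimate the autocorrelation directly and verify that it is close to a discrete delta:
\begin{verbatim}
import numpy as np

# Empirical autocorrelation of a long +/-1 realization: the estimate
# (1/N) sum_n x[n] x[n+k] should be 1 at lag 0 and nearly 0 elsewhere.
rng = np.random.default_rng(1)
N = 100000
x = rng.choice([-1.0, 1.0], size=N)
r_hat = [np.dot(x[:N - k], x[k:]) / N for k in range(5)]
print(np.round(r_hat, 3))   # approximately [1, 0, 0, 0, 0]
\end{verbatim}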
\subsection{Filtering a Stochastic Process}
\label{sec:rn:filtering}
Given a WSS random process $\mathbf{x}$ and a deterministic \textit{stable} filter with impulse response $\mathbf{h}$, the filtered process
is again a random process, since at each output index $n$ we are computing a linear combination of random variables. We will now show that $\mathbf{y}$ is itself WSS and we will derive the equivalent of the convolution theorem for WSS random processes.
\itempar{Time-Domain Analysis.}
The expected value of the filter's output at $n$ is
\[
\expt{y[n]} = \sum_{k = -\infty}^{\infty} h[k]\, \expt{x[n-k]},
\]
which does not depend on $n$ since the mean of a WSS process is constant; the autocorrelation of the output is
\begin{align*}
\expt{y[n]\, y[m]} &= \expt{\sum_{k = -\infty}^{\infty}\sum_{i = -\infty}^{\infty} h[k]h[i]\, x[n-k]\, x[m-i]} \\
&= \sum_{k = -\infty}^{\infty}\sum_{i = -\infty}^{\infty} h[k]h[i]\, \expt{x[n-k]\, x[m-i]} \\
&= \sum_{k = -\infty}^{\infty}\sum_{i = -\infty}^{\infty} h[k]h[i]\, r_x[(n - m) - k + i];
\end{align*}
since the result depends only on the difference $n-m$, we have proven that a filtering operation preserves the WSS nature of the input. With a simple change of variable we can rewrite the expression for the output autocorrelation as
where $\mathcal{R}$ is the time-reversal operator. In other words, the autocorrelation of the output can be obtained by convolving the autocorrelation of the input with a sequence that is the convolution of the filter's impulse response with its time-reversed version. Note that $\mathbf{h} \ast\mathcal{R}\{\mathbf{h}\}$ is symmetric and therefore zero-phase.
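As a quick sanity check of this relation (using unit-variance white noise as input, so that $\mathbf{r}_x = \boldsymbol{\delta}$, and an arbitrary short FIR impulse response chosen purely for illustration), we can compare an empirical estimate of the output autocorrelation with $\mathbf{h} \ast \mathcal{R}\{\mathbf{h}\}$:
\begin{verbatim}
import numpy as np

# For unit-variance white noise in, the output autocorrelation should
# match h convolved with its time reversal.
rng = np.random.default_rng(2)
h = np.array([1.0, 0.5, -0.25])              # arbitrary FIR impulse response
x = rng.standard_normal(500000)              # unit-variance white noise
y = np.convolve(x, h, mode="full")[:len(x)]

r_theory = np.convolve(h, h[::-1])           # lags -2 ... 2
r_est = [np.dot(y[:len(y) - k], y[k:]) / len(y) for k in range(3)]
print(np.round(r_theory[2:], 3))             # theory, lags 0, 1, 2
print(np.round(r_est, 3))                    # estimate, close to theory
\end{verbatim}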
In certain applications, we may be interested in the cross-correlation between input and output; it is easy to show that we have
\begin{equation}\label{eq:rn:filtcrosscorr}
\mathbf{r}_{xy} = \mathbf{h}\ast\mathbf{r}_x.
\end{equation}
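A one-line verification, assuming the cross-correlation is defined with the convention $r_{xy}[k] = \expt{x[n]\, y[n+k]}$, goes as follows:
\[
\expt{x[n]\, y[n+k]} = \sum_{i=-\infty}^{\infty} h[i]\, \expt{x[n]\, x[n+k-i]}
 = \sum_{i=-\infty}^{\infty} h[i]\, r_x[k-i] = (\mathbf{h} \ast \mathbf{r}_x)[k].
\]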
\itempar{Frequency-Domain Analysis.}
Using the result in~(\ref{eq:rn:outputcorr}) it is straightforward to obtain the relationship between input and output in the frequency domain:
The above result is the stochastic equivalent of~(\ref{eq:ls:convtheo}) and represents the fundamental result of stochastic signal processing, namely, that the frequency-domain characteristics of LTI systems carry over (in magnitude) to the domain of random processes. In other words, a frequency-selective filter designed for deterministic signals can be used in a stochastic context and it will shape the \textit{power} distribution of the input process in the same way it would shape the energy distribution of a deterministic input.
The frequency response of the filter appears in~(\ref{eq:rn:filtpsd}) in magnitude only; an intuitive interpretation of this fact starts by recalling that, for a given magnitude spectrum, it is the phase component that determines a signal's ``shape'' in time. When it comes to random processes, we can make no prediction about the exact shape of a given realization, only about the average distribution in frequency of the power of the process. Therefore, when filtering a random process, we cannot hope to control the phase of the output signal.
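To see the result at work numerically, we can for instance lowpass-filter white noise and compare an averaged periodogram of the output with $|H(e^{j\omega})|^2$ (the input PSD being one); the filter and the parameters below are arbitrary choices, meant only as a sketch:
\begin{verbatim}
import numpy as np
from scipy import signal

# Shape the PSD of unit-variance white noise with a lowpass FIR filter;
# the averaged periodogram of the output approaches |H(e^jw)|^2.
rng = np.random.default_rng(3)
h = signal.firwin(numtaps=65, cutoff=0.25)      # an arbitrary lowpass filter
N, L = 512, 2000                                # segment length, nr. of averages
P_avg = np.zeros(N)
for _ in range(L):
    x = rng.standard_normal(N + len(h) - 1)
    y = np.convolve(x, h, mode="valid")         # N steady-state output samples
    P_avg += np.abs(np.fft.fft(y)) ** 2 / N
P_avg /= L

H = np.fft.fft(h, N)                            # frequency response on the DFT grid
print(np.mean(np.abs(P_avg - np.abs(H) ** 2)))  # small average deviation
\end{verbatim}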
\subsection{MA, AR, and ARMA Processes}
\label{sec:rn:arma}
The results in the previous section seem to indicate that, given a desired power spectral density, we could build the underlying discrete-time random process simply by running white noise through a filter with the appropriate magnitude response. In fact, a fundamental result in statistics, called the Wold representation theorem, proves that \textit{any} stationary process $\mathbf{y}$ can be obtained from a white noise sequence $\mathbf{w}$ as
\[
y[n]=\sum_{k=0}^{\infty} b[k] w[n-k],
\]
that is, as the convolution between a white noise process and a \textit{causal} sequence of coefficients $\mathbf{b}$. The theorem is not constructive, in that it doesn't provide us with a way to compute the potentially infinite-support sequence $\mathbf{b}$; furthermore, there is no guarantee that $\mathbf{b}$ corresponds to the impulse response of a realizable filter. In practice, however, the representational power of rational transfer functions is sufficient to cover all cases of practical importance.
The ability to model an arbitrary stationary process as the output of a realizable filter driven by white noise is fundamental in two respects:
\begin{itemize}
\item for \textit{synthesis} purposes, when we want to generate realizations of a random process with a given autocorrelation (or, equivalently, with a given PSD)
\item for \textit{analysis} purposes, when we want to compactly model the spectral properties of a random process and gain insight into the underlying generating system. This is particularly useful in applications such as system identification and data compression, as we will see later.
\end{itemize}
In both cases, the hypothesis is that the random process is obtained by filtering white noise with a realizable filter with a rational transfer function of the form
\[
H(z)=\frac{B(z)}{A(z)}
\]
and the problem is to determine the coefficients of numerator and denominator. Two special cases are particularly important:
\itempar{MA processes:} when $A(z)=1$, $H(z)$ is an FIR filter and the resulting process is called a Moving Average (MA) process. From~(\ref{eq:rn:outputcorr}) we can easily determine that if the FIR filter has $M+1$ coefficients (i.e.\ it is of order $M$), the output autocorrelation will be finite-support as well, with at most $2M+1$ nonzero coefficients:
\[
r_y[k]=0\mbox{ for } |k| > M.
\]
MA models are therefore useful to characterize random processes with a short-range interdependence between samples.
\itempar{AR processes:} when $B(z)=1$, $H(z)$ is an IIR filter with poles only, and the resulting process is called an Auto-Regressive (AR) process. All-pole filters are particularly well suited to modeling physical systems, whose spectral characteristics are determined by natural \textit{resonances} (think of musical instruments or the human voice). The autocorrelation in these cases is a symmetric, two-sided infinite sequence, as we can easily see from~(\ref{eq:rn:outputcorr}) when $\mathbf{h}$ is an infinite impulse response.
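As a minimal synthesis example (with arbitrarily chosen pole locations, purely for the sake of illustration), an AR(2) process with a spectral resonance can be generated by running white noise through an all-pole filter:
\begin{verbatim}
import numpy as np
from scipy import signal

# Generate an AR(2) process by filtering white noise with H(z) = 1/A(z);
# a pole pair near the unit circle creates a resonance in the PSD.
rng = np.random.default_rng(4)
rho, theta = 0.95, np.pi / 4                  # pole radius and angle
a = [1.0, -2 * rho * np.cos(theta), rho**2]   # poles at rho * exp(+/- j*theta)
w = rng.standard_normal(2**16)                # white noise input
y = signal.lfilter([1.0], a, w)               # one realization of the AR(2) process

# the estimated PSD of y peaks near omega = theta, as expected
f, P = signal.welch(y, nperseg=1024)
print(2 * np.pi * f[np.argmax(P)])            # close to pi / 4
\end{verbatim}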