% load and display generic image file
\def\showImage#1{%
\begin{dspCP}[xticks=none,yticks=none]{0,255}{0,255}%
\dspImageFile{#1.eps}
\end{dspCP}}
% load and display 2D-DFT basis function image file and set labels
\def\basisImage#1#2{%
\begin{dspCP}[xticks=255,yticks=255,xout=true,xlabel={$k_1=#2,k_2=#1$}]{0,255}{0,255}%
\dspImageFile{2DDFT-#1-#2.eps}%
\end{dspCP}}
\setcounter{chapter}{12}
\chapter{Image Processing}
\label{ch:image}\label{ch:ip}
So far we have only encountered discrete-time signals, that is, ordered sequences of scalar values indexed by an integer variable $n \in \mathbb{Z}$. This type of signal is a good model for the very large number of physical frameworks in which the data unfolds ``chronologically'' as a function of time. There are, however, many other situations where a data set is best represented by a function of a multidimensional index. The everyday example is provided by digital images, which can be modeled as a grayscale or color intensity signal over a set of points on a two-dimensional planar grid. In other words, an ``image signal'' is a set of values (the {\em pixels}) that are ordered in space rather than in time; the corresponding signal can be written as $x[n_1, n_2]$, with $n_1$ and $n_2$ denoting the horizontal and vertical discrete coordinates.
Note that, although in the rest of this chapter we will concentrate on images, i.e., on two-dimensional signals, higher-dimensional signals do exist and are indeed quite common. Video, for instance, is a sequence of still pictures; as such, it can be modeled as a signal defined over a triplet of indices, two for the image coordinates in each video frame and one for the frame number. Another class of three-dimensional signals emerges from volume rendering, where the data represents units of volume (or {\em voxels}) in three-dimensional space. Four dimensions naturally appear in the time-varying volume rendering applications common in biomedical imaging. Most of the elementary concepts developed for one-dimensional signals admit a ``natural'' extension to the multidimensional domain, in the sense that the extension behaves according to expectations; a multidimensional Fourier transform, for instance, decomposes a multidimensional signal into a set of multidimensional oscillatory basis functions. As the number of dimensions grows, however, intuition begins to fail us and high-dimensional signal processing quickly becomes extremely abstract (and quite cumbersome notation-wise); as a consequence, in this introductory material we will focus solely on two-dimensional signals in the form of images.
Finally, a word of caution. Compared to 1D signal processing, image processing often appears less formal and less self-contained; the impression is not altogether unfounded, and there are two main reasons for that. To begin with, images are very specialized signals, namely, signals ``designed'' for the unique processing unit that is the human visual system\footnote{For instance, if humans had bat-like ranging capabilities, 2D signals wouldn't be so special and 3D depth maps would be all the rage.}. With some rare exceptions such as barcodes or QR~codes, in order to make sense of an image one needs to deploy a level of semantic analysis that profoundly transcends the capabilities of the standard signal processing toolbox; although linear processing is an almost universal first step, it so happens that a handful of simple tools are all that is necessary in the vast majority of cases. The second reason, which is not at all unrelated to the first, is that Fourier analysis does not play a major role in image processing. Most of the information in an image is encoded by its edges (as we learn in infancy from coloring books), but edges represent signal discontinuities that, in the frequency domain, affect mostly the global phase response and are therefore hard to see. Also consider this: a spectral magnitude plot is an invaluable {\em visual} tool to quickly establish the properties of a 1D signal; but images are {\em already} visual entities, and their spectra do not represent their information in a more readable manner.
\section{Preliminaries}
An $N_1\times N_2$ digital picture is a collection of $M=N_1 N_2$ values, called {\em pixels}\footnote{From {\em pic}ture {\em el}ement. For an interesting historical account of the word see ``A Brief History of Pixel'', by Richard F. Lyon, 2006.}, usually arranged in a regular way to form a rectangular image. Scalar values for the pixels give rise to grayscale images, with each pixel's value normally constrained between zero (black) and some maximum value (white). The most common quantization scheme allocates 8~bits per pixel, which gives 256~levels of gray to choose from; this is adequate for standard viewing in most circumstances.
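As a concrete illustration, the following minimal NumPy sketch (the array {\tt x} and its size are hypothetical) quantizes a real-valued grayscale image with values in $[0,1]$ to the usual 8-bit representation:
\begin{verbatim}
import numpy as np

# hypothetical grayscale image with values in [0, 1]
x = np.random.rand(256, 256)

# uniform quantization to 8 bits: 256 gray levels,
# with 0 mapped to black and 255 to white
x_8bit = np.round(x * 255).astype(np.uint8)
\end{verbatim}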
To encode color images, we first need to define a {\em color space}; the popular RGB~encoding, for instance, represents visible colors as a weighted sum of the three primary colors red, green and blue, so that each color pixel is a triple of (quantized) coordinates in RGB space. Many other color spaces exist, each with its pros and cons, but since colorimetry (the theory and practice of color perception and reproduction) is a rich and well-developed discipline unto itself, we will not attempt even a brief summary here. For all practical purposes we will simply consider a color image as a collection of $C$ superimposed scalar images, where $C$ is the number of coordinates in the chosen color space. For 24-bit RGB images, for instance, the red, green and blue components will be handled as three independent 8-bit grayscale images.
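In code, this view amounts to nothing more than slicing a three-plane array into independent grayscale images; here is a minimal NumPy sketch (the array {\tt rgb} and its dimensions are hypothetical):
\begin{verbatim}
import numpy as np

# hypothetical 24-bit RGB image: three 8-bit planes
# stacked along the last axis
rgb = np.zeros((256, 256, 3), dtype=np.uint8)

# each color coordinate is an independent 8-bit grayscale image
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
\end{verbatim}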
\subsection{Pixel Arrangement}
From the point of view of its information content, a grayscale $N_1\times N_2$ image is just a collection of $M=N_1N_2$ scalar values; consequently, the image could be ``unrolled'' into a standard one-dimensional $M$-point signal in $\mathbb{R}^{M}$ and we could revert to the ``classical'' signal processing techniques of the previous chapters. And, as a matter of fact, several applications do precisely that: the storage or transmission of a raw image, for instance, involves a {\em scanning}\/ operation that serializes the set of image pixels into a 1-D signal. Normally, the serialization takes place in a row-by-row fashion, i.e., the image is converted to a {\em raster scan} format. Classic reproducing devices such as printers or cathode-ray tubes perform the inverse operation by stacking the 1-D data back into rectangular form one row at a time. Nevertheless, in ``true'' image processing applications, we will always use a 2-D representation indexed by two distinct coordinates; the fundamental point is that images possess \emph{local correlations} in the two-dimensional spatial domain and that those correlations would be lost in the scanning process. As a simple example, imagine an $N\times N$ black image with a single white line. A row-by-row raster scan would produce very different length-$N^2$ 1D signals according to the line's orientation: if the line is vertical, we would obtain an extremely sparse signal where only one out of every $N$ samples is nonzero; a horizontal line, on the other hand, would produce a signal in which only a block of $N$ contiguous samples is nonzero; other orientations would produce intermediate results. Clearly, the spatial structure of this simple image is completely obfuscated by a one-dimensional representation, and that is why fully two-dimensional operators are necessary.
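The two extreme cases in this example are easy to verify numerically; the following minimal NumPy sketch (the image size and line positions are arbitrary) performs the raster scan via a row-major flattening:
\begin{verbatim}
import numpy as np

N = 8

# vertical white line: after a row-by-row raster scan,
# only one out of every N samples is nonzero
vertical = np.zeros((N, N))
vertical[:, 3] = 1.0
print(vertical.flatten())    # row-major order = raster scan

# horizontal white line: the same scan yields a single
# block of N contiguous nonzero samples
horizontal = np.zeros((N, N))
horizontal[3, :] = 1.0
print(horizontal.flatten())
\end{verbatim}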
Normally, pixels are arranged on a regular two-dimensional array of points on the plane. Although different arrangements are possible (each corresponding to a different {\em point lattice}\/ in the plane), here we will consider only the ``canonical'' grid associated with the regular lattice $\mathbb{Z}^2$, i.e., each pixel will be associated with a pair of integer-valued coordinates.
%Figure~\ref{} shows three such standard arrangements, of which the first one is doubtlessly the simplest and the most common; all three arrangements belong to the set of. In general, a planar point lattice is defined as such: take a $2\times 2$ nonsingular {\em generating matrix} $\mathbf{L}$ and consider all integer-valued linear combinations of its columns. Formally, the coordinates of all points in the lattice are the set
%\[
% \Lambda_{\mathbf{L}} = \{\mathbf{L}\mathbf{n}, \quad \mathbf{n} \in \mathbb{Z}^2\}
%\]
%where $\mathbf{n} = [n_1 \, n_2]^T$ is a generic coordinate pair on the plane.
%indexed by two spatial coordinates. Normally (and in the following we will stick to this model) the coordinates belong to the regular lattice $\mathbb{Z}^2$, i.e. they are drawn from a grid of regularly spaced points in the plane, each point corresponding to a pair of integer-valued coordinates. Although the regular integer grid is the most common arrangement of points in a two-dimensional signal, other configurations are possible
\subsection{Visualization}
Since images can be modeled as real-valued functions on the plane, a formal graphical representation would require a three-dimensional plot such as the one in Figure~\ref{Gauss3D}, which is a natural extension of the Cartesian plots of 1D signals as functions of discrete time. Of course this kind of representation is useful primarily for ``abstract'' 2D sequences, i.e., 2D signals that do not encode semantically intuitive visual information; examples include 2D impulse and frequency responses and signal transforms. Figure~\ref{Gauss3D}, for instance, depicts a portion of a Gaussian impulse response used in image blurring (see Section~\ref{blurring}); since the impulse response is obtained by sampling a 2D function on the grid, it makes sense to represent the signal in such an abstract fashion.
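Sampling such a Gaussian on the integer grid is straightforward; here is a minimal NumPy sketch, where the spread $\sigma = 5$ and the support $|n_i|\le 14$ are chosen to match the plot of Figure~\ref{Gauss3D}:
\begin{verbatim}
import numpy as np

sigma = 5.0                # spread of the Gaussian
n = np.arange(-14, 15)     # integer grid coordinates
n1, n2 = np.meshgrid(n, n)

# 2D Gaussian impulse response sampled on the grid
h = np.exp(-(n1**2 + n2**2) / (2 * sigma**2))
\end{verbatim}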
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
\begin{center}
\begin{dspCP}[xout=true]{0,511}{0,511}%
\dspImageFile{baseImage}%
\end{dspCP}
\end{center}
\caption{A prototypical image, shown as such.}\label{TheImage}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
``Normal'' images, on the other hand, are usually represented by plotting a gray-level dot at each pixel location, where the shade of gray encodes the pixel's value; if the dots are closely spaced, the eye automatically interpolates the underlying pointwise structure and the plot gives the illusion of a ``continuous'' image\footnote{Of course, Pop artists (and Roy Lichtenstein in particular) hit a gold mine when they realized that dots could be spaced far apart...}; perceived closeness is of course also dependent on viewing distance. A standard example of this ``pictorial'' representation of images is shown in Figure~\ref{TheImage}\footnote{For a nice and compact history of the ``Lena'' image please see {\tt http://www.lenna.org/}; in spite of potential controversy, the author's feeling is that Lena is really the Mona Lisa of image processing, that is to say, a universally recognized icon that completely transcends its earthly begetting. As such, Lena is indeed quite perfect for a short introduction to image processing.}. An important point to notice is that, since the range of gray levels available in print (or on a screen) is finite, most images will require a level adjustment in order to be readable. In fact, this is no different from rescaling a fixed-size axis in one-dimensional plots; in pictures, the representable range is normally standardized between zero, which is mapped to black, and one, which is mapped to white. Unless otherwise noted, all images in the rest of the chapter are scaled and shifted so that they occupy the full grayscale range. In other words, the represented signal is
\[
y[n_1, n_2] = (x[n_1, n_2] - m)/(M-m)
\]
where $m = \min\{x[n_1, n_2]\}$ and $M = \max\{x[n_1, n_2]\}$. In Matlab, this rescaling is performed automatically when using the {\tt imagesc} command.
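The same full-range rescaling is easily expressed in code; here is a minimal NumPy sketch (the function name is arbitrary, and the image is assumed not to be constant, so that $M \neq m$):
\begin{verbatim}
import numpy as np

def rescale(x):
    # map an image linearly onto the full [0, 1] range
    m, M = x.min(), x.max()
    return (x - m) / (M - m)   # assumes M != m
\end{verbatim}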
A third, intermediate graphical option is given by ``support plots''. In these plots we revert to a Cartesian viewpoint but look at the 3D space ``from the top'', so to speak, and represent only the nonzero values of a signal. The result is a 2D plot where the nonzero pixels appear as dots on the plane; examples can be seen in Figure~\ref{basic2D}. This representation focuses on the support of a signal and is particularly useful in describing finite impulse responses and other algorithmic techniques.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
\begin{center}
\psset{unit=2.4mm}
\begin{pspicture}(-5,-5)(5,15)
\psset{Beta=15}
\def\var{5 }
\psplotThreeD[hiddenLine=true,plotstyle=curve,drawStyle=yLines,% is the default anyway
yPlotpoints=29,xPlotpoints=29,linewidth=1pt](-14,14)(-14,14){%
x x mul y y mul add -2 \var \var mul mul div 2.71828 exch exp 10 mul}
\pstThreeDCoor[linecolor=darkgray,xMin=-15,xMax=16,nameX={$n_1$},yMin=-15,yMax=16,nameY={$n_2$},zMin=0,zMax=13,nameZ={$h[n_1,n_2]$}]
\end{pspicture}
\end{center}
\caption{Example of two-dimensional Gaussian impulse response plotted as a 3D graph.}\label{Gauss3D}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
