\documentclass[12pt,a4paper]{article}
\usepackage{makeidx}
\usepackage{amsmath,amssymb,latexsym}
\usepackage{fleqn}
\usepackage[usenames]{color}
\usepackage{epsfig,graphpap,subfigure,afterpage,lscape}
\usepackage{pstricks,pst-node,pst-plot,pstricks-add,pst-3dplot}
\usepackage{graphicx}
\usepackage[utopia]{mathdesign}
\usepackage{rotating}
\usepackage{dspTricks}
%\renewcommand{\baselinestretch}{1.5}
\setlength\parindent{0pt}
\setlength\parskip{1em}
% main index entry
\newcommand{\mie}[1]{{\bf #1}}
\makeindex
\def\hyph{-\penalty0\hskip0pt\relax}
\begin{document}
\title{Tsividis' Paradox}
\author{Paolo Prandoni and Martin Vetterli}
\date{}
\maketitle
abstract here
\section{Introduction}
In English, we use the word ``research'' mostly to describe an intellectual endeavor aimed at the discovery of new scientific facts; etymologically, however, the word originates in the French \textit{rechercher}, an activity naturally matched by the act of \textit{retrouver}, that is, of finding again something that had gone missing. If the reward of a \textit{recherche} is the relief of reuniting with an item we had lost, it is understandable that in this last difficult year the comfort of rediscovery played no small role in our academic conversations; the tale that follows is, we believe, an example worth sharing of how a simple question took us on a fascinating journey across more than 60 years of signal processing.
The story begins with a simple yet intriguing observation made by Iannis Tsividis (when?) apropos of a standard A/D conversion system such as the one shown in Figure~\ref{fig:ad}. The system features the cascade of an ideal sampler and a memoryless quantizer at $R$ bits per sample, and the input signal is assumed bandlimited to half the sampling rate so that no aliasing occurs. As any textbook will tell you, under rather reasonable conditions we can model the distortion introduced by the quantizer as an additive independent white noise source, yielding a signal-to-noise ratio at the output of roughly $6R$~dB. Now, here's where the story gets interesting: since both sampler and quantizer are memoryless devices, their order can be interchanged as shown in Figure~\ref{fig:ad2} with no effect on the output. The quantized analog signal $x_q(t)$, however, with its abrupt jumps between quantization levels, is clearly no longer bandlimited; this implies that at least part of the error introduced by the A/D converter is in fact due to aliasing, in paradoxical contrast to the premise of a bandlimited input signal.
% which was believed not to play a part in the analysis because of the bandlimitedness of the input --- hence the paradox.
Since distortion due to aliasing is nothing like white noise, this should raise all sorts of red flags when it comes to audio applications; and, indeed, Tsividis shared his observation after hearing some very disturbing artifacts in a digital recording of Ravel's \textit{Bolero} --- but first things first.
\subsection{Uniform quantization in a nutshell}
In the rest of this story we will consider memoryless uniform scalar quantizers exclusively, so let's quickly review some of their fundamental properties. The two defining parameters are the number of quantization levels $M$ and the expected operating range for the input, called the \textit{non-overload} region, which for simplicity we assume to be the $[-1,1]$ interval. The quantizer implements a function $q: \mathbb{R} \rightarrow \{\hat{x}_0, \hat{x}_1, \ldots, \hat{x}_{M-1}\}$ that maps the real axis to a set of $M$ quantization values; it does so by dividing the non-overload region into $M$ contiguous bins, half closed on the left and all of equal size $\Delta = 2/M$, and by returning, for each input value, the midpoint of the bin that the value belongs to. If the input falls outside of the non-overload region, it will be mapped to the closest extremal quantization level, either $\hat{x}_0$ or $\hat{x}_{M-1}$. When $M$ is even, the quantizer is called a \textit{mid-riser} and its quantization function can be expressed as
\begin{equation}
q(x) = \Delta \left( \left\lfloor \frac{x}{\Delta} \right\rfloor + \frac{1}{2} \right)
\end{equation}
as shown for example in Figure~\ref{fig:q4}; note that a mid-riser will map a zero input to the first positive quantization level. When $M$ is odd, the quantizer is called a \textit{deadzone quantizer} and its characteristic becomes
\begin{equation}
q(x) = \Delta \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor
\end{equation}
as shown in Figure~\ref{fig:q3}; deadzone quantizers are so called because input values close to zero are mapped to zero.
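Both characteristics are easy to prototype; the following is a minimal NumPy sketch of the two quantization functions above (the function name and the vectorized clipping strategy are our own choices):

```python
import numpy as np

def quantize(x, M):
    """Uniform M-level quantizer with non-overload region [-1, 1]."""
    delta = 2 / M                            # bin width
    if M % 2 == 0:                           # mid-riser: Delta * (floor(x/Delta) + 1/2)
        q = delta * (np.floor(x / delta) + 0.5)
    else:                                    # deadzone: Delta * floor(x/Delta + 1/2)
        q = delta * np.floor(x / delta + 0.5)
    # out-of-range inputs map to the closest extremal level, +-(1 - Delta/2)
    return np.clip(q, -1 + delta / 2, 1 - delta / 2)
```

Note, for instance, that `quantize(0.0, 4)` returns $\Delta/2 = 0.25$ (a mid-riser maps zero to the first positive level), while `quantize(0.0, 3)` returns $0$ (the deadzone).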
Given a discrete-time signal $x[n]$ and its quantized version $\hat{x}[n]$ we can write
\[
\hat{x}[n] = x[n] + e[n]
\]
where $e[n]$ is the distortion introduced by the quantizer. Since the quantization function is nonlinear, a precise analysis of the error signal is extremely difficult and the most common approach attempts to simplify the problem by modeling $e[n]$ as an independent, additive white noise process. This linearized setup is obviously a bit of a stretch since the quantized values are a deterministic function of the input, but it does yield accurate enough results provided that the following \textit{high resolution hypotheses} are met:
\begin{itemize}
\item the input signal can be considered an i.i.d.\ process with a smooth probability density function $f_x$;
\item the signal's range matches the non-overload region;
\item the number of quantization levels is large;
\item the signal is ``live'', in the sense that successive samples fall into different quantization bins.
\end{itemize}
Under these assumptions, it becomes plausible to believe that $e[n]$ forms a sequence of independent random variables taking values over the $[-\Delta/2, \Delta/2]$ interval; with the additional hypothesis that $f_x$ is uniform over the non-overload region, we obtain the well-known result for the power of the quantization error, namely $\sigma_e^2 = \Delta^2/12$, as well as the classic ``$6$~dB per bit'' rule of thumb for the SNR.
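Both the $\Delta^2/12$ formula and the $6$-dB-per-bit rule are easy to check numerically under the uniform-input assumption; here is a sketch using an 8-bit mid-riser (sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 256                                        # 8-bit mid-riser quantizer
delta = 2 / M
x = rng.uniform(-1, 1, 100_000)                # i.i.d. input over the non-overload region
e = delta * (np.floor(x / delta) + 0.5) - x    # quantization error
snr = 10 * np.log10(np.var(x) / np.var(e))
print(np.var(e), delta**2 / 12)                # empirical vs. theoretical error power
print(snr)                                     # close to 6 dB x 8 bits = 48 dB
```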
\subsection{The four horsemen of the audiocalypse}
When it comes to audio fidelity there are four major enemies to look out for, described here in order of increasing annoyance. Firstly, \textit{additive noise}\/ is just a fact of life and every analog link in the chain from recording to reproduction will contribute a bit; all-digital recordings (DDD) manage to eliminate most of the intermediate noise sources but in the end it is fair to say that the SNR of most home-audio amplifiers will almost certainly be lower than 96~dB. In other words, unless you have a spare anechoic room in your home fitted with a very expensive stereo system, any background noise you hear will in fact originate in your amplifier. Next in line is \textit{linear distortion}, caused by the overall frequency response of the audio chain when considered as a linear time-invariant filter. In general all audio systems allow for at least some form of rudimentary equalization, so that a non-flat frequency response is not necessarily a sign of trouble; rather, linear distortion is the difference between the desired and the actual response of the system and here the solution is simply to ``throw money at the problem'': more expensive equipment will perform better.
The real complications begin once we take into account the unavoidable nonlinearities in the equipment. On the purely analog front, the concept of \textit{harmonic distortion}\/ describes the effects of an instantaneous nonlinearity on a pure sinusoidal input, namely, the appearance of spurious partials at integer multiples of the input's frequency. A common metric in this case is the Total Harmonic Distortion (THD), which expresses the power of the higher-order harmonics as a percentage of that of the fundamental; THD is usually frequency-specific and normally manufacturers provide THD values for a representative set of input test signals. If the input signal is $x(t) = \sin(2\pi f_0 t + \theta)$, the output after a nonlinearity $r$ will still be periodic with period $1/f_0$ and thus expressible as the Fourier series
\[
r\left(x(t)\right) = \sum_{k=-\infty}^{\infty} c_k\, e^{-j2\pi f_0 k t};
\]
the total harmonic distortion is thus
\begin{equation} \label{eq:thd}
\mathrm{THD} = \sqrt{\frac{\sum_{k > 1} |c_k|^2}{|c_1|^2}}
\end{equation}
where we have used only the coefficients with positive index since the spectrum is symmetric in magnitude for a real-valued input. As an interesting example, consider an ideal hard limiter with characteristic $r(x) = \mathrm{sgn}(x)$, which transforms a unit-amplitude sinusoidal input into a square wave; from the Fourier series expansion of the latter,
\[
r\left(\sin(2\pi f_0 t)\right) = \frac{4}{\pi}\sum_{k = 1}^{\infty}\frac{1}{2k-1}\sin(2\pi(2k-1) f_0 t),
\]
we have that in this case the THD is
\begin{equation} \label{eq:thdsw}
\mathrm{THD} = \sqrt{\sum_{k = 2}^{\infty}\left(\frac{1}{2k-1}\right)^2} = \sqrt{\frac{\pi^2}{8}-1} \approx 0.48.
\end{equation}
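The closed-form value in~(\ref{eq:thdsw}) can be cross-checked by applying~(\ref{eq:thd}) to the FFT of one period of $\mathrm{sgn}(\sin(\cdot))$; a sketch (the grid size is an arbitrary choice):

```python
import numpy as np

N = 1 << 16                                          # samples over one period
x = np.sign(np.sin(2 * np.pi * np.arange(N) / N))    # sgn() applied to one sinusoidal cycle
C = np.abs(np.fft.rfft(x)) / N                       # one-sided coefficient magnitudes |c_k|
thd = np.sqrt(np.sum(C[2:]**2) / C[1]**2)            # fundamental sits in bin 1
print(thd, np.sqrt(np.pi**2 / 8 - 1))                # both approximately 0.483
```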
Trying not to spoil the final denouement of this story: yes, a hard limiter is just a two-level quantizer and so, in the discrete-time case, some of those spurious partials will be aliased back and, from harmonic, will become non-harmonic. This leads us to the last villain in the lineup, \textit{non-harmonic distortion}. Typically this refers to what, in the trade, is called intermodulation distortion (IMD): when two or more sinusoids are passed through a nonlinearity, the result is a signal containing components not only at multiples of the original frequencies but also at their sums and differences.
\subsection{Ravel's Bolero}
Far from being a simple mathematical curiosity, this different interpretation of quantization distortion is particularly relevant in audio applications: because of its non-harmonic nature, aliasing is exceedingly grating to the ear, and certainly more noticeable than additive noise or harmonic distortion.\footnote{
Side panel on harmonic vs non-harmonic distortion.}
Tsividis' observation was indeed spurred by listening to a particularly infelicitous release of Ravel's \textit{Bolero}\/ during the early days of the compact disc era and, to understand why \textit{Bolero}, we need to consider the particular challenges that the piece presents to an A/D converter. The composition is famous for its slow, steady \textit{crescendo}\/ involving a relatively simple melodic theme repeated 17~times. The theme is first played \textit{pianissimo}\/ by a single flute and, with every repetition, more and more instruments join in until, in the end, the last bars call for an orchestral \textit{tutti} playing \textit{fortissimo}, as shown in Figure~\ref{fig:bolero}. In a live performance, the resulting dynamic range exceeds 100~dB, which is larger than what 16-bit quantization can handle (as per the $6R$-dB rule). This is usually addressed by using some form of dynamic range compression during mastering but, in any case, the waveform amplitude during the initial exposition of the theme will be much smaller than in the final, full-scale orchestral blast.\footnote{
Orelob}
From the point of view of A/D conversion, some form of dynamic range compression must be deployed during mastering in order to avoid clipping; Figure~\ref{fig:bolero_wav} shows the typical waveform of a full performance. Regardless of the mastering strategy, however, the excursion in dynamics is such that the initial theme spans a much smaller number of quantization levels than a full-scale signal; the input will only exercise a few quantization steps around zero and the previously ``reasonable'' high resolution hypotheses, which allow us to model quantization noise as independent additive noise, no longer hold. The resulting distortion will sound like aliasing, that is, really disturbing; we can verify this phenomenon by exacerbating the problem via a coarse re-quantization of different \textit{Bolero}\/ segments.
\section{Empirical Investigation}\label{sec:exp}
Thanks to the vast amounts of computing power provided by even the most standard-issue laptop, it is very convenient to look at a problem in numerical terms first, to build some intuition and perform a few sanity checks. This is all the more true when it comes to audio applications, since we can easily generate sample files and listen to them. This is exactly how we started, and this part of the story is probably best enjoyed together with the companion Python notebook available at ...
\subsection{Problem setup}
Consider again the A/D converter in Figure~\ref{fig:ad} and let the input signal be a continuous-time sinusoid with frequency $f_0 < F_s/2$; assume for now that $f_0$ is a rational multiple of the sampling frequency, that is, $f_0 = (A/B)F_s$ with $A$ and $B$ coprime positive integers. The discrete-time sequence $\mathbf{x}$ produced by the ideal sampler is thus of the form
\[
x[n] = \sin\left(2\pi\frac{A}{B}n\right),
\]
which is periodic with period $B$ and which spans $A$ cycles over $B$ samples. The natural Fourier representation for this signal is its discrete Fourier series (DFS), namely a vector $\mathbf{X}\in \mathbb{C}^B$ whose elements are
\[
X[k] = \sum_{n=0}^{B-1} x[n]\, e^{-j\frac{2\pi}{B} nk};
\]
since the signal is real-valued, $\mathbf{X}$ is Hermitian-symmetric and so we need only consider its first $\lceil B/2 \rceil$ values. For a sinusoidal input, the only nonzero value in the first half of the DFS is $X[A]$, with magnitude equal to $B/2$. After quantization, the discrete-time sequence $\hat{\mathbf{x}}$ is still $B$-periodic but its DFS $\hat{\mathbf{X}}$ is potentially nonzero everywhere because of distortion. Following~(\ref{eq:thd}), we compute the total harmonic distortion as
\begin{equation}
\mathrm{THD} = \sqrt{\sum_{k = 2}^{P} \frac{|\hat{X}[kA]|^2}{|\hat{X}[A]|^2} }, \qquad P = \lfloor \lceil B/2 \rceil / A \rfloor
\end{equation}
whereas, for the non-harmonic distortion, we will use the following metric, which measures the power of the largest non-harmonic spectral line with respect to the power of the fundamental:
\begin{equation}
\mbox{NHD} = \max_{\substack{k = 1,\ldots, \lceil B/2 \rceil \\ (k \mod A) \neq 0}} \frac{|\hat{X}[k]|}{|\hat{X}[A]|}.
\end{equation}
Experimentally, both metrics are computed over the FFT of a $B$-point quantized data vector.
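As a sketch, both metrics can be computed in a few lines (the function name and the clipping details are our own choices, not part of any reference implementation):

```python
import numpy as np

def thd_nhd(A, B, M):
    """THD and NHD for an M-level quantized sinusoid with f0/Fs = A/B (A, B coprime)."""
    n = np.arange(B)
    x = np.sin(2 * np.pi * A / B * n)
    delta = 2 / M
    if M % 2 == 0:                                   # mid-riser
        xq = delta * (np.floor(x / delta) + 0.5)
    else:                                            # deadzone
        xq = delta * np.floor(x / delta + 0.5)
    xq = np.clip(xq, -1 + delta / 2, 1 - delta / 2)  # extremal levels
    half = int(np.ceil(B / 2))
    mag2 = np.abs(np.fft.fft(xq)[:half + 1])**2      # squared magnitudes, bins 0..ceil(B/2)
    fund = mag2[A]                                   # fundamental sits in bin A
    P = half // A
    thd = np.sqrt(sum(mag2[k * A] for k in range(2, P + 1)) / fund)
    nonharm = [mag2[k] for k in range(1, half + 1) if k % A != 0]
    nhd = np.sqrt(max(nonharm) / fund) if nonharm else 0.0
    return thd, nhd
```

For a fine quantizer (say $M = 1024$) both metrics are tiny, while coarse quantization makes the non-harmonic lines clearly visible.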
\subsection{Numerical results}
We want to investigate how the NHD varies as a function of the $f_0/F_s$ ratio and for this we iterate the FFT-based peak-finding algorithm over a large set of coprime integer pairs $(A, B)$ such that $A/B < 1/2$. The set is easily obtained from the \textit{Farey sequence}\/ of order $B$~\cite{}, namely, the sequence of all completely reduced fractions over the unit interval whose denominator is less than or equal to $B$. The algorithm to compute a Farey sequence is surprisingly simple~\cite{} and we only need a trivial modification to ensure that the produced fractions are strictly between zero and $1/2$. The number of elements in a Farey sequence of order $B$ grows asymptotically as $3B^2/\pi^2$, so exploring a dense set of $f_0/F_s$ ratios can quickly become time-consuming.
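The standard next-term recurrence for Farey sequences translates directly into code; here is a sketch (the function name is ours) that stops once the fractions reach $1/2$:

```python
def farey_half(n):
    """Reduced fractions A/B with 0 < A/B < 1/2 and B <= n, in increasing order."""
    a, b, c, d = 0, 1, 1, n          # two consecutive Farey terms: 0/1 and 1/n
    out = []
    while 2 * c < d:                 # stop when the next term reaches 1/2
        out.append((c, d))
        k = (n + b) // d             # next-term recurrence for Farey neighbors
        a, b, c, d = c, d, k * c - a, k * d - b
    return out
```

For example, `farey_half(5)` yields `[(1, 5), (1, 4), (1, 3), (2, 5)]`.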
\section{Authoritative Answers}
\subsection{Some SP archeology}
\subsection{Full analysis of quantization noise}
Robert Gray's 1990 paper \textit{Quantization Noise Spectra}~\cite{https://ieeexplore.ieee.org/document/59924} provides a complete, in-depth mathematical analysis of the distortion introduced by quantization, this time from a discrete-time perspective. In the case of uniform quantization, Gray generalizes Clavier's approach by considering the expression for the \textit{normalized}\/ quantization error of a uniform $M$-level quantizer
\[
\eta(x) = \frac{q(x) - x}{\Delta} = \frac{q(x) - x}{2/M}.
\]
For $x$ spanning the non-overload region, $\eta(x)$ looks like the examples in Figure~\ref{fig:eta}, namely, it is a reverse sawtooth function with period $\Delta = 2/M$ and amplitude $1/2$; as such, it can be easily expressed (\textit{\`a la}\/ Clavier) via the Fourier series
\[
\eta(x) = \sum_{k=1}^{\infty} \frac{(-1)^{kM}}{\pi k}\sin\left(\pi k M x\right)
\]
where the term $(-1)^{kM}$ is identically one for mid-riser quantizers and alternates in sign for deadzone quantizers. In fact, it is more convenient to write out the series using complex exponentials, so let's use
\begin{equation} \label{eq:etafs}
\eta(x) = \sum_{k \neq 0} \frac{(-1)^{kM}}{j2\pi k}e^{j\pi k M x}.
\end{equation}
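As a sanity check, the sine series above can be evaluated numerically and compared with the direct definition of $\eta$; here is a sketch for a mid-riser with $M = 4$ (the truncation order and the evaluation points, chosen away from the discontinuities, are arbitrary):

```python
import numpy as np

M = 4                                            # mid-riser example, Delta = 0.5
delta = 2 / M
x = np.array([-0.87, -0.33, 0.12, 0.41, 0.77])   # points away from the sawtooth jumps
eta_direct = (delta * (np.floor(x / delta) + 0.5) - x) / delta
k = np.arange(1, 20001)[:, None]                 # truncated Fourier series
eta_series = np.sum((-1.0)**(k * M) / (np.pi * k) * np.sin(np.pi * k * M * x), axis=0)
print(np.max(np.abs(eta_direct - eta_series)))   # small away from the discontinuities
```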
Assume now that the input to the quantizer is a pure discrete-time sinusoid of the form $x[n] = \sin(\omega_0 n + \theta)$, with $0 \le \omega_0 < 2\pi$; the resulting normalized quantization error is the sequence $\epsilon[n] = \eta(x[n])$ and we are interested in computing its spectral representation. If we replace $x$ by $x[n]$ in~(\ref{eq:etafs}) we end up with terms of the form $e^{j \alpha \sin \beta}$ and these can be expanded as a sum of Bessel functions using the so-called Jacobi-Anger formula:
\[
e^{j \alpha \sin \omega} = \sum_{m=-\infty}^{\infty} J_m(\alpha)e^{j\omega m}.
\]
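The identity is easy to verify numerically; the sketch below approximates $J_m$ via its standard integral representation over one period (the discretization choices are ours):

```python
import numpy as np

def bessel_j(m, alpha, N=512):
    """J_m(alpha) = (1/2pi) * integral over [-pi, pi] of exp(j(alpha sin t - m t)) dt,
    computed with a uniform sum (spectrally accurate for periodic integrands)."""
    t = 2 * np.pi * np.arange(N) / N - np.pi
    return np.mean(np.exp(1j * (alpha * np.sin(t) - m * t))).real

alpha, omega = 2.7, 0.9                        # arbitrary test values
lhs = np.exp(1j * alpha * np.sin(omega))
rhs = sum(bessel_j(m, alpha) * np.exp(1j * omega * m) for m in range(-25, 26))
print(abs(lhs - rhs))                          # agreement to near machine precision
```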
With this, and using the fact that Bessel functions satisfy $J_m(-\alpha) = (-1)^m J_m(\alpha)$, so that when the terms for $k$ and $-k$ are paired only the Bessel functions of odd order survive, we have:
\begin{align*}
\epsilon[n] = \eta(\sin(\omega_0 n + \theta)) &= \sum_{k \neq 0} \frac{(-1)^{kM}}{j2\pi k}e^{j\pi k M \sin(\omega_0 n + \theta)} \\
&= \sum_{k \neq 0} \frac{(-1)^{kM}}{j2\pi k} \sum_{m=-\infty}^{\infty} J_m(\pi k M)\, e^{j m (\omega_0 n + \theta)} \\
&= \sum_{m=-\infty}^{\infty} \left[ e^{j (2m+1)\theta} \sum_{k = 1}^{\infty} \frac{(-1)^{kM}}{j\pi k}J_{2m+1}(\pi k M) \right] e^{j (2m+1)\omega_0 n} \\
&= \sum_{m=-\infty}^{\infty} c_M(m, \theta) e^{j (2m+1)\omega_0 n}.
\end{align*}
The expression shows that the normalized error is the sum of a set of weighted complex exponentials and, if we collect the terms whose frequencies are indistinguishable under aliasing, we can write
\begin{equation}
\epsilon[n] = \sum_{\varphi \in \Omega(\omega_0)} b(\varphi) e^{j \varphi n}
\end{equation}
where
\begin{itemize}
\item the set of frequencies $\Omega(\omega_0)$ is defined by the equivalence class $\{(2m+1)\omega_0 \mod 2\pi\}_{m \in \mathbb{Z}}$; this set includes all the odd multiples of the fundamental frequency (that appear in Clavier's analysis), aliased over the $[0, 2\pi]$ interval;
\item for each frequency $\varphi \in \Omega(\omega_0)$ the coefficient of the corresponding complex exponential is given by
\[
b(\varphi) = \sum_{m \in I(\varphi)} \left[ e^{j (2m+1)\theta} \sum_{k = 1}^{\infty} \frac{(-1)^{kM}}{j\pi k}J_{2m+1}(\pi k M) \right] \label{eq:vector_ray}
\]
where $I(\varphi)$ is the set of integers $m$ for which $(2m+1)\omega_0 \equiv \varphi \mod 2\pi$.
\end{itemize}
This representation allows us to express the Power Spectral Density of the normalized quantization error for a full-range sinusoidal input as
\begin{equation} \label{eq:psd}
P(\omega) = \sum_{\varphi \in \Omega(\omega_0)} |b(\varphi)|^2 \delta(\omega - \varphi).
\end{equation}
In the experimental setup in Section~\ref{sec:exp} we looked at sinusoidal inputs with frequency a rational multiple of the sampling frequency; in the present discrete-time analysis, that is equivalent to assuming $\omega_0 = 2\pi(A/B)$ with $A$ and $B$ coprime positive integers such that $A/B < 1/2$. Using modular arithmetic, it can be shown that in this case the number of distinct frequencies in~(\ref{eq:psd}) is finite and given by
\[
\Omega\left(2\pi\frac{A}{B}\right) = \left\{\frac{2i\pi}{B}\right\} \quad \mbox{with } \begin{cases}
i = 0, 1, 2, \ldots, B-1 & \mbox{if $B$ odd} \\
i = 1, 3, 5, \ldots, B-1 & \mbox{if $B$ even,}
\end{cases}
\]
since, when $B$ is even, $A$ is necessarily odd and the products $(2m+1)A$ can only cover the odd residues modulo $B$. Similarly,
\[
I\left(\frac{2i\pi}{B}\right) = \begin{cases}
\{m_i + pB\}_{p \in \mathbb{Z}} & \mbox{if $B$ odd} \\
\{m_i + pB/2\}_{p \in \mathbb{Z}} & \mbox{if $B$ even}
\end{cases}
\]
where $m_i$ is the smallest nonnegative solution of the congruence $(2m_i+1)A \equiv i \mod B$; for $B$ even, for instance, $m_i = \left((i[A]^{-1}_{B} - 1)/2\right) \bmod (B/2)$, where $[A]^{-1}_{B}$ is the modular inverse of $A$, i.e. $A[A]^{-1}_{B} \equiv 1 \mod B$. As an example, for $B=8$ and $A=3$, the values that yielded the largest non-harmonic peak in our numerical simulations, we obtain $\Omega = \{2i\pi/8\}$ for $i \in \{1, 3, 5, 7\}$ and
\begin{align*}
I(2\pi/8) &= \{\ldots, -11, -7, -3, 1, 5, 9, 13, \ldots\} \\
I(6\pi/8) &= \{\ldots, -12, -8, -4, 0, 4, 8, 12, \ldots\} \\
I(10\pi/8) &= \{\ldots, -9, -5, -1, 3, 7, 11, 15, \ldots\} \\
I(14\pi/8) &= \{\ldots, -10, -6, -2, 2, 6, 10, 14, \ldots\}
\end{align*}
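These index sets are easy to enumerate by brute force directly from the defining congruence $(2m+1)\omega_0 \equiv \varphi \mod 2\pi$, i.e.\ $(2m+1)A \equiv i \mod B$; a minimal sketch (function name ours):

```python
def I_set(i, A, B, m_range=range(-8, 9)):
    """All m in m_range whose aliased frequency index (2m+1)*A mod B equals i."""
    return [m for m in m_range if ((2 * m + 1) * A) % B == i]

# the B = 8, A = 3 example: only odd frequency indices i are ever produced
A, B = 3, 8
for i in range(B):
    print(i, I_set(i, A, B))
```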
\section{Conclusion}
\bibliographystyle{unsrt}
\bibliography{biblio}
\end{document}