\documentclass[a4paper,11pt]{report}
\usepackage{color,graphicx,pslatex,rcs,gensymb,bigints,listings,rotating}
\usepackage[colorlinks,urlcolor=red,bookmarks=false,breaklinks=false]{
hyperref
}
\usepackage[skip=0.5em]{subcaption}
\usepackage[onlyrm,veryoldstyle]{kpfonts}
%\usepackage[firstpage,stamp]{draftwatermark}
\usepackage[T1]{fontenc}
% Reduce margins
\addtolength{\oddsidemargin}{-1.5cm}
\addtolength{\evensidemargin}{-1.5cm}
\addtolength{\textwidth}{3cm}
\addtolength{\topmargin}{-2cm}
\addtolength{\textheight}{3cm}
\setlength{\marginparwidth}{2cm}
\usepackage{todonotes}
\graphicspath{{figs/}}
\definecolor{lstbg}{gray}{0.9}
\definecolor{lstfg}{rgb}{0,0,0.5}
\lstset{language=C, frame=single, backgroundcolor=\color{lstbg},
% xleftmargin=5pt, xrightmargin=5pt,
% aboveskip=20pt, belowskip=20pt,
numbers=none, captionpos=b, basicstyle=\footnotesize\color{lstfg},
breaklines=true, breakatwhitespace=true, showstringspaces=false,
escapeinside={(*@}{@*)}
}
\renewcommand{\familydefault}{\sfdefault}
\newcommand{\radiance}{\textsc{radiance}}
\newcommand{\radClassic}{\textsc{radiance classic$^\textsf{\tiny TM}$}}
\newcommand{\var}[1]{$\mathit{#1}$}
\newcommand{\lit}[1]{\textit{#1}}
\newcommand{\opt}[1]{\textbf{#1}}
\newcommand{\cmd}[1]{\textit{\textbf{#1}}\/}
\newcommand{\realLife}{\textsc{RealLife$^\textsf{\tiny TM}$}}
\newcommand{\mkpmap}{\cmd{mkpmap}}
\newcommand{\rcontrib}{\cmd{rcontrib}}
\newcommand{\rtrace}{\cmd{rtrace}}
\newcommand{\rcClassic}{\cmd{rcontrib classic$^\textsf{\tiny TM}$}}
\RCS $Revision: 1.13 $
\RCS $Date: 2023/06/06 15:51:20 $
\fontfamily{jkp}\selectfont
\title{The \radiance{} Precomputed Contribution Photon Map\\
--- Technical Report ---\\[5mm]
{\fontfamily{jkpvos}\selectfont\Large
\textrm{
\textit{%
Being an Account of the final remarkable Enterprises=
and desperate Actions=, \&c, undertaken by the shipwreck'd Crew
of the founder'd CC~EASE,
in which is= describ'd a most fantastick Design,
viz. furnishing
a compleat computed Day-Light, by the Season,
to the RADIANCE Publick
}
}
}
}
\fontfamily{jkp}\selectfont
\author{
Roland Schregle (roland.schregle@\{hslu.ch, gmail.com\})\\
CC Building Envelopes and Civil Engineering\\
Lucerne University of Applied Sciences and Arts
}
\date{
Revision \RCSRevision\\
\today
}
\begin{document}
\maketitle
\begin{abstract}
The \rcontrib{} tool included with the \radiance{} lighting simulation
and rendering software suite is useful for calculating daylight
coefficients for climate-based modelling, notably to assess annual
daylight availability under representative sky conditions.
This requires ``binning'' contributions from grouped sky directions
(``patches'') and sun positions. This can be a slow process as
\rcontrib{} does not cache irradiance, resulting in redundant rays
being traced for neighbouring sensor positions.
\radiance{} includes a contribution photon mapping module to bin
contributions by tracing rays from the light sources (forward
raytracing), in contrast to \radiance's standard backward raytracing
approach from the sensors (referred to here as \rcClassic). Photon
mapping is particularly efficient to simulate redirection through
complex shading devices using data-driven BSDFs, such as prismatic
films.
This technical report documents the further development of the
\radiance{} contribution photon map to support precomputed
contributions to reduce redundancy in \rcontrib.
To this end, the contribution photon map utilises a wavelet
compression and efficient coefficient encoding to compactly
represent the large volume of data the contributions incur.
The encoded coefficients are then paged on-demand from disk
and cached to reduce the in-core footprint of the contributions
in \rcontrib.
This document focuses on the implementation details of
the photon mapping software in support of this new functionality.
\end{abstract}
\tableofcontents
% ---------------------------------------------------------------------------
%\clearpage
\chapter{Introduction}
\section{Motivation}
Daylight coefficients and contributions are used in \radiance{} workflows
to simulate and analyse annual daylight availability subject to
seasonal variations. This includes accumulating contributions or
daylight coefficients (the latter corresponding to normalised
contributions) from solar sources and sky ``patches'', the latter
corresponding to discrete subdivisions of the incident hemisphere
on the receiving surface. These coefficients or contributions are
then propagated in the scene to assess predictions of indirect
daylight availability on work planes, for example.
In a static scene, these daylight coefficients can be subsequently
scaled a posteriori with
discretised sky luminance vectors obtained either from empirical sky
models or climate data measured at the proposed site.
This process can be repeated as desired for a time-series of luminance
distributions without the need to recalculate the coefficients, since
these remain static along with the geometry. The resulting vector
calculations can be easily parallelised on modern computing platforms
for very large time-series, affording a high temporal resolution.
Calculating the coefficients at sufficiently high resolution in the
first place can, however, be computationally expensive.
\radiance{} uses the \rcontrib{} tool to accumulate contributions
or coefficients from light sources as well as arbitrary objects. It
accounts for the light transport paths contributed by a set of
user-specified objects (identified by their modifiers), optionally
discretising their incident directions into ``bins'', and outputs these
either to one or more files, or to the console. In the context of contributions,
the incident direction is that of the first ray emitted or scattered
from an object with a modifier whose contributions are sought.
The \radiance{} photon map was extended to support annual daylight
simulations using contribution photons when it became part of the
official \radiance{} software distribution
\cite{schregle-techreport-2015}. Its primary purpose was to efficiently
compute daylight coefficients for daylight redirecting components with
predominantly specular reflection or transmission.
Initial results using contribution
photon mapping indicated -- unsurprisingly -- that the number of
required photons to adequately predict lighting levels
scales linearly with the number of
timestamps / light sources \cite{schregle-cisbat-2015}.
This in turn motivated the development of an \textit{out-of-core} photon
map that maintains its photons entirely on disk, and
dynamically loads subsets of photons on demand
\cite{schregle-oocpmap-jbps-2016, schregle-techreport-2016}.
By loading photons on demand, the resident memory footprint is reduced to
those photons which actually contribute to the sensor points under
consideration. By doing so, complex annual daylight simulations with
large photon maps can be efficiently performed on commodity office PCs.
Yay.
This technical report supplements the user documentation found in the
\radiance{} photon map manual \cite{schregle-pmapManual-2022}
and \mkpmap{} and \rcontrib{} manpages. It is primarily intended
for researchers interested in extending the code, and serves as
primary documentation of the source code.
% ---------------------------------------------------------------------------
\section{Overview}
An overview of the precomputed contribution photon map toolchain is shown
in figure \ref{fig:overview}. It follows the general \radiance{} photon
mapping workflow consisting of a \mkpmap{} preprocess to generate the
photon map, and a modified \radiance{} tool which evaluates
irradiance from the photons for a given set of virtual
sensor positions (often arranged in a grid).
\begin{figure}[p]
\centering
\includegraphics[width=\linewidth]{contribpmap-overview3-crop}
\parbox{\linewidth}{%
\caption{%
\label{fig:overview}
Overview of contribution photon mapping workflow.
Green paths denote inputs (parameters and sensor positions),
while red paths denote output. Light source contributions for
modifier mod are binned using an
$\left\lfloor\sqrt{nbins}\right\rfloor^2$
Shirley-Chiu disk-to-square mapping, and wavelet compressed
for a fraction \var{precomp} of photons by \mkpmap.
These contributions are saved along with the corresponding
photons in separate files for each modifier, grouped in a
subdirectory under the parent photon map \var{pmapfile}. The
photons and their precomputed contributions are subsequently
paged on demand, uncompressed, and cached by \rcontrib,
which passes the contributions to the standard contribution
calculation used by \rcClassic.
}
}
\end{figure}
The photon map represents
a precomputed radiance distribution which can be evaluated multiple
times under the assumption that the geometry and lighting remain static.
This aspect of photon mapping amortises the expense of precomputation,
thus the payoff scales with the number of sensor positions. Contribution
photon mapping is then significantly faster than ``classic'' \rcontrib{}
in the presence of complex geometry and specular materials, giving rise
to multiple scattering events. This effect is further amplified by the
fact that the irradiance cache is disabled in \rcontrib.
In precomputed contribution photon mapping mode, \mkpmap{} is designed
to behave similarly to \rcontrib, and accepts a subset of its parameters to
``bin'' (discretise) contributions based on their incident directions.
The contributors are identified by their modifiers with the \opt{-m}
option, but unlike \rcClassic, they are restricted to light
sources only. The binned direction is the incident direction of a
photon's first interaction with a scene object after emission from a
contributing light source. This corresponds to the \textit{photon primaries}
in the earlier contribution photon mapping implementation, although these
are no longer stored as they are no longer evaluated in \rcontrib.
The workflow in figure \ref{fig:overview} can be divided into the
following stages, each comprising distinct components as indicated
in the gray boxes:
\begin{enumerate}
\item Photon map generation with \mkpmap:
This precomputes the contribution photons and compresses them
for use by \rcontrib. Under the hood, \mkpmap{} performs
the following steps:
\begin{enumerate}
\item Distribution of \var{nphotons} photons as specified with
the \opt{-apC} option.
\item Binning of photon incident directions into \var{nbins} bins
(as specified with the \opt{-bn} option) using a Shirley-Chiu
disk-to-square mapping \cite{shirleyChiu-1997}. Technically,
this already occurs on-the-fly during photon distribution.
\item Precomputation of binned contributions for a fraction
\var{precomp} of the distributed photons, as specified with
the \opt{-apP} option (this option already served the same
purpose for the existing precomputed global photon map).
This involves performing a density estimate by locating the
\var{bwidth} nearest photons in the vicinity of a precomputed
photon, and accumulating their pre-binned contributions
in the mapped Shirley-Chiu square.
\item A wavelet transform over the binned contributions,
resulting in a set of approximation and detail coefficients.
\item Compression of wavelet detail coefficients by
thresholding, i.e. discarding the (\var{comp})\% least
significant coefficients, as specified with the \opt{-apC}
option.
\item Compact encoding of thresholded wavelet coefficients
using a customised 32-bit \textit{mRGBE} format similar to
the RGBE format used for \radiance{} HDR files
\cite{ward-RGBE-1994}, with an additional coefficient index
to reconstruct the original bin ordering.
\item Saving of the precomputed photons and their corresponding
compressed wavelet coefficients; the previously distributed
photons are no longer needed and discarded.
\end{enumerate}
\item The out-of-core precomputed contribution photon map on disk:
This consists of the following files:
\begin{enumerate}
\item A \emph{parent} photon map file \var{pmapfile} as
specified with the \opt{-apC} option.
This file does not actually contain any photons, and merely
serves as a header for \cmd{getinfo}.
\item An option file \var{pmapfile.opt} containing the binning
options used to precompute the contribution photons; this is
passed to \rcontrib{} via its \opt{@} option to ensure consistent
binning when evaluating the contributions.
\item A subdirectory \var{pmapfile.rc} containing the actual
precomputed contribution photon maps.
\item Per-modifier \emph{child} photon maps \var{mod_i.pm}
for each modifier \var{mod_i} specified with the \opt{-m}
option. These use the existing out-of-core photon map format
\cite{schregle-techreport-2016} and include a companion
\var{mod_i.pm.leaf} file containing the actual photon payload.
\item Per-modifier compressed wavelet coefficients file
\var{mod_i.wvt}, organised as aggregate records of fixed size
per photon. The ordering is identical to that of the out-of-core
leaf file to facilitate on-demand-paging.
\end{enumerate}
\item Evaluation of precomputed contributions with \rcontrib:
\begin{enumerate}
\item Lookup for single closest precomputed photon at each
given sensor position, paging photons
from the out-of-core leaf file as required. This returns
a photon index $p$ corresponding to the photon's record number
in the leaf file.
\item Paging of the photon's associated mRGBE-encoded,
compressed wavelet coefficients, which are located in the
$p$-th record in the wavelet coefficient file.
\item mRGBE decoding of the wavelet coefficients to floating
point, plus coefficient index.
\item Expansion of compressed wavelet coefficients by populating
the coefficient matrix with the decoded coefficients in the
ordering indicated by their decoded indices.
\item Inverse wavelet transform to recover the original binned
contributions in the Shirley-Chiu square.
\item Caching of reconstructed contributions to hide latency
incurred by paging and reconstruction. This cache is
derived from that used by the existing out-of-core photon map,
and contains \var{ncached} pages (as specified with the
\opt{-aC} option), each containing a set of
fully decoded binned contributions for a single precomputed
photon.
\item Passing of reconstructed contributions to \rcontrib's
internal contribution lookup table, at which point control
is handed over to \textit{Don Gregorio} to produce the usual
binned contributions per sensor position according to the
specified output format.
\end{enumerate}
\end{enumerate}
Each stage and its constituents are detailed in section
\ref{sec:implementation}.
% ---------------------------------------------------------------------------
%\clearpage
\chapter{Proof of Concept}
\section{Rationale and Prototyping}
The design choices to implement the workflow in figure \ref{fig:overview}
were informed by the author's cumulative experience with wavelets.
This dates back to a first exposure during a guest lecture on Wavelet
Radiosity held by Prof. H.P. Seidel hosted by the University of Bonn in
the mid-90s \cite{stamminger-waveletRadiosity-1995}. Further exposure
followed as the 90s saw a flurry of activity in applying wavelets to
computer graphics, notably the development of the lifting scheme
\cite{sweldens-lifting-1996}, and the adaptation of wavelets to the
spherical topology \cite{schroeder-sphWavelets-1995}.
The latter publication inspired the author's first own application of
wavelets in developing a 4D BSDF compression scheme as part of an
``adjunct project'' (read: digression) to the development of the
original photon map extension at Fraunhofer ISE.
This unfinished work served as a proof of concept, but was shelved and
never formally published. However, it was retrospectively
presented at the \radiance{} workshop almost a decade later as part
of the short-lived%
\footnote{Extremely short-lived, in fact, since this was the only
instalment!
}
lecture series, \lit{Secret Weapons of RADIANCE:
Stuff That Never Took Off} \cite{schregle-bsdfComp-2011}.
It was this initial, hands-on experience
with wavelets that made the author aware of their inherent power as
an analysis and compression tool.
Some two decades later, the author's involvement in the SNSF-funded
project \lit{Lightfields for glare assessment}
rekindled his interest in wavelets.
This was chiefly due to their use by his very talented
colleague, Stephen Wasilewski, to adaptively sample spatio-temporal
luminance distributions to represent a lightfield.
This sampling scheme was based on the
\cmd{pywavelets} Python module and led to the development of the
\lit{Raytraverse} tool \cite{wasilewski-raytraverse-2021}.
When a proposition was therefore made within the framework of the project
to extend the existing contribution photon map with precomputed
contributions, an application of wavelets was immediately apparent.
The availability of \cmd{pywavelets} as a convenient prototyping tool
quickly prompted the development of a concept and initial experimentation
to prove the feasibility of the proposition.
A detailed account of the initial concept that led to the development
of the precomputed contribution photon map can be found -- along with
choice expletives documenting frustrating dead ends for a dose of realism
-- in a separate proposal document
\cite{schregle-preCompContribPmapProposal-2021}.
It should be acknowledged that during the development of precomputed
contributions, a number of existing -- and some \emph{very} old -- bugs
were discovered in various places in the photon mapping code.
This work therefore not only extended the photon map's functionality,
but also made the code more robust in general.
% ---------------------------------------------------------------------------
\section{Initial Compression Tests with \cmd{pywavelets}}
\label{sec:waveletPoC}
An initial back-of-the-envelope calculation made it clear that the
storage required for precomputed contributions would be prohibitive
even for current office workstations and laptops; 10M precomputed
photons (a realistic number for a complex simulated environment) with
32-bit RGBE-encoded contributions in 2305~bins (corresponding to a
Reinhart~MF:4 mapping \cite{bourgeoisReinhart-2008}) would occupy
92.2~GB on disk.
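This figure follows directly from the product of photon count, bin count,
and bytes per RGBE-encoded contribution:
\[
   10^7 \times 2305 \times 4~\mathrm{bytes}
   \approx 9.2\times 10^{10}~\mathrm{bytes} \approx 92.2~\mathrm{GB}.
\]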
Note that this calculation omits the storage occupied by the photons
themselves, which would amount to a relatively insignificant 240~MB.
This simple example makes it clear that an effective compression scheme
is needed.
A quick statistical analysis of contributions from test scenes
immediately ruled out a na\"ive run-length encoding of
contiguous empty (zero) contributions. Instead, a wavelet compression
scheme was favoured based on prior experience.
To assess viable candidate compression techniques prior to hard-core
C coding, preliminary tests were conducted with a simple Python prototype
(\cmd{wavelet\_test.py}) using Filip Wasilewski's excellent
\cmd{pywavelets} (\cmd{pywt}) package \cite{Lee2019}.
The \rcontrib{} tool assumes a linear ordering of bins from
what is essentially a 3D domain (hemisphere of incident directions)
which, due to dimensional redundancy, can be reduced to a 2D
domain ($\theta, \phi$), and in turn serialised to a 1D domain,
as is the case with the popular Reinhart mapping.
This is inherently a compromise, since the original topology,
and therefore any multidimensional correlation in the contributions,
is then lost.
In an initial proof-of-concept, max-normalised scalar contributions from
a uniform sky in a bilaterally lit test scene with skylight were
binned using a Reinhart MF:4 mapping (2305 bins) and passed to a 1D
wavelet transform. The resulting wavelet coefficients were then
thresholded against a user-specified value to achieve what amounts to
a lossy compression, effectively setting the thresholded coefficients
to zero. Finally, the thresholded coefficients were reconstructed by
applying the inverse wavelet transform, and their deviations assessed
relative to the original.
The initial results revealed that the type of wavelet used (and
\cmd{pywt} offers many!) has a less significant impact on the
reconstruction error compared to the linear ordering of the coefficients.
In their original order, localised peaks in the linearised
contributions were significantly attenuated with high compression
(typically 90--98\% in these tests).
Ordering the contributions by their value to obtain a monotonically
increasing signal significantly reduced these artefacts, as it effectively
imposes correlation on the input. Similar tests were also conducted
with solar sources, which introduced higher dynamic range, more
pronounced peaks, and major artefacts without sorting (see figure
\ref{fig:solarcont-mf4}).
\begin{figure}[htb]
\centering
\includegraphics[width=0.45\linewidth]{%
solarcont-mf4-db4-90%
}
\includegraphics[width=0.45\linewidth]{%
solarcont-mf4-sort-db4-98-detail%
}\\
\parbox{0.9\linewidth}{%
\caption{%
\label{fig:solarcont-mf4}
Daubechies wavelet transform of Reinhart MF:4 mapped solar
contributions (2305 bins) after thresholding 90\% of coefficients.
In their original linear ordering (left), the binned contributions
exhibit isolated discontinuities (peaks) which are poorly
decorrelated by the
wavelet transform, resulting in obvious compression artefacts.
Ordering the contributions by increasing magnitude (right)
imposes correlation and significantly reduces these artefacts.
This reordering
however complicates the reconstruction of the contributions,
and doesn't decorrelate the contributions in their original
hemispherical domain.
}
}
\end{figure}
Other 1D bin orderings using 2D space-filling curves such as Morton codes
or Hilbert indices yielded similarly disappointing results with peaky
contributions.
In addition, sorting the contributions complicates their reconstruction,
as this requires storing the original bin order.
Furthermore, this still doesn't decorrelate the contributions
in their original 2D hemispherical domain, which results in suboptimal
compression. A 1D wavelet compression was therefore abandoned in
favour of a more elaborate 2D mapping of the hemisphere, and a wavelet
transform in this domain.
Furthermore, it was desirable not to limit the number of bins to a
power of two. This requires extending the contributions at the boundaries,
as described in section \ref{sec:boundaryExt}. Boundary extensions in
the context of wavelet transforms are poorly documented in the literature;
consequently, a detailed analysis of the \cmd{pywt} C source code was
conducted in order to obtain an insight into how the different boundary
extension modes offered by the module are implemented, notably in a
multi-dimensional context. This revealed the necessity of introducing
additional \textit{padding coefficients} at the boundaries, consequently
increasing the number of coefficients beyond the number of bins.
Because it is so comprehensive and highly
optimised, the \cmd{pywt} source code became the primary basis for the
wavelets module that was developed for \radiance{}, which is described in
section \ref{sec:wavelet3.c}.
% ---------------------------------------------------------------------------
\section{Initial Wavelet Coefficient Encoding Tests}
\label{sec:mrgbePoC}
An initial analysis of the dynamic range of normalised, scalar wavelet
coefficients generated by the test script using \cmd{pywt} indicated
that the majority of these spanned ca. 10 orders of magnitude (see
figure~\ref{fig:coeffRange}).
\begin{figure}[htb]
\centering
\includegraphics[width=0.9\linewidth]{skycont-mf4-sort-db4-coeffs}
\parbox{0.9\linewidth}{%
\caption{%
\label{fig:coeffRange}
Logarithmic plot of absolute wavelet coefficient magnitudes after
normalisation. The coefficients (magenta crosses) are clustered
in bands of increasing frequency from left to right, with the
number of coefficients doubling in consecutive bands due to the
wavelet transform's multiresolution analysis. In this example,
the dynamic range of the coefficients is limited to ca. 10 orders
of magnitude. It is also evident that the coefficients at the
bottom between $10^{-19}$ and $10^{-20}$
are negligible and can be omitted. It follows therefore that
it suffices to encode coefficients up to the green line at ca.
$4.6\times10^{-10}$; this corresponds to an effective range of
$[2^{-31}, 1]$, which can be covered by a 5-bit binary exponent.
In practice, thresholding the coefficients will further reduce this
dynamic range.
}
}
\end{figure}
By clamping the maximum absolute
coefficient magnitude to 1, it is possible to map the coefficients as
3-tuples to a normalised range and encode these as integer RGB
mantissae%
\footnote{Yeah, that \emph{is} the plural.}
with a common exponent, thus accounting for 3 colour channels, as is done
with \radiance's 32-bit RGBE format \cite{ward-RGBE-1994}.
Furthermore, because the coefficients are thresholded, their minimum
absolute magnitude is likewise bounded. This insight, coupled with the fact
that the coefficients in the range $10^{-19}$ to $10^{-20}$ are clearly
insignificant, indicated the viability of an encoding with limited
precision. This encoding, which would require pre-normalising the
coefficients and storing corresponding independent normalisation factors
per colour
channel, would also need to accommodate a coefficient index to indicate
which coefficients remain after thresholding. This payload data would
be incorporated into the modified RGBE encoding (or mRGBE, as we like
to call it%
\footnote{
Actually, it can stand for \{micro, modified, minimalist\} RGBE;
take your pick.
It just has to sound cool nowadays; style over substance, you know.
iRGBE would've been trendy too, but probably infringes on some
trademark from Cupertino.
}) as a dedicated field of fixed width, which of course limits the
number of bits allocated to the RGB mantissae and the exponent, thus
inherently reducing
the precision compared to \radiance's native RGBE, which allocates
8 bits to each field.
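For illustration, the following minimal sketch packs three signed 4-bit
mantissae, a shared 5-bit (implicitly negative) exponent, and a 12-bit
payload into a 32-bit word as outlined above. It merely stands in for the
actual \cmd{mrgbe.c} module described in section \ref{sec:mrgbe.c}; the
field layout, rounding, and normalisation used there may well differ.
\begin{lstlisting}
#include <math.h>
#include <stdint.h>

/* Hedged sketch of a 32-bit mRGBE-style encoding: 3 x (4-bit mantissa
   + 1 sign bit), a shared 5-bit implicitly negative exponent, and a
   12-bit payload (e.g. a coefficient index). Illustration only. */
uint32_t mrgbeEncode (const double rgb [3], unsigned payload)
{
   double   absMax = 0;
   int      c, e = 0;
   uint32_t enc = ((uint32_t)payload & 0xfff) << 20; /* 12-bit payload */

   for (c = 0; c < 3; c++)
      if (fabs(rgb [c]) > absMax)
         absMax = fabs(rgb [c]);

   frexp(absMax, &e);      /* absMax = f * 2^e, f in [0.5, 1) */
   if (e > 0)
      e = 0;               /* inputs assumed clamped to [-1, 1] */
   if (e < -31)            /* below representable range: encode zero */
      return enc;

   for (c = 0; c < 3; c++) {
      unsigned m = (unsigned)(fabs(rgb [c]) * ldexp(16.0, -e));
      if (m > 15)
         m = 15;                            /* 4-bit mantissa */
      m |= (rgb [c] < 0) << 4;              /* 1 sign bit     */
      enc |= (uint32_t)m << (5 * c);        /* channels at bits 0/5/10 */
   }
   return enc | ((uint32_t)(-e) << 15);     /* 5-bit exponent, bits 15-19 */
}
\end{lstlisting}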
\begin{figure}[!htb]
\centering
\includegraphics[width=0.8\linewidth]{%
skycont-mf4-sort-db4-bincoeffs%
}\\
\includegraphics[width=0.8\linewidth]{%
skycont-mf4-sort-db4-bincoeffs-detail%
}\\
\parbox{0.9\linewidth}{%
\caption{%
\label{fig:mrgbeTest}
Logarithmic plot of wavelet coefficient range in figure
\ref{fig:coeffRange} (top, detail inset at bottom) with
encoded/decoded mRGBE coefficients superimposed as blue circles.
Despite the limited precision
(4 RGB mantissa bits + 1 sign, 5 exponent bits), the mRGBE
encoding correlates well with the original floating point values
within the constrained dynamic range $[2^{-31}, 1]$ covered by
the exponent.
}
}
\end{figure}
A prototype mRGBE encoding/decoding script (\cmd{bincoeff\_test.py},
importing routines from \cmd{wave\-let\_test.py}) was therefore developed in
Python to check the accuracy (read: loss of precision) of the proposed
encoding given the previous practical sample data. Figure
\ref{fig:mrgbeTest} shows initial results of an mRGBE encoding using
5 bits per RGB mantissa (including 1 bit each for the sign) and the
exponent (implicitly negative), leaving 12 bits for the payload data,
which was not used in this test. Despite the limited precision, the
encoded and decoded coefficients exhibit remarkable coherence with the
original floating point values. However, it should be noted that the
RGB colour channels are identical in this test; although highly saturated
luminance is unlikely in realistic scenarios, higher deviations were
expected when the R, G, and B channels differ. This was further
investigated and quantified in the unit test of the C module
developed on the basis of the Python prototype, which is detailed in
section \ref{sec:mrgbe.c}.
% ---------------------------------------------------------------------------
\section{Design Goals}
The conclusions drawn from the initial tests helped set the design goals
for the precomputed contribution photon map. These included:
\begin{itemize}
\item A fixed binning of photon incident directions to a square matrix
using the area-preserving Shirley-Chiu disk-to-square mapping
which enables decorrelating the contributions in 2D in
a subsequent wavelet transform.
\item Compression in 2D via a computationally lightweight 4-tap
(having a support of 4 samples) wavelet transform, the Daubechies
DB2 wavelet being a popular choice.
\item An arbitrary number of bins (not just powers of 2). This requires
boundary extension of the contributions during the wavelet
transform.
\item Reuse or adaptation of existing photon map code where
possible; this included modifying the existing out-of-core photon
cache, the original contribution photon distribution routines,
the precomputation routines for global photons,
photon lookup routines, out-of-core caching routines, and
routines interfacing to \rcontrib.
We doan' need to reinvent da wheel.
\item Compact encoding (mRGBE) of 3-tuple wavelet coefficients and
their indices in a 32-bit envelope in a separate wavelet
coefficient file, with at least 5 bits per RGB mantissa.
\item Thresholding of wavelet coefficients by dropping a fixed number;
this results in a fixed size of compressed coefficients per
photon, which facilitates directly accessing contribution records
in the wavelet coefficient file.
\item Binning settings in the precomputed photon map override those
in \rcontrib{} when the photon map is loaded; these are passed
via an option file generated by \mkpmap.
\item Out-of-core storage of photons and their wavelet coefficients,
with on-demand paging in \rcontrib,
as already done by the standard out-of-core photon map.
Consequently, the in-core kd-tree data structure is no longer
supported with contribution photons.
\item Caching of reconstructed contributions after mRGBE decoding and
inverse wavelet transform in \rcontrib.
\end{itemize}
% ---------------------------------------------------------------------------
%\clearpage
\chapter{Implementation}
\label{sec:implementation}
The following sections present implementation details of each
component of the precomputed contribution photon map workflow
in reference to figure \ref{fig:overview}.
Each section refers to the associated source code modules
and relevant functions. The reader is referred to
the software architecture graphs in section \ref{sec:swarch} in the
appendix
for an overview of the modules, how they are embedded within the
\rcontrib{} framework, and how they interact.
The exposition also frequently refers to data types defined in the source
code, which are summarised in listings \ref{lst:codeDefs1},
\ref{lst:codeDefs2}, and \ref{lst:codeDefs3} for the interested reader.
% ---------------------------------------------------------------------------
\section{Contribution Photon Generation \& Precomputation with
\mkpmap}
Precomputed contribution photon maps are generated with the \mkpmap{}
tool. This entails distributing (and binning) the photons,
precomputing their binned contributions by density estimation,
compressing the resulting contributions via wavelet transform (handling
potential boundary artefacts), and encoding the wavelet coefficients.
\subsection{Contribution Photon Distribution}
Contribution photons have their own photon distribution routine,
\cmd{pmcontrib2:distribPhotonContrib()}.
It differs from the standard photon distribution routine
\cmd{pmap:distribPhotons()} in that each source contributes
(approximately) the same number of photons
(i.e. $nemit \approx nphotons \: /\, nsrc$, where $nsrc$ is the number of
contributing sources).
This measure is intended to
balance the density of photons contributed by each source whose
contributions are sought (as specified with the \opt{-m} option to
\mkpmap).
As in the standard photon
distribution routine, the number of photons to emit is adjusted
to obtain the desired target photon count
by emitting a fraction of the photons in a prepass, and then
extrapolating this
in proportion to the stored photons vs. the total target count.
Since the number of photons emitted from each source no longer correlates
with its emitted flux, as is the case with \cmd{distribPhotons()},
the resulting flux per photon must be adjusted individually for
each source to compensate for bias, at the expense of increased variance.
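As a minimal illustration of this extrapolation, the following sketch
derives a per-source emission count from a hypothetical prepass; it
follows the description above rather than the actual
\cmd{distribPhotonContrib()} code, and all names are illustrative.
\begin{lstlisting}
/* Hedged sketch: extrapolate the number of photons to emit for one
   source from a prepass, so that each source contributes approx.
   nPhotons / nSrc stored photons. */
unsigned long extrapolateEmission (
   unsigned long nPhotons,    /* global target photon count */
   unsigned      nSrc,        /* number of contributing light sources */
   unsigned long preEmit,     /* photons emitted in the prepass */
   unsigned long preStored    /* photons stored during the prepass */
)
{
   const double targetCnt = (double)nPhotons / nSrc;

   if (!preStored)
      return 0;      /* source yields no stored photons; skip it */

   /* Scale the emission count in proportion to the prepass yield */
   return (unsigned long)(targetCnt / preStored * preEmit + 0.5);
}
\end{lstlisting}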
% ---------------------------------------------------------------------------
\subsection{Binning of Contributions}
\label{sec:binning}
When a contribution photon is emitted, its \textit{contribution source}
is temporarily set in the photon map's \cmd{PhotonMap.lastContribSrc}
field, consisting of a 16-bit integer
source index and bin. The latter is obtained by calling
\cmd{pmapcontrib:ray2bin()} via \cmd{pmapcontrib:contribSourceBin()},
which performs a
Shirley-Chiu disk-to-square mapping (see figure \ref{fig:shirleyChiu})
of the photon's emitted direction
vector.
\begin{figure}[htb]
\centering
\includegraphics[width=0.8\linewidth]{shirley-chiu2-crop}\\
\parbox{0.9\linewidth}{%
\caption{%
\label{fig:shirleyChiu}
Shirley-Chiu disk-to-square mapping used to bin
contribution photon directions. This mapping has the
desirable property of preserving adjacency and fractional area.
The disk is obtained by projecting the
hemisphere of photon directions onto the plane defined by the
surface normal $[rNx,\: rNy,\: rNz]$. The polar angle origin
$\phi = 0$ in the disk is defined by the up vector
$[Ux,\: Uy,\: Uz]$.
}
}
\end{figure}
The disk coordinates are obtained by projecting the photon's direction
onto the disk plane as defined by its normal, $[rN_x,\: rN_y,\: rN_z]$
(see below). The mapped square coordinates $[sc_x,\: sc_y]$
are, in turn, mapped to a linear bin
\begin{equation}
b_p = sc_x \cdot scdim + sc_y,
\label{eq:linearIdx}
\end{equation}
where $scdim = \left\lfloor\sqrt{nbins}\right\rfloor$ is the dimension
of the Shirley-Chiu square,
and is derived from the number of bins specified to \mkpmap{}
with the \opt{-bn} option. If the latter is not an integer square,
it will be rounded down to the nearest such number by \mkpmap.
If the Shirley-Chiu mapping fails (e.g.\ if the photon's direction
lies in the plane or is incident from the back), the bin is set to
-1 as invalid. This mapping is fixed for contribution photons and
cannot be modified by the user, as the resulting square topology
is immediately applicable to a 2D wavelet transform in matrix form.
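As an illustration of equation \ref{eq:linearIdx}, the following sketch
discretises continuous Shirley-Chiu square coordinates (assumed here to
lie in $[0, 1)^2$) into an \var{scdim} $\times$ \var{scdim} grid and
derives the linear bin; this is not the actual \cmd{ray2bin()} code, and
the names are illustrative.
\begin{lstlisting}
#include <math.h>

/* Hedged sketch: map Shirley-Chiu square coords to a linear bin;
   returns -1 if the mapping failed (invalid bin). */
int squareCoordsToBin (const double sqCoord [2], unsigned nbins)
{
   const unsigned scdim = (unsigned)floor(sqrt((double)nbins));
   unsigned scx, scy;

   if (sqCoord [0] < 0 || sqCoord [0] >= 1 ||
       sqCoord [1] < 0 || sqCoord [1] >= 1)
      return -1;              /* mapping failed: invalid bin */

   /* Discretise square coords in [0,1) to cell indices in [0, scdim) */
   scx = (unsigned)(sqCoord [0] * scdim);
   scy = (unsigned)(sqCoord [1] * scdim);

   return scx * scdim + scy;  /* linear bin index b_p */
}
\end{lstlisting}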
\cmd{ray2bin()} evaluates the following variables to reorient the
disk-to-square mapping:
\begin{itemize}
\item{$RHS$}: 1 for right-handed coordinate system
(default), -1 for left-handed.
\item{$rNx,\: rNy,\: rNz$}: disk plane surface normal
(defines $\theta = 0$, default [0, 0, 1])
\item{$Ux,\: Uy,\: Uz$}: up vector (defines $\phi = 0$,
default [0, 1, 0])
\end{itemize}
The contribution source is set speculatively for each emitted photon,
in the anticipation that its path will lead to stored photons. If the
path spawns no photons (e.g. because the photon was backscattered
from a port and left the scene, or because the materials have zero
reflectance), the contribution source is simply
discarded and overwritten by the next emitted photon.
On the other hand, when a photon is created, the photon's contribution
source field \cmd{Pho\-ton.\-aux.\-con\-trib\-Src} is set from
\cmd{PhotonMap.lastContribSrc}, which remains constant for the photon
path until the next photon is emitted. Note that \cmd{Photon.aux} is
designated as an auxiliary data field which is specific to the photon type.
It is defined as a union structure and can also store a unique photon
path ID for regular photons, and a photon's time of flight (expressed
as distance travelled) for transient photons.
The contribution sources stored with each photon are subsequently
accumulated in bins when the contributions are precomputed, as
detailed in the next section.
% ---------------------------------------------------------------------------
\subsection{Precomputation of Contribution Photons}
\label{sec:precomp}
Once the approximate target number of photons \var{nphotons} has been
reached, a fraction \var{precomp} of these (specified with the \opt{-apP}
option to \mkpmap) is drawn uniformly at random as candidates for
precomputation, thus preserving the relative distribution of photons in
the scene. The entire precomputation is wrapped by the routine
\cmd{pmapcontrib:preComputeContrib()}.
The contributions for a candidate precomputed photon
are collected by function
\cmd{pmap\-con\-trib:\-get\-Pho\-ton\-Con\-trib()}. This
routine extracts the candidate photon's emitting light source from its
contribution source field \cmd{Photon.aux.contribSrc.sourceIdx},
from which, in turn, the source modifier is obtained. If the modifier is
among those whose contributions are sought, it is passed to the
standard photon lookup routine \cmd{pmapdata:findPhotons()}, which in
turn calls the lookup routine specific to the out-of-core data structure
(sorry, no OOP-style overloading here), \cmd{pmapooc:OOC\_FindPhotons()}.
This routine performs a nearest neighbour search for the \var{bwidth}
closest photons. It also sets a filter of instance
\cmd{pmapfilt:OOC\_SearchFilterState}, which is passed to a callback
function \cmd{pmapfilt:filterPhoton()} called by the lookup routine
to accept/reject photons on the fly based on the filter's criteria.
This function ordinarily tests only photon normals to reject photons
incident from backfaces. However, with contribution photons, the
filtering callback further adds the constraint that the photons are
emitted from a given light source modifier. This ensures the
lookup collects only contributions for the same modifier as the candidate
precomputed photon.
If the photon search is successful, \cmd{getPhotonContrib()} performs
a modified density estimate of the photons, accumulating their
contributions (obtained from their \cmd{Photon.flux} field) in
the preassigned bins indicated by each photon's contribution source
field \cmd{Photon.aux.contribSrc.sourceBin} (this incidentally also
includes the candidate precomputed photon's own binned contribution).
The binned contributions are then divided by the area of the disk
intercepted by the search volume, with radius corresponding to the
maximum found photon distance.
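As a minimal illustration of this final normalisation, the following
sketch divides the binned contribution sums by the area of the
intercepted disk; it mirrors the description above rather than the actual
\cmd{getPhotonContrib()} code, and the names are illustrative.
\begin{lstlisting}
#include <math.h>

/* Hedged sketch: normalise binned RGB contribution sums by the area
   of the disk intercepted by the photon search volume. The parameter
   maxPhotonDist2 is the squared distance to the furthest photon found,
   hence the intercepted disk has area pi * maxPhotonDist2. */
void normaliseBinnedContribs (float binnedContrib [][3], unsigned nBins,
                              double maxPhotonDist2)
{
   const double invArea = 1 / (M_PI * maxPhotonDist2);
   unsigned b, c;

   for (b = 0; b < nBins; b++)
      for (c = 0; c < 3; c++)
         binnedContrib [b][c] *= invArea;
}
\end{lstlisting}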
The precomputation routine allocates a lookup table (\cmd{struct LUTAB},
defined in the stock \radiance{} module \cmd{lookup.c}) of per-modifier
\emph{child} photon maps, and assigns it to the \emph{parent} photon map's
\cmd{PhotonMap.preCompContribTab} field. This enables the parent photon
map to act as a container for the per-modifier child photon maps, with
the actual precomputed photons stored in the latter. The modifier name
then acts as the LUT key, as is also the case with \rcontrib's
contribution table \cmd{rcontrib:modconttab}.
The precomputation routine also allocates a container
\cmd{PhotonMap.preCompContrib} of type \cmd{PreComputedContrib} in
each child photon map (see listing \ref{lst:codeDefs2}, which serves
as scratch space for the subsequent wavelet transform and compression.
This includes an instance of \cmd{wavelet2.h:WaveletMatrix2}, which
is a 2D array of floating point 3-tuples, corresponding to the
RGB wavelet coefficients in each colour channel.
In addition, the wavelet transform requires a second, transposed
matrix for intermediate results, which is also preallocated
and initialised to 0 prior to every transform.
Note that these data structures are allocated and dimensioned separately
per modifier (which is unique to each child photon map), as the binning
parameters may differ for each.
The preallocated data structure container of type \cmd{PreComputedContrib}
is then passed to the main compression/encoding routine
\cmd{pmapcontrib:encodeContribs()} along with the user-specified
compression ratio \var{comp}.
This routine performs the actual wavelet transform and
subsequent coefficient thresholding.
% ---------------------------------------------------------------------------
\subsection{2D Wavelet Transform}
\label{sec:wavelet3.c}
Wavelets are a science unto themselves and a very broad topic that is
mostly opaque for non-ma\-the\-ma\-ti\-ci\-ans and even computer
scientists.
Once one has grasped the fundamental concept of wavelets, however, the
theory is relatively intuitive, if far from simple. The reader is
referred to \cite{Graps:1995} for an excellent introduction with
notable applications.
Much like the Fourier transform, the wavelet transform essentially
represents a given input signal as a series of sums of scaled
(in terms of amplitude) basis functions that have been dilated and
translated along the signal's propagation axis.
These functions are referred to as the actual wavelets, though in
practice they aren't necessarily always undulating (see that funny
squiggly thing in figure \ref{fig:waveletFunc} for an example).
Unlike Fourier
basis functions, wavelets have a finite support, i.e. a defined
non-zero region of influence. Based on this principle, it is
possible to decompose a given signal into a set of coefficients
(the scaling terms) at different dilations and translations.
The dilations and translations can be considered sliding windows
depicting the signal at varying zoom levels, or resolutions.
Consequently, wavelets are a form of \textit{multiresolution
analysis}.
Signals can be decomposed by a wavelet transform in arbitrary
dimensions, though applications in 1 and 2 dimensions are most common
(with image processing and compression being notable applications of
the latter).
The precomputed contribution photon map decomposes the pre-binned
contributions in their native 2D domain to optimally leverage
corellation along each axis of the Shirley-Chiu square, effectively
treating it as a square matrix.
Once the contributions for each precomputed photon have been accumulated,
they are transferred as input to the preallocated wavelet matrix. In the
general case, this requires a \cmd{memcpy()} of each matrix row from the
linear contribution array, effectively reverting the linearisation
that follows the Shirley-Chiu mapping.%
\footnote{In the special case where input and output sizes of the
transform are the same, i.e. when no padding occurs, the matrix
rows (which are \textit{Iliffe} vectors, or pointers to arrays) can
simply be set up to map directly into the corresponding segments in
the linear contribution array, obviating the need to push data about.
Obviously this is much more efficient. This can be done in
conjunction with the specialised 2D wavelet transform function
\cmd{wavelet2:waveletXform2()}, which only operates on input
sizes of powers of 2 and does not generate padding coefficients.
However, due to the major restriction this imposes on the input size
(and the number of bins), this function is not used by the precomputed
contribution photon map.
}
The routine \cmd{pmapcontrib:encodeContribs()} calls the full 2D
wavelet transform wrapper routine \cmd{wavelet3:padWaveletXform2()}.
This routine, in turn, calls \cmd{padD2Step2()} to perform one pass of the
transform along a fixed (horizontal) axis. The latter function accepts
an output matrix of the same size as the input, and transposes the
resulting coefficients from the iteration ``on the fly'' as they are
written to the output matrix.
With the input and output matrices now swapped, the next iteration of
this function will perform another horizontal pass over the transposed
matrix, which is tantamount to a vertical pass over the original matrix.
With the output matrix now transposed a second time, the original
orientation is obtained, and the matrices swapped a second time.
Consequently, each pair of invocations of a transform step constitutes a
complete horizontal/vertical transform pass at a given resolution (see
figure \ref{fig:waveletStep}).
\begin{figure}[htb]
\centering
\includegraphics[width=0.7\linewidth]{Daubechies4-functions}
\parbox{0.9\linewidth}{%
\caption{\label{fig:waveletFunc}
Daubechies DB2
``4-tap'' wavelet function. The scaling function
(blue) decorrelates the input signal as approximation
coefficients. The wavelet function (red) -- itself orthogonal
to the scaling function -- decorrelates the input as detail
coefficients. Note the asymmetry of this wavelet family. This
wavelet has a support of 4 adjacent samples; it trades poorer
decorrelation for computational efficiency and reduced boundary
effects compared to wavelets with larger supports, which decorrelate
better, but at higher computational expense.
}
}
\end{figure}
\begin{figure}[p]
\centering
\includegraphics[width=0.7\linewidth]{%
wavelet3-test-16x16-steps-reaspect-crop%
}\\
\parbox{\linewidth}{%
\caption{\label{fig:waveletStep}
Sample output of wavelet unit test for 16$\times$16 bins,
showing the first two resolution levels of the 2D wavelet
transform,
each consisting of transform steps over the horizontal and
vertical axes. The output of each transform step
(right of arrows)
becomes the input for the next step (left of arrows).
Starting with the original input samples $y_{i,j}$, each step
generates a set of approximation coefficients $s_{i,j}$ (red
arrows) and detail coefficients $d_{i,j}$ (green arrows). To
alternate the transform axes, the implementation transposes the
output matrix on the fly (note the reversed output coefficient
indices), consequently the transform need only be performed
along one axis, which simplifies indexing; the original matrix
orientation is then restored every two steps. After each
iteration (horizontal/vertical transform pair), the
approximation coefficients $s(\ldots (s_{i,j})\ldots ) =
(s_{i,j})^k$ in the upper left of the output matrix become the
input for the next (halved) resolution level. The sizes of the
progressively smaller output submatrices (indicated by square
brackets) include padding coefficients. Consequently, the
output matrix leaves unoccupied cells (indicated by dots) to
accommodate these additional coefficients, which accumulate in
the upper left of the matrix in subsequent resolution levels.
}
}
\end{figure}
Each transform pass reduces the number of output
coefficients, thereby yielding a coarser representation of the
original input signal, corresponding to a lower frequency band. This
is termed
\textit{multiresolution analysis} and is a key characteristic of
the wavelet transform, namely that it decorrelates the input signal
at various resolutions, or frequency bands (in filtering parlance).
The alternating transposition of the matrix effects a decorrelation
along the horizontal and vertical axes.
\begin{figure}[htb]
\centering
\includegraphics[width=0.75\linewidth]{%
wavelet3-test-16x16-full-reaspect-crop%
}
\parbox{0.9\linewidth}{%
\caption{\label{fig:waveletCoeffsFull}
Sample output of wavelet unit test for 16$\times$16 bins,
showing the wavelet coefficient matrix after a full transform.
The coloured fields identify the coefficient type (
red = approximation $s$, green = detail $d$) and the successive
transform steps, from right to left. Each iteration generates
coefficients from those of prior iterations at higher resolutions:
approximations of prior details $sd$ (lower left),
details of prior approximations $ds$ (upper right),
details of prior details $dd$ (lower right),
approximations of prior approximation $ss$ (upper left).
Each subsequent iteration then recurses into the upper left
submatrix, using the approximations $ss$ as
increasingly coarse representations of the original contributions
as input. After the final iteration, the red submatrix in the
upper left corner contains the 3$\times$3 coarsest approximations.
}
}
\end{figure}
The wavelet transform projects the input signal $y_{i,j}$ onto a pair of
orthogonal basis functions specific to each wavelet family (see figure
\ref{fig:waveletFunc} for a simple example from the Daubechies family%
\footnote{The naming convention within each wavelet family is not firmly
established. The Daubechies wavelet in figure \ref{fig:waveletFunc}
is either denoted \textit{DB2}, referring to its 2 vanishing moments,
or \textit{D4}, referring to its support of 4 (having 4 ``taps'',
in filter parlance).
The \lit{pywavelets} module adheres to the former convention,
consequently so does the module \cmd{wavelet3.c}.
}),
termed the \textit{scaling function} $\Phi_k(j)$ and the \textit{wavelet
function} $\Psi_k(j)$, where $k$ is the resolution (or dilation) of the
function, and $j$ is its position along the input signal's transform axis.
Unlike Fourier basis functions, wavelets have finite support and are
localised in space, i.e. they are only non-zero for a finite subset of
sample positions $j$. Larger supports decorrelate over more samples and
therefore offer improved compression (smaller coefficients), but at
higher computational expense.
The scaling and wavelet functions $\Phi_k(j)$ resp. $\Psi_k(j)$
decompose the $k$-th input matrix along a fixed axis $i$ (matrix row or
column) into a pair of approximation coefficients $s^k_{i,j}$ and
detail coefficients $d^k_{i,j}$. The input matrix is, itself, the set of
approximation coefficients $s^{k-1}_{i,j}$ from the previous iteration.
The set of approximation coefficients $s^k$ always represent the
signal at half the resolution compared to those from the previous
iteration, $s^{k-1}$. In the initial iteration $k=0$, the approximations
are equal to the original input signal: $s^0_{i,j}$ = $y_{i,j}$.
For a 4-tap function as shown in figure \ref{fig:waveletFunc}, the
$k$-th iteration of the wavelet transform is as follows:
\begin{eqnarray}
\label{eq:d2FwdXform-s}
s^k_{i,j} &=& h_0\ s^{k-1}_{i,2j} \ +\ h_1\ s^{k-1}_{i,2j+1}\ +\
h_2\ s^{k-1}_{i,2j+2}\ +\ h_3\ s^{k-1}_{i,2j+3}\\
\label{eq:d2FwdXform-d}
d^k_{i,j} &=& g_0\ s^{k-1}_{i,2j} \ +\ g_1\ s^{k-1}_{i,2j+1}\ +\
g_2\ s^{k-1}_{i,2j+2}\ +\ g_3\ s^{k-1}_{i,2j+3},
\end{eqnarray}
where $h_j$ and $g_j$ are constants obtained by evaluating
$\Phi_k(j)$ and $\Psi_k(j)$ at the corresponding positions of the input
signal samples that fall within the functions' support.
For the 4-tap Daubechies wavelet, these are:
\begin{equation}
\label{eq:d2-hCoeffs}
h_0 = \frac{1 + \sqrt{3}}{4\sqrt{2}} ;\quad
h_1 = \frac{3 + \sqrt{3}}{4\sqrt{2}} ;\quad
h_2 = \frac{3 - \sqrt{3}}{4\sqrt{2}} ;\quad
h_3 = \frac{1 - \sqrt{3}}{4\sqrt{2}}
\end{equation}
\begin{equation}
\label{eq:d2-gCoeffs}
g_0 = h_3 ;\quad
g_1 = -h_2 ;\quad
g_2 = h_1 ;\quad
g_3 = -h_0
\end{equation}
Note that the transform axis index $j$ is doubled on the RHS of
equations \ref{eq:d2FwdXform-s} and \ref{eq:d2FwdXform-d};
this is because each iteration of the
transform reduces the number of approximation coefficients compared to the
input, yielding the coarser approximation.
Note also that, for the sake of clarity, equations \ref{eq:d2FwdXform-s}
and \ref{eq:d2FwdXform-d}
omit the on-the-fly transposition shown in figure \ref{fig:waveletStep};
the actual implementation swaps the indices $i, j$ on the LHS
during assignment.
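For illustration, the following sketch performs a single transform step
along one row according to equations \ref{eq:d2FwdXform-s} and
\ref{eq:d2FwdXform-d} for a scalar (single-channel) signal; it uses a
simple constant extension at the right boundary and omits the padding
coefficients and on-the-fly transposition performed by the actual
\cmd{wavelet3.c} code.
\begin{lstlisting}
#include <math.h>

/* Hedged sketch: one DB2 transform step along a row of len samples
   s [], producing len/2 approximation coefficients sNext [] and
   detail coefficients d [], with constant extension at the right
   boundary (padding coefficients omitted). */
void d2FwdStep (const double *s, unsigned len, double *sNext, double *d)
{
   /* DB2 scaling (h) and wavelet (g) filter coefficients as above */
   const double h [4] = {
      (1 + sqrt(3)) / (4 * sqrt(2)),  (3 + sqrt(3)) / (4 * sqrt(2)),
      (3 - sqrt(3)) / (4 * sqrt(2)),  (1 - sqrt(3)) / (4 * sqrt(2))
   };
   const double g [4] = { h [3], -h [2], h [1], -h [0] };
   unsigned i, j, t;

   for (j = 0; j < len / 2; j++) {
      sNext [j] = d [j] = 0;

      for (t = 0; t < 4; t++) {
         /* Constant extension: clamp sample index at the boundary */
         i = 2 * j + t < len ? 2 * j + t : len - 1;
         sNext [j] += h [t] * s [i];
         d     [j] += g [t] * s [i];
      }
   }
}
\end{lstlisting}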
In the case of the precomputed contribution photon map, the contributions
contain RGB radiometric data. Consequently, the above decomposition
is extended to the three colour channels. Thus the coefficients
$s^k_{i,j}$ and $d^k_{i,j}$ are in fact 3-tuples.
The colour channels are treated completely independently, and no
decorrelation occurs between them.
Each subsequent iteration of the wavelet transform recurses on the
approximation coefficients of the previous iteration; in a 2D matrix,
this corresponds to the upper left subquadrant of the wavelet
coefficient matrix (see figure \ref{fig:waveletCoeffsFull}). Detail
coefficients are accumulated in the remaining subquadrants, starting at
the corners. In the 2D context, this implies that detail coefficients
are also obtained from details and approximations of the previous
iteration, and likewise, approximations from previous approximations and
details, thus fully decorrelating the input signal in every dimension.
% ---------------------------------------------------------------------------
\subsection{Wavelet Transform Boundary Extension}
\label{sec:boundaryExt}
In an ideal world (a.k.a. theory), input signals are assumed to be
infinite and life is simple. In \realLife, everything has a beginning
and an end, including life itself, and incidentally, the input to a
wavelet transform.%
\footnote{... which inevitably raises that perpetual philosophical
question, ``why bother?''
}
Because wavelets have finite support, they will partly extend beyond the
given input signal. There are many sensible (and equally less so) ways to
extend a given input, but this really depends on assumptions about
the nature of the underlying signal, i.e. whether it is periodic,
monotonic, or singular (peaky).
If the input size is a power of two, the signal can simply be wrapped
around, and the number of approximation coefficients is halved in each
iteration.%
\footnote{The specialised 2D wavelet transform function
\cmd{wavelet2:d4Step2()} only handles input sizes of powers of 2,
and is hardwired to perform a cyclic (wraparound) extension of the
input signal, without the need for padding coefficients. This is
obviously
far more efficient and compact, since the number of resulting
coefficients is the same size as the input. While not used by the
precomputed contribution photon map, this function is available for
use by other \radiance{} applications where this restriction on input
sizes is acceptable.
}
However, such a restriction on the input size may be deemed unacceptable
for many applications, including precomputed contributions, which then
warrants a boundary treatment of the input signal.
\begin{figure}[htb]
\centering
\includegraphics[width=0.45\linewidth]{boundaryext-crop}
\parbox{0.9\linewidth}{%
\caption{\label{fig:boundaryExt}
Example of boundary extension at the right edge of a signal
of length $l$, consisting of samples $y_{i,0} \ldots y_{i,l-1}$.
Common extension modes include zero (gray), symmetric/reflection
(green), constant (red), and 1st order gradient (blue).
}
}
\end{figure}
Boundary issues are handled by extrapolating
the input signal beyond the left and right boundaries. The wavelet
module \cmd{wavelet3.c} offers the following boundary extension modes
(see also the definitions in \cmd{wavelet2.h}, and figure
\ref{fig:boundaryExt} for examples):
\begin{description}
\item[WAVELET\_EXTEND\_CONST:] the input signal is assumed to be
constant beyond
the boundary, i.e. the first resp. last value is repeated as
required. This is probably the safest choice in most cases, unless
there is a significant gradient at the boundaries.
\item[WAVELET\_EXTEND\_GRAD1:] the input signal is linearly extrapolated
according to its 1st order gradient (slope).
\item[WAVELET\_EXTEND\_GRAD2:] the input signal is linearly extrapolated
according to its 2nd order gradient.
\item[WAVELET\_EXTEND\_CYCL:] the input signal is assumed to be cyclic
(i.e. periodic) and wrapped around at either boundary. This may
not always be the optimal choice.
\item[WAVELET\_EXTEND\_SYMM:] the input signal is assumed to be
symmetric,
and reflected at either boundary. This may not always be the optimal
choice.
\item[WAVELET\_EXTEND\_ZERO:] the input signal is simply set to zero
beyond
the boundary. This can lead to large coefficients if the absolute
input values are large near the boundary, which, in turn, can lead to
poor compression. This is optimal only for input signals consisting of
a singularity (i.e. an isolated peak tapering towards the boundaries).
\item[WAVELET\_EXTEND\_INV:] the input signal is reflected at the
boundary,
similarly to \cmd{WAVE\-LET\_\-EXT\-END\_\-SYMM}, but negated,
resulting in anti-symmetry.
This extension artificially boosts boundary detail coefficients
to reduce the likelihood that they will be thresholded compared to
the interior coefficients, thus preserving them. In practice, though,
it turns out to be crap.
\end{description}
The boundary extension mode can be defined by setting
\cmd{WAVELET\_EXTEND\_MODE} to one of the above. If undefined, the
extension mode defaults to \cmd{WAVELET\_EXTEND\_CONST}. Several of these
modes can also be found (possibly under different names) in the
\lit{pywavelets} package, which partly inspired them.
Because this is a compile-time option, it is not user-selectable. While
this lacks flexibility, in practice most users would be overwhelmed if
they had to choose a suitable boundary extension mode, since probably
neither they nor the author can anticipate the nature of the resulting
input to the wavelet transform.
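As an illustration of a few of these modes, the following sketch resolves
a sample index beyond the right boundary of a signal; it mirrors figure
\ref{fig:boundaryExt} rather than the actual implementation, and the
enumeration and names are purely illustrative.
\begin{lstlisting}
/* Illustrative extension modes (not the names in wavelet2.h) */
typedef enum { EXT_ZERO, EXT_CONST, EXT_GRAD1, EXT_CYCL, EXT_SYMM } ExtMode;

/* Hedged sketch: return the extrapolated sample y [i] for an index
   i >= len (len >= 2 assumed) */
double extendRight (const double *y, unsigned len, unsigned i, ExtMode mode)
{
   const unsigned over = i - len;   /* #samples beyond the boundary */

   switch (mode) {
      case EXT_CONST:   /* repeat the last sample */
         return y [len - 1];
      case EXT_GRAD1:   /* extrapolate the 1st order gradient (slope) */
         return y [len - 1] + (over + 1) * (y [len - 1] - y [len - 2]);
      case EXT_CYCL:    /* wrap around (periodic signal) */
         return y [over % len];
      case EXT_SYMM:    /* reflect at the boundary (over < len assumed) */
         return y [len - 1 - over];
      default:          /* zero extension */
         return 0;
   }
}
\end{lstlisting}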
% ---------------------------------------------------------------------------
\subsection{Padding Coefficients}
Boundary extension implies the generation of additional approximation and
detail coefficient pairs beyond the halved length of the input. These
\textit{padding coefficients} are generated at the boundaries,
and are essential to reduce artefacts at the edges of the reconstructed
input signal during the inverse wavelet transform.
The precomputed contribution module uses the more general 2D padded wavelet
transform module \cmd{wavelet3.c} to compress contributions, which lifts
the restriction on the input size imposed by the 2D non-padded transform
in \cmd{wavelet2.c}.
The number of padding coefficients depends on the support of the wavelet;
the larger the support, the larger the number of padding coefficients.
Furthermore, these are accumulated at every iteration of the transform,
i.e. over multiple resolutions.
This means the total number of wavelet coefficients (approximation and
details) can be significantly \emph{larger} than the input,
which makes an effective compression all the more necessary in order
for the wavelet encoding to be viable.
With padding, the number of approximation (and detail) coefficients
generated from an input of length $l^k$ samples at iteration $k$ is:
\begin{equation}
l^{k+1} = \left\lfloor\frac{l^k + \lambda - 1}{2}\right\rfloor,
\label{eq:padLen}
\end{equation}
where $\lambda$ is the size of the wavelet's support.
Based on this, it is possible to predict the additional space required in
the wavelet coefficient matrix for the padding coefficients, and dimension
it accordingly in order to accommodate the latter.
This is why the coefficient matrix in figure
\ref{fig:waveletCoeffsFull} exhibits unoccupied (read: wasted) space.
On modern systems, however, a few wasted kilobytes are an acceptable
price for a simpler implementation.%
\footnote{... unless you
plan on porting this to your ZX81 or KIM-1, in which case I wish you
luck!
}
To support the allocation of an array of suitable dimensions,
the function \cmd{wavelet2:padD2Step()} returns the output length
for a given input, including padding, if either the input or
output array is NULL. Similarly, the full transform function
\cmd{wavelet2:padWaveletXform2()}, which calls the former, can sum this
output length over all iterations if either its input or output array is
NULL. The function \cmd{pmapcontrib:preComputeContrib()} relies on
this functionality when allocating the wavelet coefficient matrices for
each contribution modifier.
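As a minimal sketch of the space requirement implied by equation
\ref{eq:padLen} (not the actual query mechanism described above, and
with an assumed stopping criterion once the coarsest approximation of
length $\lambda - 1$ is reached), the coefficient pair count could be
tallied as follows:
\begin{lstlisting}
/* Sketch: tally the number of approximation/detail coefficient pairs
   generated over all iterations for an input of length len and wavelet
   support lambda, per the equation above. Purely illustrative; the
   actual query passes NULL arrays to padD2Step() / padWaveletXform2(),
   and the stopping criterion here is an assumption. */
static unsigned long paddedCoeffPairs (unsigned long len, unsigned lambda)
{
   unsigned long numPairs = 0;

   /* Iterate until the coarsest approximation (length lambda - 1,
      i.e. 3 for the 4-tap Daubechies wavelet) is reached */
   while (len > lambda - 1) {
      /* Padded half-length = number of coefficient pairs generated */
      len = (len + lambda - 1) / 2;
      numPairs += len;
   }

   return numPairs;
}
\end{lstlisting}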
% ---------------------------------------------------------------------------
\subsection{Wavelet Coefficient Thresholding}
Contributions are compressed by thresholding their corresponding
detail coefficients after the wavelet transform is complete. This entails
removing the (\var{comp})\% least significant such coefficients (i.e.
smallest in terms of absolute value, since coefficients can
be negative). This is demonstrated in figure \ref{fig:waveletCoeffsThresh}
for a compression of 75\%. In general, the finest-scale detail
coefficients in the lower right subquadrants will be the smallest, and
therefore the most likely to be thresholded.
\begin{figure}[htb]
\centering
\includegraphics[width=0.75\linewidth]{%
wavelet3-test-16x16-thresh75-reaspect-crop%
}
\parbox{0.9\linewidth}{%
\caption{\label{fig:waveletCoeffsThresh}
The coefficient matrix from figure \ref{fig:waveletCoeffsFull}
after thresholding the 75\% of coefficients with the lowest absolute
value, implicitly setting these to zero (indicated by bracketed
dots). For contributions with a low gradient, as is typically
the case with sky luminance distributions, the lowest
coefficients will mostly be the details in the submatrices
highlighted in green. The coarsest approximation coefficients
in the upper left submatrix (highlighted in red) are essential
for reconstructing the original matrix, and are therefore never
thresholded.
}
}
\end{figure}
Once coefficients have been selected for thresholding, they can be omitted
when storing or transmitting the wavelet encoded data, thereby saving
mass storage or bandwidth.
The rationale behind thresholding is that the omission of insignificant
coefficients will not appreciably alter the reconstructed signal.
In the context of the reconstruction, the omitted coefficients are
implicitly treated as zero.
\begin{samepage}
Thresholding lies at the heart of wavelet compression, which is why a
good thresholding strategy is important.
There are many ways to threshold coefficients, such as:
\begin{enumerate}
\item using a fixed threshold (hard thresholding, possibly adapted
to the resolution),
\item attenuating coefficients if they exceed the threshold (soft
thresholding), and
\item dropping a fixed fraction of the smallest coefficients.
\end{enumerate}
\end{samepage}
Options (1) and (2) generate a variable number
of thresholded coefficients, which better adapts to the frequency content
of the original signal. The primary disadvantage of this strategy is that
it complicates the paging of coefficients from disk, as it requires a
dedicated indexing structure.
Option (3) is the least optimal, since the user has no control over the
magnitude of thresholded coefficients, and therefore the incurred
error. It does, however, afford control over the \emph{number} of
thresholded coefficients, as it is constant. This in turn greatly
simplifies the paging mechanism (no index is required, since the record
size is fixed and known beforehand). This strategy was therefore chosen
for the precomputed contribution photon map to simplify the implementation.
It is important to note that the coarsest level approximation coefficients
(highlighted in the upper 3$\times$3 submatrix in figure
\ref{fig:waveletCoeffsThresh}) are \emph{never} thresholded. These
are essential for the reconstruction, since
the entire matrix containing the original input signal is derived from
them by the inverse wavelet transform, in conjunction with the (thresholded)
detail coefficients. These approximation coefficients are therefore
unconditionally preserved by the thresholding.
Thresholding in the precomputed contribution photon map is performed
in a dedicated routine \cmd{pmapcontrib:thresholdContribs()}.
This transfers all non-zero \emph{detail} coefficients (i.e. omitting
the unoccupied space in the wavelet coefficient matrix left over by
padding) to a thresholding buffer, consisting of an array of
struct \cmd{PreComputedContribCoeff}. Each entry in this buffer contains
a pointer to the corresponding detail coefficient in the wavelet
coefficient matrix, and its linearised 2D matrix index, using the
mapping in equation \ref{eq:linearIdx} to linearise 2D bin indices.
The index is necessary to identify which coefficients were removed
after thresholding.
The thresholding routine ``knows'' how many non-zero detail coefficients
to expect based on the summed number of padded coefficients (equation
\ref{eq:padLen}) minus the $3\times 3 = 9$ approximation coefficients,
which are excluded from thresholding. If the wavelet transform
actually produced some zero coefficients (which happens more often than
one might expect), the thresholding buffer will contain fewer
coefficients than expected. In this case, the remaining buffer is
simply filled with as many duplicates of a zero coefficient as
required; specifically, the coefficient in the lower right corner is
chosen, since it is guaranteed to be unoccupied
(see figure \ref{fig:waveletCoeffsFull}),
and will therefore be thresholded anyway.
The actual thresholding consists of partitioning the coefficients
in the thresholding buffer so that all coefficients at positions
$[0,\ l(1-comp/100)-1]$ have larger magnitudes than those in positions
\mbox{$[l (1-comp/100),\ l-1]$}. This is performed by a separate
recursive routine with the ever so unlikely name
\cmd{pmap\-cont\-rib:\-co\-eff\-Par\-tit\-ion()}, which swaps
out-of-order buffer entries, much like quicksort. Unlike quicksort,
however, the coefficients within each partition need not be sorted,
which significantly reduces the number of recursive calls. In addition,
swapping buffer entries (i.e. pointers to coefficients and their matrix
indices) is more efficient than swapping the actual RGB floating point
coefficients.
The coefficient magnitude is evaluated as the dot product of the RGB
tuple with itself, i.e. its squared vector magnitude.
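The following sketch illustrates this quickselect-style partitioning on
an array of coefficient references; the type and helper names are
hypothetical, and the actual \cmd{coeffPartition()} implementation may
differ in detail.
\begin{lstlisting}
/* Sketch of quickselect-style partitioning (not the actual
   pmapcontrib:coeffPartition()): rearrange a buffer of coefficient
   references such that the nKeep entries of largest squared RGB
   magnitude occupy positions [0, nKeep-1]. Entries within each
   partition remain unsorted, and only references (pointer + matrix
   index) are swapped, never the RGB coefficients themselves. */
typedef struct {
   float    *coeff;     /* Pointer to RGB coefficient in wavelet matrix */
   unsigned idx;        /* Linearised 2D matrix index */
} CoeffRef;

static float sqMag (const float *c)
{
   /* Squared magnitude as dot product over RGB */
   return c [0] * c [0] + c [1] * c [1] + c [2] * c [2];
}

static void partitionCoeffs (CoeffRef *buf, long lo, long hi, long nKeep)
{
   long        i = lo, j = hi;
   const float pivot = sqMag(buf [(lo + hi) / 2].coeff);

   while (i <= j) {
      /* Scan for out-of-order entries on either side of the pivot */
      while (sqMag(buf [i].coeff) > pivot)   i++;
      while (sqMag(buf [j].coeff) < pivot)   j--;

      if (i <= j) {
         /* Swap references only */
         CoeffRef tmp = buf [i];
         buf [i++] = buf [j];
         buf [j--] = tmp;
      }
   }

   /* Unlike quicksort, recurse only into the partition containing the
      cut at position nKeep */
   if (nKeep <= j && lo < j)
      partitionCoeffs(buf, lo, j, nKeep);
   else if (nKeep > i && i < hi)
      partitionCoeffs(buf, i, hi, nKeep);
}
\end{lstlisting}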
Once the coefficients have been partitioned in the thresholding buffer,
only the most significant are kept, i.e. those in the partition
$[0,\ l(1-comp/100)-1]$. These are subsequently
sorted by their coefficient indices using \cmd{qsort()} from the
standard C library, again by swapping pointers to coefficients and their
matrix indices, instead of the RGB coefficients themselves. This results in
an array of coefficients (well, pointers to them) with monotonically
increasing matrix indices, which is required by the subsequent mRGBE
coefficient encoding on disk.
% ---------------------------------------------------------------------------
\subsection{mRGBE Wavelet Coefficient Encoding}
\label{sec:mrgbe.c}
Once the wavelet coefficients have been thresholded, they are encoded
using the modified
RGBE encoding (mRGBE) prototyped in section \ref{sec:mrgbePoC}.
The C implementation in module \cmd{mrgbe.c} as part of the contribution
photon map is essentially a direct port of the Python prototype, and
is based on the same assumptions.
The mRGBE fields and their corresponding 32-bit integer value are defined
as a union of type \cmd{mRGBE}, whose
bit field configurations are defined by the macros \cmd{MANTBITS},
\cmd{EXPBITS}, and \cmd{DATABITS}.
The default bit field configuration is shown in figure
\ref{fig:mrgbeStruct}, but can be redefined to suit applications that
require more precision at the expense of a reduced payload data range,
or vice versa. In practice, the default presents a compromise suitable
for most applications.
\begin{figure}[htb]
\centering
\includegraphics[width=0.95\linewidth]{mrgbe-crop}\\
\parbox{0.9\linewidth}{%
\caption{\label{fig:mrgbeStruct}
Structure of 32-bit mRGBE encoding for wavelet coefficients.
The encoding consists of three mantissae, one per RGB colour channel,
a common exponent (base 2), and an associated payload data field
to store the coefficient index (linearised from its 2D matrix
indices). The bits can be allocated within the 32-bit envelope
at compile-time to trade off precision, encoding range, and
payload data range. The default configuration,
\cmd{MANTBITS} = 6, \cmd{EXPBITS} = 5, \cmd{DATABITS} = 9
(abbreviated 6:6:6:5:9), balances
these conflicting requirements.
}
}
\end{figure}
The primary contribution compression/encoding function
\cmd{pmapcontrib:encodeContribs()} keeps track of the per-colour channel
range $[d_{min,i},\: d_{max,i}]$ of the wavelet coefficients' absolute
values, and passes these in a struct \cmd{mRGBERange} to the encoding
initialisation function \cmd{mrgbe:mRGBEinit()}.
This function is responsible for setting the per-colour channel
normalisation factor $d_{norm,i}$, which is returned in
\cmd{mRGBERange.norm}:
\begin{equation}
d_{norm,i} = \frac{1 - 2^{-\left(2^\textrm{EXPBITS}\right)}}{
d_{max,i} - d_{min,i}
},\quad i\in\lbrace r,g,b\rbrace,
\end{equation}
where \cmd{EXPBITS} is the number of bits allocated to the shared exponent
in the mRGBE encoding, as shown in figure \ref{fig:mrgbeStruct}.
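A minimal sketch of this initialisation is shown below, assuming the
range extrema are passed as per-channel arrays; the actual
\cmd{mrgbe:mRGBEinit()} operates on struct \cmd{mRGBERange}, and
degenerate (empty) ranges are not handled here.
\begin{lstlisting}
#include <math.h>

/* Sketch of the per-channel normalisation factor per the equation
   above. Illustrative only. */
#define EXPBITS 5

static void mrgbeInitSketch (const float dmin [3], const float dmax [3],
                             float dnorm [3])
{
   /* Largest normalised value representable by the encoding:
      1 - 2^-(2^EXPBITS) */
   const float dnormMax = 1 - ldexpf(1, -(1 << EXPBITS));
   int i;

   for (i = 0; i < 3; i++)
      dnorm [i] = dnormMax / (dmax [i] - dmin [i]);
}
\end{lstlisting}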
Once the mRGBE normalisation is initialised, \cmd{encodeContribs()} calls
\cmd{mRGBEencode()} for each RGB wavelet detail coefficient
$d = [d_r,\; d_g,\; d_b]$
in the thresholding buffer, passing the initialised \cmd{mRGBERange}
instance containing the normalisation factor $d_{norm}$, to obtain the
corresponding mRGBE encoding consisting of per-colour channel mantissae
$m_i$, and a shared base-2 exponent $x$:
\begin{eqnarray}
\label{eq:mrgbe}
m_i &=& \mathrm{sgn}\,\left(d_i\right)
\left\lfloor \overline{m} \cdot \overline{d}_i\, +\, \epsilon
\right\rfloor\, +\, m_{max}, \quad i\in\lbrace r,\, g,\, b\rbrace,
\quad\epsilon \in [0, 1)\\
\overline{m} &=& m \frac{m_{max}}{\overline{d}}\\
(m, x) &=& \mathrm{frexp}\left(\overline{d}\right)\\
\overline{d} &=& \max\left(
\overline{d}_r,\; \overline{d}_g,\; \overline{d}_b
\right)\\
\overline{d}_i &=& \left(\lvert d_i\rvert\, -\, d_{min,i}\right)\,
d_{norm,i}, \quad i\in\lbrace r,\, g,\, b\rbrace\\
m_{max} &=& 2^\textrm{MANTBITS-1},
\end{eqnarray}
where overbars denote normalised values, and \cmd{MANTBITS} specifies
the number of bits allocated to each mantissa $m_i$ in the mRGBE encoding
(including sign bit).
Note that each coefficient $d_i$ is offset by its corresponding mRGBE
range minimum, $d_{min,i}$, before being normalised by $d_{norm,i}$ to
obtain $\overline{d}_i$. Similarly, $\overline{d}$ is the latter's
maximum over RGB.
The $\mathrm{frexp}(\overline{d})$ function is part of the
standard C library and returns a floating point mantissa $m$ and an
integer base-2 exponent $x$ such that $m 2^x = \overline{d}$. Note that
the absolute value $\lvert x \rvert$ is stored in the mRGBE
exponent field, since $\overline{d} \leq 1$ implies a consistently
negative exponent.
Note also that each mantissa $m_i$ is rounded to the nearest integer via
an optional constant $\epsilon$ (defaulting to 0), and then offset
by the signed mantissa maximum, $m_{max}$, which corresponds to half the
encoding range accommodated by \cmd{MANTBITS}.
This offset encodes the mantissa's sign, with all values below
$m_{max}$ being negative.
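The following sketch traces the encoding equations \ref{eq:mrgbe}ff for
a single RGB coefficient; it omits the payload data field and the
special-case handling (e.g. zero coefficients, exponent overflow) of
the actual \cmd{mRGBEencode()}, and the type and helper names are
hypothetical.
\begin{lstlisting}
#include <math.h>

/* Sketch of the mRGBE encoding per the equations above for a single
   RGB coefficient d; dmin/dnorm are the range minima and normalisation
   factors, eps the optional rounding constant. Illustrative only. */
#define MANTBITS  6
#define MMAX      (1 << (MANTBITS - 1))  /* Signed mantissa offset m_max */

typedef struct {unsigned mant [3], exp;} mRGBEsketch;

static mRGBEsketch mrgbeEncodeSketch (const float d [3],
                                      const float dmin [3],
                                      const float dnorm [3], float eps)
{
   mRGBEsketch enc;
   float       dbar [3], dbarMax = 0, m, mbar;
   int         i, x;

   /* Per-channel normalisation: dbar_i = (|d_i| - dmin_i) * dnorm_i */
   for (i = 0; i < 3; i++) {
      dbar [i] = (fabsf(d [i]) - dmin [i]) * dnorm [i];
      if (dbar [i] > dbarMax)
         dbarMax = dbar [i];
   }

   /* Shared base-2 exponent from the largest normalised channel:
      dbarMax = m * 2^x, with m in [0.5, 1) */
   m = frexpf(dbarMax, &x);
   /* Normalised mantissa scale: mbar = m * m_max / dbarMax */
   mbar = m * MMAX / dbarMax;
   /* Exponent is negative since dbarMax <= 1; store its absolute value */
   enc.exp = x < 0 ? -x : x;

   /* Per-channel mantissae, offset by m_max to encode the sign */
   for (i = 0; i < 3; i++) {
      const int mi = (int)floorf(mbar * dbar [i] + eps);

      enc.mant [i] = (d [i] < 0 ? -mi : mi) + MMAX;
   }

   return enc;
}
\end{lstlisting}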
\cmd{mRGBEencode()} accepts each wavelet coefficient's linear index as
payload data, which increases mono\-ton\-ical\-ly since the coefficients
were previously sorted with respect to their indices.
Consequently, the coefficient index is \emph{incrementally} encoded,
i.e. as the difference to that of its immediate predecessor, starting at
0 (hence only the first coefficient index is absolute).
This incremental index encoding generally requires fewer bits to encode
than an absolute index, which reduces the likelihood of
overflowing the encoding range of the mRGBE payload data field.
However, it is important to realise that the likelihood of overflowing the
mRGBE data field increases with the index increments, notably when the
thresholded wavelet coefficient matrix becomes sparsely populated due to
a high compression ratio and/or number of bins. This cannot be predicted as
it depends on the distribution of the thresholded coefficients, which in
turn depends on the nature and frequency content of the wavelet transformed
contributions. If this does occur, there is currently no recovery
strategy, and the contribution precomputation is aborted with an error.
This is far from optimal, and an issue that
perhaps could have been more elegantly handled with more time and budget.
As a partial mitigation, \mkpmap{} will at least warn
if the total number of coefficients (including padding) exceeds the
payload data field range, i.e. $2^\textrm{DATABITS}$, indicating that,
under the aforementioned conditions, the mRGBE data field
\emph{could} theoretically overflow.
The encoding function \cmd{mRGBEencode()} sets the per-colour-channel
mantissae $m_i$ in each of the bitfields \cmd{mRGBE.\{red,green,blue\}},
and the shared exponent $x$ in \cmd{mRGBE.exp}. Furthermore, the
incremental coefficient index is set in bitfield \cmd{mRGBE.dat}.
Together, these occupy a 32-bit envelope, which can be conveniently
accessed as a scalar integer value \cmd{mRGBE.all} via the union
declaration.
Each such 32-bit mRGBE-encoded wavelet coefficient is appended to a
temporary array of struct \cmd{mRGBE} in the container
\cmd{preComputedContrib}, along with the instance of \cmd{mRGBERange}
containing the encoding normalisation and offset. These are the final
return values of the contribution compression/encoding routine
\cmd{encodeContribs()}.
Once the wavelet detail coefficients have been compressed and encoded,
\cmd{preComputeContrib()} prepends them with the
approximation coefficients in the upper $3\times 3$ submatrix
of the wavelet coefficient matrix, followed by the per-colour-channel
mRGBE range minimum and maxima, as returned in the \cmd{mRGBERange}
instance by \cmd{encodeContribs()}. Since the approximation coefficients
are not thresholded, their positions in the matrix are known; moreover,
they demand higher precision than mRGBE affords, and are consequently
encoded in \radiance's standard 32-bit RGBE format.
Similarly, the mRGBE encoding range is also encoded as RGBE to
preserve the normalisation and offset with sufficient precision,
thereby minimising the decoding error. Since the approximation
coefficients can (rather surprisingly) be negative, but 32-bit RGBE
can only encode positive values, a hack is employed whereby the least
significant bit (bit 0) of each mantissa is sacrificed as a sign bit
(see macro \cmd{PMAP\_CONTRIB\_SET\_RGBE32\_SGN()} in
\cmd{pmapcontrib.h}), resulting in a slight reduction in precision.
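The following sketch illustrates the idea behind this sign bit hack;
the actual macros \cmd{PMAP\_CONTRIB\_SET\_RGBE32\_SGN()} and
\cmd{PMAP\_CONTRIB\_GET\_RGBE32\_SGN()} in \cmd{pmapcontrib.h} may
differ in their details.
\begin{lstlisting}
/* Illustrative sketch of the sign bit hack for 32-bit RGBE encoded
   approximation coefficients: bit 0 of each 8-bit RGBE mantissa is
   reused as a sign flag, at the cost of one bit of mantissa
   precision. Not the actual macros in pmapcontrib.h. */
static void setRGBE32signs (unsigned char rgbe [4], const float rgb [3])
{
   int i;

   for (i = 0; i < 3; i++)
      if (rgb [i] < 0)
         rgbe [i] |= 1;      /* Set bit 0: negative channel */
      else
         rgbe [i] &= ~1;     /* Clear bit 0: positive channel */
}

static float getRGBE32sign (const unsigned char rgbe [4], int i)
{
   /* Recover the sign of channel i on decoding */
   return rgbe [i] & 1 ? -1.f : 1.f;
}
\end{lstlisting}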
At this point, a new photon is created with the same attributes
as the original photon selected for precomputation, by calling
\cmd{pmapdata:newPhoton()}. This function sets specific attributes
for various photon types; for precomputed contribution photons, it
sets the auxiliary data field \cmd{Photon.aux.contribSrc}
to the current light source index (passed via the photon ray's \cmd{rsrc}
field), in order to identify the precomputed photon's emitting light
source.
The RGBE encoded approximation coefficients and mRGBE range, as well as
the mRGBE encoded thresholded detail coefficients, are passed to
\cmd{newPhoton()} and (if not NULL) accumulated in a contribution
buffer \cmd{PhotonMap.contribHeapBuf}. This is analogous to the
accumulation of photons in the heap buffer \cmd{PhotonMap.heapBuf}
which already takes place in \cmd{newPhoton()}; in this case it
holds the precomputed photons associated with the contributions, in
the same order.
Once the heap buffer is full (both photon and contribution buffers
hold the same number of records), they are flushed to separate
heap files on disk via the lazily initialised file pointers,
\cmd{PhotonMap.\{heap, contribHeap\}}. This ensures that
photon map construction is consistently out-of-core.
Once \cmd{preComputeContrib()} has iterated over all precomputed
photons, and flushed the photons and contributions for each precomputed
child photon map to disk, it discards the original
photon map. Iterating over each child photon map, it calls the
contribution photon map specific build routine,
\cmd{buildPreCompContribPmap()}. This, along with the
saving routines, resides in a separate module \cmd{pmcontrib3.c},
and is described in the next section.
% ---------------------------------------------------------------------------
\subsection{Building and Saving Precomputed Contribution Photon Maps}
\label{sec:buildSave}
The top-level build routine for per-modifier child photon maps,
\cmd{buildPreCompContribPmap()}, sets up the output subdirectory
from the parent photon map's name (as specified to \mkpmap{} via
the \opt{-apC} option), and the filename of each child photon map and
its wavelet coefficient file by appending the corresponding modifier
name, as specified by the \opt{-m} option.
In addition, the routine recursively deletes any previous files
in the subdirectory using the POSIX file tree walk function
\cmd{nftw()}.%
\footnote{
   This function supersedes \cmd{ftw()}, which
   is deprecated as of POSIX.1-2008. Neither function is available in
   the native Windows API, and therefore no cleanup
   takes place on this platform.
}
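A minimal sketch of such a cleanup using \cmd{nftw()} is shown below;
it is illustrative only and not the code in \cmd{pmcontrib3.c}.
\begin{lstlisting}
/* Sketch of recursive subdirectory cleanup via nftw(). FTW_DEPTH
   visits a directory's contents before the directory itself, so each
   directory is empty by the time it is removed. */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

static int rmEntry (const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
   /* remove() handles both regular files and (empty) directories;
      a nonzero return value aborts the walk */
   return remove(path);
}

static int cleanupDir (const char *dir)
{
   /* Depth-first walk, do not follow symlinks, at most 16 open fds */
   return nftw(dir, rmEntry, 16, FTW_DEPTH | FTW_PHYS);
}
\end{lstlisting}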
Each child photon map is stored on disk in a subdirectory
\lit{<pmapfile>.rc/} derived from the parent photon map's name,
\lit{<pmapfile>}. The photons themselves are stored out-of-core in a
file \lit{<pmapfile>.rc/<mod>.leaf}, while the compressed wavelet
coefficients are stored in a separate file \lit{<pmapfile>.rc/<mod>.wvt}
(see also figure \ref{fig:overview}).
In addition, all binning parameters
relevant to \rcontrib, i.e. all instances of the options \opt{-bn},
\opt{-e}, \opt{\mbox{-m}}, \opt{-M} and \opt{-p} passed to \mkpmap,
are collected from the command line and saved to an option file
\lit{<pmapfile>.opt} in the parent directory. This also includes options
to specify a Shirley-Chiu mapping via the \cmd{disk2square.cal} function
file, which is functionally equivalent to the internal contribution
photon binning routine, \cmd{pmapcontrib:ray2bin()}. For
definitions of the relevant option strings, see macros
\cmd{mkpmap:PMAP\_CONTRIB\_RCOPT\_$\mathbf{*}$}.
Instances of these option strings are appended to a cumulative string by
\mkpmap{} as it parses the command line, which is then passed to the main
photon distribution routine \cmd{pmcontrib2:distribPhotonContrib()}. The
latter assigns the option string to the generated photon map's
\cmd{PhotonMap.rcOpts} field for subsequent saving to disk.
The option file is intended to be passed to \rcontrib{} via the \opt{@}
option to ensure the binning parameters are consistent with those in the
photon map (the technical reasons why this isn't handled automatically
are detailed in section \ref{sec:load}).
Once the output filenames have been initialised,
\cmd{buildPreCompContribPmap()} calls the standard photon map building
routine, \cmd{pmapdata:buildPhotonMap()}. This consolidates the
photon and contribution heaps generated by the multiple processes
forked by the \opt{-n} option, and normalises the
photon flux if applicable (skipped if daylight coefficients are enabled
with the \opt{-V} option, in which
case the photon flux is already normalised). Since the
contribution photon map requires out-of-core storage,
this function calls the specific out-of-core build routine,
\cmd{pmapooc:OOC\_BuildPhotonMap()}. With a regular photon map, this
calls \cmd{oocsort:OOC\_Sort()}, which sorts photons out-of-core
according to the Morton code derived from the photons' 3D positions
\cite{schregle-techreport-2016}. The photons are then saved to the
out-of-core octree leaf file \lit{<pmapfile>.rc/<mod>.leaf} before
the out-of-core octree structure to index the leaves is built by
calling \cmd{oocbuild:OOC\_Build()}.
In the case of a contribution photon map, an extended out-of-core
sorting function \cmd{pmcontribsort:contribPhotonSort()} is called,
which is derived from \cmd{OOC\_Sort()} and uses the same low-level
operations to access intermediate files.
Besides sorting the photons in their consolidated heap file, the
adapted routine also sorts the corresponding wavelet coefficients in
their likewise consolidated heap file. While the photons are unordered
w.r.t their Morton codes within each heap, both heaps are ordered w.r.t
each other, since they were synchronously flushed to disk by
\cmd{pmapcontrib:preComputeContrib()} during precomputation.
Thus, sorting both heaps w.r.t the photon's Morton indices maintains the
correspondence between photons and their wavelet coefficients.
Because the number of thresholded coefficients is fixed, the size of each
set of coefficients is too, which greatly simplifies this step of the
build process.
Like the original out-of-core sorting routine, the contribution photon
sorting routine employs an external mergesort \cite{Knuth:1998:ACP:280635,
Seyedafsari:2010}. This recursively subdivides the unordered
photon and contribution heap files into progressively smaller
subblocks (maintaining these out-of-core in temporary files), until they
are small enough to be quicksorted in-core (in parallel, if the
\opt{-n} option is passed to \mkpmap). The sorted blocks are
then merged into progressively larger out-of-core blocks as the recursion
unwinds. On exiting the sorting routine, the sorted photons are returned
in the out-of-core octree leaf file \lit{<pmapfile>.rc/<mod>.leaf},
while the corresponding wavelet coefficients are returned in the
wavelet file \lit{<pmapfile>.rc/<mod>.wvt}. At this point, the
out-of-core octree to index the photons in the leaf file
is built by \cmd{oocbuild:OOC\_Build()}, as with any out-of-core
photon map.
Once built, precomputed contribution photon maps are saved by
\cmd{pm\-cont\-rib3:save\-Cont\-rib\-Pho\-ton\-Map()}, which is called
for the (parent) contribution photon map by the standard
photon saving routine \cmd{pmapio:\-save\-Photon\-Map()}.
\cmd{saveContribPhotonMap()} saves the binning parameters specified
in the parent photon map's \cmd{rcOpts} string to the option file
for later use with \rcontrib. The routine subsequently iterates over the
(child) photon maps referenced in the parent's \cmd{preCompContribTab}
using the standard
lookup tables iterator routine \cmd{look\-up:lu\_do\-all()},
which calls the saving routine \cmd{pmcontrib3:savePreCompContrib()}
for each per-modifier child photon map in the LUT. This latter routine
calls, in turn, the standard photon map saving routine
\cmd{pmapio:\-save\-Photon\-Map()} again, this time to save the actual
per-modifier photons.%
\footnote{
Hey, are we confused yet? And in case you're wondering,
\cmd{savePhotonMap()} will \emph{not} wind up in a recursive loop, as
each child photon map's \cmd{preCompContribTab} field will be NULL,
since, in the na\"ively ideal world of the \radiance{} photon map
at least, kids don't have kids!
}
To this end, \cmd{save\-Photon\-Map()} was modified to include
contribution-specific info in the file header (number of coefficients,
compression rate, etc.), and then saves the photon map itself. Saving
the photon map entails encoding the
indexing structure of the out-of-core octree to reference the photons
in its leaves (which physically reside on disk in the leaf file).
Once all child contribution photon maps and their parent are saved,
the precomputation is concluded and \mkpmap{} cleans up and terminates.
% ---------------------------------------------------------------------------
\subsection{Logarithmic vs. Linear Encoding of Contributions}
\label{sec:logEncoding}
Contributions can optionally be encoded logarithmically during the
wavelet transform via the \cmd{PMAP\_\-CON\-TRIB\_\-LOG} compile-time
option (see \cmd{pmapcontrib.h}). If defined,
\cmd{pmap\-contrib:\-en\-code\-Con\-tribs()}
applies a natural logarithm to every RGB contribution tuple prior
to the wavelet transform.
The advantage of this encoding is that it reduces the range of the
input, and consequently that of the resulting wavelet coefficients, which
improves the precision of the subsequent mRGBE encoding.
It also elegantly handles potential negative values resulting from
compression artefacts during the inverse wavelet transform, since the
contributions will always be positive after inverting the log
encoding through exponentiation.
The downside of this logarithmic encoding is that it complicates the
boundary extension during the wavelet transform, as most extension modes
assume linear input data, particularly the gradient extension modes.
Consequently, logarithmic encoding should only be used with a ``safe''
extension mode, such as \cmd{WAVELET\_EXTEND\_CONST}. A further downside
is the increased sensitivity of the encoding to compression artefacts,
as well as jitter due to the limited precision of the mRGBE encoding.
% ---------------------------------------------------------------------------
\subsection{Achtung, Baby: Sparsely Populated Bins}
\label{sec:emptyBins}
A potential anomaly with contribution photon mapping is that some bins
may not be populated during precomputation. This can happen in situations
with weak illuminance, and is difficult to predict. If the ratio of
populated (i.e. nonzero) bins is too low, bias may result from the
sparsely populated bins (see figure \ref{fig:cpmapTest-emptyBinsHDR}).
Note that this issue is solely dependent on the number of bins and
photons, not the compression ratio.
\begin{figure}[htb]
\centering
\includegraphics[width=0.7\linewidth]{cpmapTest-emptyBins}
\parbox{0.7\linewidth}{%
\caption{%
\label{fig:cpmapTest-emptyBinsHDR}
Bias caused by sparsely binned contributions. With 64 bins
(right, shown for bin 63), an average of 90\% of bins contain
nonzero contributions.
With 256 bins (left, shown for bin 255), this ratio drops to
under 50\%, leading to obvious bias, notably on the left wall
and ceiling. See section \ref{sec:cpmapTestHDR} for a
description of the scene and a composite rendering of all bins.
}
}
\end{figure}
The average fraction of empty bins
can be obtained by compiling with the
\cmd{PMAP\_\-CONT\-RIB\_\-BIN\-HIS\-TO}
macro. Plotting this against the number of bins (see figure
\ref{fig:cpmapTest-emptyBinsPlot}) reveals a pronounced
discontinuity at the point where bias sets in due to sparsely populated
bins. This suggests an ill-posed solution if the average fraction of
empty bins rises above ca. 25\%.
\begin{figure}[htb]
\centering
\includegraphics[width=0.7\linewidth]{cpmapTest-64m-bn64-emptyBins}
\parbox{0.7\linewidth}{%
\caption{%
\label{fig:cpmapTest-emptyBinsPlot}
Graph of empty bin ratio (averaged over all precomputed photons)
as a function of the number of bins. This ratio rises sharply
above 9$\times$9 = 81 bins, indicating the majority of
precomputed contributions include unpopulated bins, which in
turn manifests itself as visible bias, as shown in figure
\ref{fig:cpmapTest-emptyBinsHDR}.
}
}
\end{figure}
As a consequence, the contribution photon
density estimate routine \cmd{getPhotonContrib()} issues a warning if
fewer than 50\% of bins are populated (doubling the empirical threshold
for good measure). The frequency of these warnings
indicates the necessity to increase the overall number of photons used
for precomputation, as well as the photon lookup bandwidth. However,
there is no guarantee this will remedy the situation, as the actual
luminance distribution may simply not cover the entire incident
hemisphere.
% ---------------------------------------------------------------------------
\clearpage
\section{Unit Tests}
The main constituent modules of the precomputed contribution photon map
contain optional unit tests which can be built at compile time. These
consist of a \cmd{main()} function which is enabled by compiler macros
to produce a standalone binary for individual testing of each module's
basic functionality.
\subsection{Contribution Binning Unit Test}
\label{sec:pmapcontrib-test}
The \cmd{pmapcontrib} module contains a unit test which can be built
with the command
\begin{center}
\begin{minipage}{0.7\linewidth}
\begin{lstlisting}
rmake pmapcontrib-test
\end{lstlisting}
\end{minipage}
\end{center}
where the compilation target \cmd{pmapcontrib-test} enables the macro
\cmd{PMAP\_\-CONT\-RIB\_\-TEST}. The test verifies
the Shirley-Chiu binning function and its orientation variables by calling
\cmd{ray2bin()} for a set of sample rays in the incident hemisphere, and
outputs the corresponding linearised bin numbers.
The unit test is invoked as follows:
\begin{center}
\begin{minipage}{0.7\linewidth}
\begin{lstlisting}
pmapcontrib-test <scdim> <nsamp> [<var>=<value>; ..]
\end{lstlisting}
\end{minipage}
\end{center}
where \lit{scdim} is the dimension of the Shirley-Chiu square, and
\lit{nsamp} is the number of sample rays. The sample rays are distributed
over the hemisphere by stratifying in $\theta$ and $\phi$. The orientation
of the Shirley-Chiu mapping can be modified by appending optional
variable assignments to the command line (see section \ref{sec:binning}
for the list of relevant variables). Figure \ref{fig:pmapcontrib-test}
shows a sample output of this unit test.
\begin{figure}[htb]
\centering
\includegraphics[width=0.8\linewidth]{pmapcontrib-test}
\parbox{0.9\linewidth}{%
\caption{%
\label{fig:pmapcontrib-test}
Output of contribution binning unit test
for an 8$\times$8 Shirley-Chiu mapping.
}
}
\end{figure}
\subsection{Wavelet Unit Test}
\label{sec:wavelet3-test}
The \cmd{wavelet3} module contains a unit test of the 2D padded wavelet
transform which can be built with the command
\begin{center}
\begin{minipage}{0.7\linewidth}
\begin{lstlisting}
rmake wavelet3-test
\end{lstlisting}
\end{minipage}
\end{center}
where the compilation target \cmd{wavelet3-test} enables the macro
\cmd{WAVE\-LET\_\-TEST\_\-2D\-PAD}.\footnote{
The \cmd{wavelet} and \cmd{wavelet2} modules also contain unit tests
of the 1D and 2D unpadded wavelet transforms, which are enabled at
compile time with the \cmd{WAVE\-LET\_\-TEST\_\-1D} and
\cmd{WAVE\-LET\_\-TEST\_\-2D} macros, respectively. These are
functionally similar to module \cmd{wavelet3}'s unit test.
}
The unit test is invoked as follows:
\begin{center}
\begin{minipage}{0.7\linewidth}
\begin{lstlisting}
wavelet3-test <scdim> [threshold] [dataFile]
\end{lstlisting}
\end{minipage}
\end{center}
where \lit{scdim} is the dimension of the Shirley-Chiu square,
\lit{threshold} is an optional (hard) threshold which sets all wavelet
coefficients below this value to zero, and \lit{dataFile} is an optional
ASCII file containing the values for an input matrix, one matrix row per
line, which must contain at least \lit{scdim} lines and as many values
per line (the excess being ignored). This is particularly useful to import
and wavelet transform actual contributions output by \rcontrib{} for
the same binning configuration.
The test allocates and initialises a 2D input matrix of the specified size
\lit{scdim} $\times$ \lit{scdim}. The matrix is either initialised from
the optional \lit{dataFile} (if specified), or with generated values as
determined by the \cmd{WAVELET\_TEST\_INIT} macro, which supports the
following settings:
\begin{enumerate}
\item Random data, with independent colour channels:
$y_{i,j} = [\xi_r,\; \xi_g,\; \xi_b]$.
\item Random data, with correlated colour channels:
$y_{i,j} = [\xi_r,\; (0.1+0.9\xi_g)\:\xi_r,\; (0.1+0.9\xi_b)\:\xi_r]$,
i.e. RGB differ by less than a factor of 10.
\item Random data, identical for all colour channels:
$y_{i,j} = [\xi_r,\; \xi_r,\; \xi_r]$.
\item Monotonically increasing along 1st axis:
$y_{i,j} = [i + 1, \cdots ,\cdots]$, where an offset of 1 avoids
taking the logarithm of zero if log encoding of coefficients is
enabled (see section \ref{sec:logEncoding}).
\item Monotonically increasing along both axes:
$y_{i,j} = [ij+1, \cdots, \cdots]$, with an offset of 1 again to
avoid taking the log of zero.
\item Monotonically increasing by linear index:
$y_{i,j} = [i \cdot scdim + j + 1, \cdots, \cdots]$,
i.e. serialised matrix rows.
\item Symmetric ``bump'' function:
$y_{i,j} = \left[
\left(1.1 - \frac{\lvert j - scdim/2 + 0.5\rvert}
{scdim/2 - 0.5}
\right)
\left(1.1 - \frac{\lvert i - scdim/2 + 0.5\rvert}
{scdim/2 - 0.5}
\right),
\cdots,
\cdots
\right]$, with an offset of 0.1 to avoid taking the log of zero.
\end{enumerate}
Note that $\xi_i \in [0,1]$ denotes independent random variables per
colour channel, and ellipses indicate repeated values for the remaining
colour channels. These initialisation options are useful to compare the
reconstructed matrix against a known (simple) reference, as opposed to
a more complex distribution from the \lit{dataFile}.
Regardless of the source,
the input array is optionally log-encoded if \cmd{WAVELET\_TEST\_LOG} is
enabled.
\begin{figure}[p]
\centering
\includegraphics[width=\linewidth]{wavelet3-test}
\parbox{\linewidth}{%
\caption{%
\label{fig:wavelet3-test}
Output of the wavelet transform unit
test for a 5$\times$5 input matrix with thresholding (the
mRGBE output is omitted for the sake of brevity).
Space for padding coefficients in the matrix is indicated by
dots, while thresholded coefficients are indicated by
dots surrounded by square brackets.
}
}
\end{figure}
The unit test then allocates an output matrix and its transpose according
to the padded size returned by \cmd{padWaveletXform2()} with NULL input
arrays. It then performs a full 2D padded wavelet transform using this
function. If \lit{threshold} was specified on the command line,
the resulting wavelet detail coefficients with absolute value%
\footnote{
The scalar absolute value is obtained as the dot product of
the RGB tuple with itself.
}
below \lit{threshold} are
set to zero; similarly to \cmd{pmapcontrib:thresh\-old\-Cont\-ribs()},
the approximation coefficients in the upper left of the output matrix are
not thresholded.
The test outputs the wavelet coefficient matrix to the console,
marking the thresholded coefficients as bracketed dots (see figure
\ref{fig:wavelet3-test}).
If \cmd{WAVELET\_TEST\_mRGBE} is defined, the unit test also encodes the
(thresholded) wavelet coefficients to mRGBE, placing these in
an additional output matrix, which is also output below the original
floating point coefficients for comparison. Again, the approximation
coefficients in the upper left of the matrix are preserved.
Finally, the test inverts the wavelet transform using the (thresholded)
wavelet coefficients by calling \cmd{padWaveletInvXform2()}, and outputs
the reconstructed data along with the root mean square error (RMSE)
compared to the original input. This process is optionally repeated for
the mRGBE-encoded coefficients.
As a convenience for subsequent analysis, the (thresholded) wavelet
coefficients can also be output to a file defined by the macro
\cmd{WAVELET\_TEST\_COEFFOUT}, which defaults to
\cmd{wavelet3-test-coeff.dat}.
Similarly, the mRGBE-encoded coefficients can be output to
\cmd{WAVELET\_\-TEST\_\-COEFF\-OUT\_\-mRGBE}, which defaults to
\cmd{wavelet3-test-coeff-mrgbe.dat}. Analogously, the reconstructed
data can be output to \cmd{WAVELET\_TEST\_OUT} (defaulting to
\cmd{wavelet3-test-xform.dat}), while the reconstruction from
mRGBE-encoded coefficients can be output to
\cmd{WAVELET\_TEST\_OUT\_mRGBE} (defaulting to
\cmd{wavelet3-test-xform-mrgbe.dat}).
Undefining any of these macros (e.g. by commenting out their definitions
in \cmd{wavelet3.c}) disables the corresponding file output.
The wavelet transform unit test is crucial to evaluate deviations
(artefacts) incurred by thresholding and the limited precision of the mRGBE
encoding (particularly with different bitfield configurations of the
latter). Consequently, the overwhelming majority of testing was conducted
with this tool in order to troubleshoot and optimise the wavelet transform,
particularly with respect to the various boundary extension modes.
\clearpage
\subsection{mRGBE Unit Test}
\label{sec:mrgbe-test}
The \cmd{mrgbe} module contains a unit test of the mRGBE encoding
which can be built with the command
\begin{center}
\begin{minipage}{0.7\linewidth}
\begin{lstlisting}
rmake mrgbe-test
\end{lstlisting}
\end{minipage}
\end{center}
where the compilation target \cmd{mrgbe-test} enables the macro
\cmd{mRGBE\_TEST}. The unit test is invoked as follows:
\begin{center}
\begin{minipage}{0.7\linewidth}
\begin{lstlisting}
mrgbe-test <numTests>
\end{lstlisting}
\end{minipage}
\end{center}
The test initially triggers possible exceptions with invalid or marginal
encoding input (notably zero, which is handled separately in the encoding
routine), under/overflow (relative to the specified encoding range),
and empty encoding range.
It then proceeds to encode and decode \lit{numTests} random 3-tuples in
a predetermined range (defined by macros \cmd{RGBMIN} and \cmd{RGBMAX}),
and dumps the RMSE between encoded and decoded values, calculated as
the dot product of their component differences.
Figure \ref{fig:mrgbe-test} shows a sample run.
\begin{figure}[htb]
\centering
\includegraphics[width=\linewidth]{mrgbe-test}
\parbox{\linewidth}{%
\caption{%
\label{fig:mrgbe-test}
Sample output of mRGBE unit test for a 6:6:6:5:9 mRGBE bit
configuration. With 6 bits per mantissa, the RMSE averages
around 3\%.
}
}
\end{figure}
% ---------------------------------------------------------------------------
\clearpage
\section{Precomputed Contribution Evaluation with \rcontrib}
The evaluation of precomputed contributions from the photon map in
\rcontrib{} entails locating the nearest precomputed photon, paging
its compressed contributions from disk on demand, decoding the
mRGBE-encoded wavelet coefficients, and inverting the wavelet transform
to reconstruct the contributions.
The reconstructed contributions are then transferred to \rcontrib's
contribution lookup table, and can be optionally cached to accelerate
neighbouring evaluations that require the same photon again.
\subsection{Loading Precomputed Contribution Photon Maps}
\label{sec:load}
Similarly to saving the contribution photon maps in \mkpmap{}, \rcontrib{}
calls the main photon map loading routine \cmd{pmapio:loadPhotonMap()} with
the parent photon map, which passes control to a dedicated routine
\cmd{pmcontrib4:loadContribPhotonMap()}. The latter again calls
\cmd{loadPhotonMap()} to load each per-modifier child photon map. Note
that the photons and their contributions remain out-of-core, and reside on
disk. The out-of-core octree data structure, however, remains in-core to
facilitate navigating the photon map during lookups.
The binning parameters used by \mkpmap{} to precompute the contributions
are dumped to an option file \lit{<pmapfile>.opt} in the photon map's
parent directory when it is built and saved. The contents of this
file are passed to \rcontrib{} via the \opt{@} option to ensure the
binning parameters are consistent with those in the photon map. This
dimensions \rcontrib's contribution array to accommodate the precomputed
bins via the \opt{-bn} option, and also specifies the binning function
file \cmd{disk2square.cal} for a Shirley-Chiu mapping
along with its orientation parameters, via the options \opt{-f}, \opt{-b},
\opt{-e} and \opt{-p}. Although the contributions from the photons are
already prebinned, the primary ray traced by \rcontrib{} which triggers
the photon lookup (possibly via
one ambient or several specular interactions) will still be
binned, and this binning must of course be consistent with that of the
precomputed contributions.
While automatic binning parameters would clearly be more convenient, the
option file is a necessity dictated by \rcontrib's architecture.
This is because
by the time the photon map is loaded, \rcontrib's contribution arrays
have already been allocated and initialised, along with its output
streams if the \cmd{-o} option is specified, which can be assigned to
individual modifiers and even bins. Reallocating these resources once the
photon map is loaded, and the binning parameters have been extracted,
would be too complex and error prone. While this may change in the future,
it was considered safer to use the already existing concept of option
files to pass the correct parameters to \rcontrib{} at startup,
which of course then becomes the user's responsibility.%
\footnote{
This doesn't seem unreasonable as it conforms to \radiance's
``it's the user's fault'' philosophy. ;\^{})
}
\subsection{Locating and Paging Precomputed Photons}
As in \rtrace{}, the evaluation of contributions is triggered via the
ambient calculation routine \cmd{ambient:multambient()}, which in turn
calls the photon mapping ``hook'' \cmd{pmapamb:ambPmap()} (resp.
\cmd{pmap\-amb:\-amb\-Pmap\-Caustic()} for caustic photons).%
\footnote{
Caustic photons carry a flag indicating they have been specularly
scattered, and are exclusively accepted by \cmd{pmapfilt:filterPhoton()}
during lookups initiated by \cmd{ambPmapCaustic()}. Note, however,
that the dedicated caustic photon map generated with \mkpmap's
\opt{-apc} option doesn't support contributions.
}
These routines call the appropriate photon lookup routine depending on
the photon type (defined in the callback \cmd{PhotonMap.lookup}), which
in the case of contribution photons is
\cmd{pmcontrib4:getPreCompPhotonContrib()}.
\cmd{getPreCompPhotonContrib()} calls
\cmd{raytrace:raycontrib()} to obtain the cumulative contribution of the
incident ray; typically this is the last ray in a path starting with a
primary ray followed by zero or more specular scattering events, and
zero or one (if \opt{-ab} is positive) diffuse scattering events.
This is then passed to \cmd{getPreCompContribByMod()}, which is called
for each per-modifier child photon map.
\cmd{getPreCompContribByMod()} locates the single closest precomputed
contribution photon in the corresponding child photon map for the
current modifier by calling the standard lookup routine
\cmd{pmapdata:find1Photon()}. The latter, in turn, calls the
out-of-core lookup routine \cmd{pmapooc:OOC\_\-Find\-1\-Pho\-ton()}, which
pages photons from the out-of-core octree leaf file and caches them
via an instance of the out-of-core photon cache
\cite{schregle-techreport-2016}.
The contributions for the found photon are decoded by calling
\cmd{pmcontrib4:get\-Pre\-Comp\-Con\-trib\-By\-Pho\-ton()} (see below)
and passing the photon along with its index.
\cmd{getPreCompContribByMod()} subsequently scales the decoded
contributions by the incident ray's cumulative contribution which was
passed by the caller, \cmd{getPreCompPhotonContrib()}, after which
the modifier's contributions are returned.
\subsection{mRGBE Wavelet Coefficient Decoding}
\cmd{getPreCompContribByPhoton()} loads the 32-bit RGBE-encoded
detail coefficient range and approximation coefficients from the
wavelet coefficient file for a given photon, using the photon's
numeric index as file offset.
The approximation coefficients are placed in the upper left of the
(initially zeroed) wavelet coefficient matrix. The sign of each
approximation coefficient's colour channel is set
according to bit 0 of the corresponding 8-bit mantissa in the 32-bit RGBE
encoding, via a convenient macro \cmd{PMAP\_CONTRIB\_GET\_RGBE32\_SGN()}.
\cmd{getPreCompContribByPhoton()} then loads the mRGBE-encoded
detail coefficients for the given photon from the wavelet coefficient
file (where they are stored immediately after the mRGBE range and
approximation coefficients).
Miscellaneous bookkeeping such as lazy initialisation of the
primary and transposed wavelet matrices also takes place here.
The detail coefficients are loaded into a lazily allocated
decoding buffer embedded in the current child photon map's field of
type struct \cmd{PreComputedContrib}.
Once the detail coefficients and their range have been loaded, they are
decoded from the decoding buffer by calling \cmd{decodeContribs()},
which also performs the inverse wavelet transform.
\cmd{decodeContribs()} initialises the mRGBE normalisation from the
retrieved detail coefficient range by calling \cmd{mrgbe:mRGBEinit()},
and subsequently passes this to the mRGBE decoding routine
\cmd{mrgbe:\-mRGBE\-de\-code()} for each detail coefficient. The mRGBE
decoding routine returns each decoded detail coefficient as a floating
point RGB 3-tuple along with its corresponding incremental linear
coefficient index.
Given a 32-bit mRGBE-encoded coefficient consisting of the per-colour
channel mantissae $m_i$ and the shared base-2 exponent $x$,
\cmd{mRGBEdecode()} returns the corresponding floating point
RGB wavelet coefficient $d = [d_r,\: d_g,\: d_b]$:
\begin{eqnarray}
\label{eq:invmrgbe}
\overline{d} &=& \frac{\mathrm{ldexp}\left(1, -\lvert x \rvert\right)}
{m_{max}} \\
d_i &=& \mathrm{sgn_m}\left(m_i\right) \: \left(
\lvert m_i - m_{max} + \epsilon \rvert \
\frac{\overline{d}}{d_{norm,i}}\: +\: d_{min,i}
\right) \\
\mathrm{sgn_m}\left(m_i\right) &=& \left\{
\begin{array}{l r}
-1 & \quad \textrm{if } m_i < m_{max} \\
1 & \quad \textrm{if } m_i \geq m_{max} \\
\end{array}
\right. \\
m_{max} &=& 2^\textrm{MANTBITS-1},
\end{eqnarray}
where overbars denote normalised values, and $d_{norm,i}$ and $d_{min,i}$
are the mRGBE normalisation factor and range minimum for color channel $i$,
respectively. Each coefficient $d_i$ is denormalised by $d_{norm,i}$ and
offset by $d_{min,i}$, inverting the encoding in equations
\ref{eq:mrgbe}ff. The function $\mathrm{sgn_m}\left(m_i\right)$ is
a macro that returns the sign of the mRGBE mantissa $m_i$ by comparing its
value with the signed mantissa offset, $m_{max}$. This offset is
subtracted from each $m_i$ to obtain $d_i$.
The $\mathrm{ldexp}\left(m, x\right)$ function is part of the
standard C library and computes $m\, 2^x$ for a floating point mantissa
$m$ and integer exponent $x$; here $m = 1$ is passed
to obtain a pure power of 2. Note that the exponent $x$ is stored
as absolute value in the mRGBE exponent field, and is therefore negated
so that $\overline{d} \leq 1$.
Due to the limited precision of the mRGBE
encoding, the decoded value will deviate from the original. A jitter
value $\epsilon \in [0,0.5]$ can be added to break up quantisation
artefacts (aliasing). In practice, values in excess of 0.5 increase the
RMSE reported by the unit test (see section \ref{sec:mrgbe-test}).
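The following sketch traces the decoding equations \ref{eq:invmrgbe}ff
for a single coefficient; as with the encoding, the names are
hypothetical and the actual \cmd{mRGBEdecode()} differs in its handling
of the payload data field and special cases.
\begin{lstlisting}
#include <math.h>

/* Sketch of the mRGBE decoding per the equations above; MANTBITS and
   the mantissa offset MMAX = 2^(MANTBITS-1) mirror the encoding, eps
   is the optional jitter in [0, 0.5]. Illustrative only. */
enum {MANTBITS = 6, MMAX = 1 << (MANTBITS - 1)};

static void mrgbeDecodeSketch (const unsigned mant [3], unsigned expo,
                               const float dmin [3], const float dnorm [3],
                               float eps, float d [3])
{
   /* Normalised scale: dbar = 2^(-|x|) / m_max */
   const float dbar = ldexpf(1, -(int)expo) / MMAX;
   int i;

   for (i = 0; i < 3; i++) {
      /* Sign from offset mantissa: negative if below m_max */
      const float sgn = mant [i] < MMAX ? -1.f : 1.f;

      d [i] = sgn * (fabsf((float)mant [i] - MMAX + eps) * dbar / dnorm [i]
                     + dmin [i]);
   }
}
\end{lstlisting}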
The decoded detail coefficients are transferred to the wavelet coefficient
matrix at their corresponding indices, thus incrementally populating
the matrix. The matrix index is
obtained by summing the incremental values embedded in the mRGBE data
field of the consecutively stored coefficients. Since this index $k$ is
1-dimensional, it is deserialised to 2D coordinates
$i, j$ by inverting the mapping in equation \ref{eq:linearIdx}:
\begin{eqnarray}
\label{eq:invLinearIdx}
i &=& \lfloor k / m \rfloor \\
j &=& k \textrm{ mod } m,
\end{eqnarray}
where $m$ is the array dimension. Once the matrix has been populated with
all decoded coefficients, the unpopulated coefficients remain zero (as
initialised); these represent those that were thresholded during
compression.
\subsection{2D Inverse Wavelet Transform}
Once the wavelet coefficient matrix has been decoded and populated,
\cmd{decodeContribs()} calls
\cmd{wave\-let2:\-pad\-Wave\-let\-Inv\-Xform2()} to
perform a full inverse Daubechies DB2 wavelet transform to recover the
original contributions -- or more specifically, an approximation thereof
subject to compression artefacts.
The latter function therefore acts as a wrapper
for \cmd{pad\-D2\-Inv\-Step2()}, which performs one pass of the inverse
wavelet transform along a fixed (horizontal) axis.
Like its forward transforming counterpart \cmd{pad\-D2\-Step2()}
described in section \ref{sec:wavelet3.c}, this function returns an
output matrix in which the inverted coefficients have been transposed so
it can be swapped with the input matrix to perform a second pass along the
alternate axis. Consequently, each pair of invocations of the inverse
transform step constitutes a complete horizontal/vertical inverse
transform pass at a given resolution.
Given approximation and detail coefficients $s^k_{i,j}$ and $d^k_{i,j}$
at iteration $k$, the 4-tap Daubechies wavelet transform in equations
\ref{eq:d2FwdXform-s} and \ref{eq:d2FwdXform-d} is inverted as follows
to reconstruct the adjacent approximations $s^{k-1}_{i,2j}$ and
$s^{k-1}_{i,2j+1}$ at doubled resolution for the next iteration $k-1$:
\begin{eqnarray}
\label{eq:d2InvXform}
s^{k-1}_{i,2j} &=& h_2\ s^k_{i,j} \ +\ g_2\ d^k_{i,j} \ +\
h_0\ s^k_{i,j+1} \ +\ g_0\ d^k_{i,j+1}\\
s^{k-1}_{i,2j+1} &=& h_3\ s^k_{i,j} \ +\ g_3\ d^k_{i,j} \ +\
h_1\ s^k_{i,j+1} \ +\ g_1\ d^k_{i,j+1},
\end{eqnarray}
where $h_j$ and $g_j$ are the constants from equations
\ref{eq:d2-hCoeffs} and \ref{eq:d2-gCoeffs}.
At the final iteration where $k=0$, the approximation coefficients
correspond to the original signal, so that $s^0_{i,j} \approx y_{i,j}$.
Note that, for the sake of clarity, equation \ref{eq:d2InvXform}
omits the on-the-fly transposition shown in figure \ref{fig:waveletStep}
for the forward transform; in actuality, the implementation swaps the
indices $i, j$ on the RHS.
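As a minimal sketch (single colour channel, single row, boundary
handling and transposition omitted), one pass of this inverse step per
equation \ref{eq:d2InvXform} might look as follows, where \cmd{h[]} and
\cmd{g[]} are assumed to hold the constants from equations
\ref{eq:d2-hCoeffs} and \ref{eq:d2-gCoeffs}; the function name and
signature are illustrative, not those of \cmd{padD2InvStep2()}.
\begin{lstlisting}
/* Sketch of one inverse transform pass over a single row and colour
   channel per the equation above. s[] and d[] hold the approximation
   and detail coefficients at iteration k (each of length halfLen),
   out[] receives the reconstructed approximations at iteration k-1
   (length 2*halfLen). */
static void invStepRowSketch (const float *s, const float *d, float *out,
                              unsigned halfLen,
                              const float h [4], const float g [4])
{
   unsigned j;

   for (j = 0; j + 1 < halfLen; j++) {
      out [2*j]     = h [2]*s [j] + g [2]*d [j] + h [0]*s [j+1] + g [0]*d [j+1];
      out [2*j + 1] = h [3]*s [j] + g [3]*d [j] + h [1]*s [j+1] + g [1]*d [j+1];
   }
   /* The final output pair requires coefficients beyond the boundary,
      which the padded transform supplies via its padding coefficients. */
}
\end{lstlisting}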
Upon completing the inverse wavelet transform,
\cmd{decodeContribs()} exits and returns the reconstructed binned
contributions, which are subsequently transferred row-by-row via
\cmd{memcpy()} to \cmd{rcontrib}'s linear binned contribution array
for the current modifier. Having accumulated the contributions from all
modifiers, \cmd{getPreCompPhotonContrib()} finally returns and
hands control back to \rcontrib.
\subsection{Decoded Contribution Caching}
The out-of-core photon cache used by the lookup routine
\cmd{pmapooc:OOC\_\-Find\-1\-Pho\-ton()} anticipates the need to
repeatedly evaluate photons for spatially adjacent lookups,
thus reducing I/O latency incurred by paging from disk.
The cache is an instance of struct
\cmd{ooccache:OOC\_Cache} and is organised into a variable number of
entries, where each entry contains a page (a block loaded on demand
from disk) consisting of a fixed number of neighbouring photons.
A standard least-recently-used (LRU) page replacement strategy
governs the preemption of obsolete pages when the cache fills
\cite{Tanenbaum:2014:MOS:2655363}.
To hide the latency involved in decoding the photon's contributions, an
additional, per-modifier precomputed contribution cache (another
instance of struct \cmd{OOC\_Cache}) is interrogated by
\cmd{getPreCompContribByMod()} to determine whether the contributions
have already been paged and decoded.
A dedicated routine \cmd{pmcontribcache:getContribCache()}
accesses the contribution cache using the found photon's index as
caching key, and returns a flag indicating the cached status of the
photon's contributions. This in turn calls the out-of-core caching routine
\cmd{ooccache:OOC\_CacheData()}, which returns a pointer to an array of
binned contributions.
If \cmd{getContribCache()} signals that the decoded contributions are
already cached, they are returned in the array and can be directly
accumulated in \cmd{rcontrib}'s contribution bins, obviating the need
to page and decode these (again).
If the contributions have not yet been cached, a new entry is allocated
with an empty array, which must be populated by the caller.
The contribution cache uses the same LRU page replacement strategy as
the out-of-core photon cache, except that a page now contains the decoded
contributions associated with a single photon; consequently, the page
size of this \cmd{OOC\_Cache} instance is hardwired to 1 (page sizes
being specified as a number of photons).
This cache instance is lazily initialised by calling a dedicated
routine \cmd{pmcontribcache:initContribCache()}.
If no cached contributions were found for the current photon, they are
paged from disk and decoded as described in the previous sections,
and transferred to the new cache entry.
The contributions must obviously then be
weighted by the incident ray's cumulative contribution \emph{after}
the cache entry has been initialised, since the incident ray is variable.
% ---------------------------------------------------------------------------
\clearpage
\section{Compilation}
The precomputed contribution photon map is based on the out-of-core
photon map data structure, which requires enabling \cmd{PMAP\_OOC} at
compile time:
\begin{center}
\begin{minipage}{0.5\linewidth}
\begin{lstlisting}
rmake OPT+=-DPMAP_OOC
\end{lstlisting}
\end{minipage}
\end{center}
This enables \cmd{PMAP\_CONTRIB} in \cmd{pmapcontrib.h}, which in turn
enables the contribution photon mapping code and the corresponding
options in \mkpmap. Note that this functionality is absent in the
in-core kd-tree data structure. Since the latter is still the default
as of this writing, out-of-core (and therefore contribution) photon
mapping must be explicitly enabled.
In addition, verbose debugging output and sanity checks in the
contribution photon map code can be enabled as follows:
\begin{center}
\begin{minipage}{0.5\linewidth}
\begin{lstlisting}
rmake OPT+="-DPMAP_OOC -DPMAP_CONTRIB_DBG"
\end{lstlisting}
\end{minipage}
\end{center}
These sanity checks are expensive and should be disabled
in production code. Among the tests performed are:
\begin{itemize}
\item Checking for out-of-order wavelet coefficients after calling
\cmd{coeffPartition()} in \cmd{thresholdContribs()}.
\item Encoding of generated contributions in \cmd{encodeContribs()},
including a ``bump'' function similar to the initialisation
options available in the wavelet unit test (see section
\ref{sec:wavelet3-test}).
\item Checking of mRGBE encoded wavelet coefficients in
\cmd{encodeContribs()} by comparing the
decoded coefficient with the original, and triggering a
consistency error if their difference is outside tolerance.
This error is also triggered if the encoded and decoded
incremental coefficient index differs. Furthermore, a
coefficient sign reversal (in any colour channel) triggers
this error too.
\item A 10-second delay in \cmd{preComputeContrib()} after forking
child processes in multiprocessing mode (when passing the
\opt{-n} option to \mkpmap), giving the user time
to attach a debugger to a particular subprocess.
The PID of all forked subprocesses is dumped to the console
for convenience.
\item Checking for duplicate photon selection in
\cmd{preComputeContrib()}; no photon should be selected twice
for precomputation, unless the random photon index selection
is buggy.
\item Tally and output of average mRGBE deviations, similar to
the mRGBE unit test (see section \ref{sec:mrgbe-test});
\item Checking for invalid mRGBE-encoded wavelet coefficient indices
in \cmd{decodeContribs()}.
\end{itemize}
Unit tests for the contribution binning, wavelet transform and mRGBE
encoding can also be enabled at compile time. See sections
\ref{sec:pmapcontrib-test}, \ref{sec:wavelet3-test} and
\ref{sec:mrgbe-test}, respectively, for details.
% ---------------------------------------------------------------------------
%\clearpage
\chapter{Results}
\section{Wavelet Transform Colourmaps}
\label{sec:xformCmaps}
\subsection{CIE Sunny Sky}
To illustrate the effects of wavelet compression, a simple CIE sunny
sky model (without solar source) was generated with \cmd{gensky} to
serve as reference. The contributions from this sky source were
collected with \rcClassic{} in 32 $\times$ 32 = 1024 bins using a
Shirley-Chiu mapping, and reshaped into a 2D matrix. A fisheye rendering
of the sky and a colourmap of the binned contributions are shown for
reference in figure \ref{fig:skycontrib1024-cie-orig}. The corresponding
script to bin the contributions is shown in listing
\ref{lst:skycontrib-1024.sh}.
\lstinputlisting[float=h, inputpath={listings},
label=lst:skycontrib-1024.sh,
caption = {Script to bin sky contributions with \rcClassic{} using
Shirley-Chiu mapping. Note the use of the \opt{-c} option to
sample multiple rays per bin and reduce variance.
}
]
{skycontrib-1024.sh}
The 2D contributions were loaded as a datafile into the \cmd{wavelet3-test}
unit test (see section \ref{sec:wavelet3-test}) and transformed using the
padded 2D wavelet transform.
Figure \ref{fig:skycontrib1024-cie-coeffs} shows colourmaps of
the resulting wavelet coefficient matrices after thresholding, alongside
the reconstructed contributions, for a sequence of compression ratios.
The colourmaps encode absolute value as colour, with black corresponding
to zero. Black regions in the wavelet matrices indicate either unused
space to accommodate padding coefficients, or thresholded detail
coefficients. The largest detail coefficients lie in the upper left of
the matrix, towards the coarser resolutions, indicating that the signal has
predominantly low-frequency content, as expected from the CIE model.
Accordingly, these coefficients are only thresholded at very high
compression ratios.
The approximation coefficients in the extreme upper left submatrix
(shown in yellow) are largest in magnitude and never thresholded.
In the contributions reconstructed by the inverse wavelet transform,
artefacts become noticeable beyond a compression ratio of 80\%.
Owing to the low-frequency content of the contributions, the resulting
loss of detail remains tolerable up to ca. 95\%, at which point
the circumsolar region bleeds into the boundary.
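For reference, the hard-thresholding step assumed throughout this discussion
can be sketched generically as follows: the smallest detail coefficients (by
magnitude) are zeroed to meet a target compression ratio, while the
approximation coefficients are always retained. This is a generic
formulation for illustration, not the code in \cmd{thresholdContribs()}.
\begin{lstlisting}
/* Generic hard-thresholding sketch (NOT the actual thresholdContribs()):
   zero the fraction 'ratio' of detail coefficients with the smallest
   magnitude; the first numApprox (approximation) coefficients are kept. */
#include <stdlib.h>
#include <math.h>

static int cmpMag (const void *a, const void *b)
{
   const float ma = fabsf(*(const float*)a), mb = fabsf(*(const float*)b);
   return (ma > mb) - (ma < mb);
}

void thresholdCoeffs (float *coeff, size_t numCoeff, size_t numApprox,
                      float ratio)
{
   const size_t numDetail = numCoeff - numApprox,
                numZero = (size_t)(ratio * numDetail);
   float        *mag = malloc(numDetail * sizeof(float)), thresh;
   size_t       i;

   if (!mag || !numZero) {
      free(mag);
      return;
   }

   /* Find magnitude threshold from a sorted copy of the detail coeffs */
   for (i = 0; i < numDetail; i++)
      mag [i] = coeff [numApprox + i];
   qsort(mag, numDetail, sizeof(float), cmpMag);
   thresh = fabsf(mag [numZero - 1]);

   /* Zero all detail coefficients at or below the threshold */
   for (i = numApprox; i < numCoeff; i++)
      if (fabsf(coeff [i]) <= thresh)
         coeff [i] = 0;

   free(mag);
}
\end{lstlisting}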
\begin{figure}[p]
\centering
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/ciesky%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-orig-imgplot-crop%
}
\subcaption{%
\label{fig:skycontrib1024-cie-orig}
Original: fisheye rendering (left),
binned contributions (right).
}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-00-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-00-imgplot-crop%
}
\subcaption{Wavelet transform: no compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-50-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-50-imgplot-crop%
}
\subcaption{Wavelet transform: 50\% compression}
\end{subfigure}\\
\medskip
\caption{%
Colourmaps of wavelet coefficient matrix (left) and
reconstructed contributions (right) from a CIE sky model
for different compression ratios. Black regions in the
coefficient matrix indicate unused or thresholded coefficients.
The original sky distribution and binned contributions are shown
in \ref{fig:skycontrib1024-cie-orig}.
}
\end{figure}
\begin{figure}[p]\ContinuedFloat
\centering
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-60-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-60-imgplot-crop%
}
\subcaption{60\% Compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-70-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-70-imgplot-crop%
}
\subcaption{70\% Compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-80-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-80-imgplot-crop%
}
\subcaption{80\% Compression}
\end{subfigure}\\
\medskip
\caption{(continued).}
\end{figure}
\begin{figure}[p]\ContinuedFloat
\centering
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-90-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-90-imgplot-crop%
}
\subcaption{90\% Compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-95-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-95-imgplot-crop%
}
\subcaption{95\% Compression (supernova)}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-coeff-98-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-cie/skycontrib-2d-1024-xform-98-imgplot-crop%
}
\subcaption{98\% Compression (FUBAR)}
\end{subfigure}\\
\medskip
\caption{(continued).}
\label{fig:skycontrib1024-cie-coeffs}
\end{figure}
% ---------------------------------------------------------------------------
\subsection{HDR Sky Capture}
The procedure from the previous section was applied to an HDR sky capture
obtained from a camera with a fisheye lens, which is shown for reference
along with the corresponding binned contributions in figure
\ref{fig:skycontrib1024-hdr-orig}. Note the capture also includes
vegetation on the horizon. Figure \ref{fig:skycontrib1024-hdr-coeffs}
again shows colourmaps of the thresholded wavelet coefficient matrices
and the resulting reconstructed contributions.
This example exhibits more high-frequency content (notably in the
boundary regions due to the vegetation on the horizon); consequently,
artefacts already become objectionable at 90\% compression and above,
and the sky distribution breaks up completely above 95\%.
\begin{figure}[p]
\centering
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/hdrsky%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-orig-imgplot-crop%
}
\subcaption{%
\label{fig:skycontrib1024-hdr-orig}
Original: HDR camera capture (left),
binned contributions (right).
}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-00-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-00-imgplot-crop%
}
\subcaption{Wavelet transform: no compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-50-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-50-imgplot-crop%
}
\subcaption{Wavelet transform: 50\% compression}
\end{subfigure}\\
\medskip
\caption{%
Colourmaps of wavelet coefficient matrix (left) and
reconstructed contributions (right) from an HDR sky capture
for different compression ratios. Black regions in the
coefficient matrix indicate unused or thresholded coefficients.
The original sky distribution and binned contributions are shown
in \ref{fig:skycontrib1024-hdr-orig}.
}
\end{figure}
\begin{figure}[p]\ContinuedFloat
\centering
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-60-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-60-imgplot-crop%
}
\subcaption{60\% Compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-70-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-70-imgplot-crop%
}
\subcaption{70\% Compression}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-80-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-80-imgplot-crop%
}
\subcaption{80\% Compression}
\end{subfigure}\\
\medskip
\caption{(continued).}
\end{figure}
\begin{figure}[p]\ContinuedFloat
\centering
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-90-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-90-imgplot-crop%
}
\subcaption{90\% Compression (supernova)}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-95-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-95-imgplot-crop%
}
\subcaption{95\% Compression (FUBAR)}
\end{subfigure}\\
\medskip
\begin{subfigure}{\textwidth}
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-coeff-98-imgplot-crop%
}
\hfill
\includegraphics[height=0.28\textheight]{%
skycontrib-hdr/skycontrib-2d-1024-xform-98-imgplot-crop%
}
\subcaption{98\% Compression (black hole?)}
\end{subfigure}\\
\medskip
\caption{(continued).}
\label{fig:skycontrib1024-hdr-coeffs}
\end{figure}
% ---------------------------------------------------------------------------
\section{Binned Contribution Renderings}
\label{sec:cpmapTestHDR}
Figure \ref{fig:cpmapTestHDR-rc} shows a set of per-bin fisheye
falsecolour renderings generated by \rcClassic{} for a
simple,
bilaterally lit scene. The scene contains two opposing fenestrations with
cyan tint, and 3970 suns uniformly distributed in the sky dome. The
sensor position is located in the centre of the space, at floor height.
The number of ambient bounces was set to 4 (\opt{-ab 4}) to obtain
a reasonable indirect illuminance estimate. Using 256$\times$256 primary
rays per bin, the running time on an Intel Xeon E5-2660 @ 2.60GHz
with 20 cores was ca. 1 hour. The shell script%
\footnote{
Yeah, it's \cmd{(t)csh}; we're too old-school for \cmd{bash}.
}
used to generate the renderings is shown in
listing~\ref{lst:cpmapTestHDR-rc}.
\lstinputlisting[float=h, label=lst:cpmapTestHDR-rc, inputpath={listings},
caption = {Script to generate the per-bin fisheye renderings in
figure \ref{fig:cpmapTestHDR-rc} with \rcClassic.
}
]
{rc-hdr.sh}
Figure \ref{fig:cpmapTestHDR-pmap} shows the same scene rendered with
\rcontrib{} using ca. 64000 precomputed contribution photons, whose
contributions were wavelet compressed by 80\%. The contributions were
rendered without ambient bounce (\opt{-ab -1}) and instead evaluated
directly via photon lookups at each pixel's world coordinates.
Despite the visible noise due to direct photon visualisation, there is
overall good agreement with figure \ref{fig:cpmapTestHDR-rc}, except for
the leftmost column, which corresponds to bin numbers that are even
multiples of 16. These bins lie on the boundary of the wavelet transform
domain and are therefore subject to compression artefacts, as discussed
in the next section.
While \mkpmap{} took ca. 13.5 minutes to precompute the photon map on
the same CPU, rendering with \rcontrib{} took only 14 seconds.
\lstinputlisting[float=h, label=lst:cpmapTestHDR-pmap, inputpath={listings},
caption = {Script to generate the per-bin fisheye renderings in
figure \ref{fig:cpmapTestHDR-pmap} with \rcontrib{} using
precomputed contribution photon mapping.
}
]
{cpmapTest-hdr.sh}
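To make the evaluation step concrete, the following sketch illustrates one
way the per-pixel lookup described above can be realised: the bin
contributions precomputed for the nearest photon are returned directly,
without a density estimate or ambient bounces. The \cmd{PrecompPhoton} type
and the lookup routine \cmd{findNearestPhoton()} are placeholders, not the
actual photon map interface.
\begin{lstlisting}
/* Sketch of direct per-pixel evaluation of precomputed contribution
   photons; the PrecompPhoton type and findNearestPhoton() are
   placeholders for the actual photon map lookup. */
typedef struct {
   float       pos [3];
   const float *contrib;      /* decoded per-bin contributions */
} PrecompPhoton;

const PrecompPhoton *findNearestPhoton (const float worldPos [3]);

void evalPixelContribs (const float worldPos [3], float *outContrib,
                        unsigned numBins)
{
   const PrecompPhoton *photon = findNearestPhoton(worldPos);
   unsigned b;

   /* Direct visualisation: no ambient bounces (-ab -1), hence fast but
      noisy where photon density is low */
   for (b = 0; b < numBins; b++)
      outContrib [b] = photon ? photon -> contrib [b] : 0;
}
\end{lstlisting}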
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{%
rc-bn64-ab4-compos-false%
}
\parbox{\linewidth}{%
\caption{%
\label{fig:cpmapTestHDR-rc}
Falsecolour rendered contributions from \rcClassic{} for
each of 64 bins in the bilaterally lit test scene containing
3970 solar positions.
The sensor is located in the centre of the space on the floor,
facing the zenith. The glazings on either side of the space
are tinted cyan.
This sequence was rendered with up to 4 ambient
bounces (\opt{-ab 4}). On 20 cores,
the renderings took ca. 1 hour to complete at
256$\times$256 pixels per bin.
}
}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{%
cpmapTest-64m-bn64-2400-comp0.8-reorient-compos-false%
}
\parbox{\linewidth}{%
\caption{%
\label{fig:cpmapTestHDR-pmap}
The bilaterally lit scene from figure \ref{fig:cpmapTestHDR-rc}
rendered with precomputed contribution photons
after 80\% wavelet compression.
This sequence was rendered with \opt{-ab -1};
consequently, no ambient bounces are performed. This greatly
accelerates computation at the expense of some noise: on 20 cores,
the entire sequence took just 14 seconds to complete at
256$\times$256 pixels per bin.
}
}
\end{figure}
\section{Wavelet Compression and Boundary Artefacts}
Figure \ref{fig:contribCompression} shows a series of 3D plots of
the contributions in the bilaterally lit scene for different
wavelet compression ratios. This example uses a first-order gradient
boundary extension and logarithmic contribution encoding.
While noise in the photon density estimate dominates at lower compression,
boundary artefacts tend to dominate at higher compression ratios. This
suggests the transform is very sensitive to thresholded padding
coefficients, and offers an explanation for the deviations in the
leftmost column in figure \ref{fig:cpmapTestHDR-pmap}.
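For clarity, the following 1D sketch shows what a first-order gradient
boundary extension means in this context: padding samples beyond the signal
boundary are extrapolated from the gradient of the two outermost samples.
This is a generic illustration of the padding scheme, not the actual
transform code.
\begin{lstlisting}
/* Generic 1D sketch of a first-order gradient boundary extension (NOT
   the actual transform code): extrapolate 'pad' samples beyond each end
   of the signal from the gradient of the two outermost samples.
   Assumes len >= 2 and a padded buffer of size len + 2*pad. */
void extendGradient1D (const float *sig, unsigned len,
                       float *padded, unsigned pad)
{
   const float gradLo = sig [1] - sig [0],
               gradHi = sig [len-1] - sig [len-2];
   unsigned    i;

   /* Copy the original signal into the centre of the padded buffer */
   for (i = 0; i < len; i++)
      padded [pad + i] = sig [i];

   /* Linearly extrapolate beyond both boundaries */
   for (i = 1; i <= pad; i++) {
      padded [pad - i]           = sig [0]     - i * gradLo;
      padded [pad + len - 1 + i] = sig [len-1] + i * gradHi;
   }
}
\end{lstlisting}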
\begin{figure}[p]
\centering
\begin{minipage}{0.49\linewidth}
\centering
\includegraphics[width=\linewidth]{%
cpmapTest-64m-bn256-3200-comp0.5%
}\\
Compression 0.5
\end{minipage}
\hfill
\begin{minipage}{0.49\linewidth}
\centering
\includegraphics[width=\linewidth]{%
cpmapTest-64m-bn256-3200-comp0.6%
}\\
Compression 0.6
\end{minipage}\\[3mm]
\begin{minipage}{0.49\linewidth}
\centering
\includegraphics[width=\linewidth]{%
cpmapTest-64m-bn256-3200-comp0.7%
}\\
Compression 0.7
\end{minipage}
\hfill
\begin{minipage}{0.49\linewidth}
\centering
\includegraphics[width=\linewidth]{%
cpmapTest-64m-bn256-3200-comp0.8%
}\\
Compression 0.8
\end{minipage}\\[3mm]
\centering
\begin{minipage}{0.49\linewidth}
\centering
\includegraphics[width=\linewidth]{%
cpmapTest-64m-bn256-3200-comp0.9%
}\\
Compression 0.9
\end{minipage}\\[3mm]
\parbox{\linewidth}{%
\caption{%
\label{fig:contribCompression}
3D plots of contributions calculated by \rcClassic{}
(blue) and the photon map (red) in the bilaterally lit scene.
The contributions are accumulated in 256 bins for a
sensor located in the centre of the floor.
The contribution photon map was compressed with ratios
of 0.5 to 0.9. In this example, compression artefacts
appear at the boundaries (corresponding to the horizon)
with 80\% compression and above. This is a known limitation
of wavelet compression if the input signal exhibits high
gradients at the boundary.
}
}
\end{figure}
% ---------------------------------------------------------------------------
%\clearpage
\setcounter{secnumdepth}{-2}
\chapter{Acknowledgements}
\setcounter{secnumdepth}{2}
This research was supported by the Swiss National Science Foundation
as part of the project ``Light Fields for Spatio-Temporal Glare
Assessment'' (\#179067).
The author would like to thank his colleagues Dr. Lars Grobe and
Dr. Stephen Wasilewski for their collaboration, for providing feedback
during weekly lunchtime meetings, and for testing the code on MacOS.
The author would also like to express special thanks to the head of
the former Competence Centre for Envelopes and Solar Energy (CC EASE),
Prof. Stephen Wittkopf.
%, who gave our crew a sleek
%vessel, until it was torpedoed by the powers that be in 2016.
Finally, the author would like to thank his colleagues at
LBNL and Ladybug Tools for supporting the release of the new
photon mapping code through automated testing on supported platforms,
notably Mingbo Peng and Taoning Wang. And of course, the author's
biggest thanks go out to Greg Ward for his enduring support and
service to the \radiance{}
{\fontfamily{jkpvos}\selectfont\textit{Publick}}
... uh, community.
{\fontfamily{jkpvos}\selectfont\Large\vspace{1cm}\noindent
\textit{\textbf{The crew of the former CC EASE bids= you Farewell
for now, and thanks= the RADIANCE Publick for their Support.
}
}
}
% ---------------------------------------------------------------------------
%\clearpage
\bibliographystyle{alpha}
\bibliography{precomp-contrib-pmap-techreport}
% ---------------------------------------------------------------------------
%\clearpage
\appendix
\chapter{Source Code Definitions}
\lstinputlisting[float=h, label=lst:codeDefs1, inputpath={listings},
caption = {Source code definitions in \cmd{pmapcontrib.h}
relevant to precomputed contribution photons.
}
]
{pmapcontrib.h}
\lstinputlisting[float=h, label=lst:codeDefs2, inputpath={listings},
caption = {Source code definitions in \cmd{wavelet2.h} and
\cmd{mrgbe.h} relevant to precomputed contribution photons.
}
]
{wavelet2-mrgbe.h}
\lstinputlisting[float=h, label=lst:codeDefs3, inputpath={listings},
caption = {Source code definitions in \cmd{pmapdata.h} relevant to
precomputed contribution photons.
}
]
{pmapdata.h}
% ---------------------------------------------------------------------------
\chapter{Software Architecture}
\label{sec:swarch}
Figure \ref{fig:swarch-mkpmap} provides an overview of the precomputed
contribution photon map's software architecture in the context of
photon map generation and precomputation with \cmd{mkpmap}.
Figure \ref{fig:swarch-rcontrib} gives a similar overview
in the context of the photon map's evaluation with \cmd{rcontrib}.
The blue blocks group relevant functions (red text) within software
modules built from C source files with the indicated names.
Red arrows correspond to function calls, which are ordered top-down.
Note that some routines, notably
\cmd{savePhotonMap()} and \cmd{loadPhotonMap()}, recurse on a parent
photon map's per-modifier child photon maps; hence the corresponding
arrows indicate a loop.
A special case are the variables \cmd{pmapcontrib:pmapContribTab} and
\cmd{pmapcontrib:pmapContribMode}, which are set by
\cmd{pmcontrib4:initPmapContribTab()} in the context of \rcontrib.
These serve as references (pointers) to \rcontrib's contribution table
\cmd{rcontrib:modconttab} and coefficient/contribution mode flag
\cmd{rcontrib:contrib}, but are unused (i.e. NULL) in the context of
\mkpmap, where no such variables exist.
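As a minimal illustration of this indirection (a sketch under the naming
used above; the actual declarations and types differ),
\cmd{initPmapContribTab()} merely records the addresses of \rcontrib's
table and mode flag, which remain NULL in \mkpmap:
\begin{lstlisting}
/* Illustrative sketch of the pmapContribTab / pmapContribMode
   indirection; the table type and signature are placeholders, not the
   actual source. */
typedef struct ModContribTab ModContribTab;  /* stand-in for rcontrib's table */

/* NULL in the context of mkpmap, where rcontrib's variables do not exist */
static ModContribTab *pmapContribTab = NULL;
static int           *pmapContribMode = NULL;

/* Called from rcontrib to register its contribution table and
   coefficient/contribution mode flag with the photon map code */
void initPmapContribTab (ModContribTab *contribTab, int *contribMode)
{
   pmapContribTab = contribTab;
   pmapContribMode = contribMode;
}
\end{lstlisting}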
\begin{sidewaysfigure}
\center
\includegraphics[width=0.95\textwidth]{contrib-swarch-mkpmap2-crop}
\caption{
\label{fig:swarch-mkpmap}
Software architecture of precomputed contribution photon map in
the context of \cmd{mkpmap}. Red arrows represent function calls,
ordered top-down.
}
\end{sidewaysfigure}
\begin{sidewaysfigure}
\center
\includegraphics[width=0.95\linewidth]{contrib-swarch-rcontrib-crop}
\caption{
\label{fig:swarch-rcontrib}
Software architecture of precomputed contribution photon map in
the context of \cmd{rcontrib}. Red arrows represent function calls,
ordered top-down.
}
\end{sidewaysfigure}
\end{document}
