\item the LARS/homotopy algorithm~\cite{osborne,efron} (variants for solving Lasso and Elastic-Net problems);
\item a weighted version of LARS;
\item OMP and LARS when data come with a binary mask;
\item a coordinate-descent algorithm for $\ell_1$-decomposition problems~\cite{fu,friedman,wu};
\item a greedy solver for simultaneous signal approximation as defined in~\cite{tropp2,tropp3} (SOMP);
\item a solver for simultaneous signal approximation with $\ell_1/\ell_2$-regularization based on block-coordinate descent;
\item a solver for the Fused-Lasso Signal Approximation problem as defined in~\cite{friedman}, based on the homotopy method presented in the appendix of~\cite{mairal9};
\item a tool for projecting efficiently onto a few convex sets
inducing sparsity such as the $\ell_1$-ball using the method of
\cite{brucker,maculan,duchi}, and Elastic-Net or Fused Lasso constraint sets as
proposed in the appendix of~\cite{mairal9}.
\end{itemize}
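To give a flavor of the kind of projection the last item refers to, here is a minimal MATLAB sketch of the classical sort-based projection of a vector onto the $\ell_1$-ball (in the spirit of \cite{duchi}). The helper name proj\_l1\_sketch is hypothetical; this is only a didactic illustration, not the compiled routine used by the toolbox.
\begin{lstlisting}
% Project u (any real vector) onto the l1-ball {v : ||v||_1 <= tau}, tau > 0.
% Didactic sketch only; the toolbox relies on a faster compiled implementation.
function v = proj_l1_sketch(u, tau)
  u = u(:);                                  % work with a column vector
  if norm(u, 1) <= tau, v = u; return; end
  s   = sort(abs(u), 'descend');             % sorted magnitudes
  css = cumsum(s);
  rho = find(s > (css - tau) ./ (1:numel(s))', 1, 'last');  % last feasible index
  theta = (css(rho) - tau) / rho;            % thresholding level
  v = sign(u) .* max(abs(u) - theta, 0);     % soft-threshold each entry
end
\end{lstlisting}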
\item The \textbf{Proximal toolbox}: An implementation of proximal methods
(ISTA and FISTA~\cite{beck}) for solving a large class of sparse approximation
problems with different combinations of loss and regularizations. One of the main
features of this toolbox is to provide a robust stopping criterion based on
\emph{duality gaps} to control the quality of the optimization, whenever
possible. It also handles sparse feature matrices for large-scale problems. The following regularizations are implemented:
$\psi$ is a sparsity-inducing regularizer and $\C$ is a constraint set for the dictionary. As shown in \cite{mairal9}
and in the help file below, various combinations can be used for $\psi$ and $\C$ for solving different matrix factorization problems.
Moreover, positivity constraints can be added on $\alphab$. The function admits several modes for choosing the optimization parameters: either the parameter-free strategy proposed in \cite{mairal7}, or the parameters $t_0$ and $\rho$ presented
in \cite{mairal9}. {\bf Note that for problems of a reasonable size, and when $\psi$ is the $\ell_1$-norm,
the function mexTrainDL\_Memory can be faster but uses more memory.}
\lstinputlisting{../src_release/mexTrainDL.m}
% {\footnotesize
% \verbatiminput{../src_release/mexTrainDL.m}
% }
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_TrainDL.m}
\subsection{Function mexTrainDL\_Memory}
Memory-consuming version of mexTrainDL. This function is well adapted to small/medium-size problems:
it requires storing all the coefficients $\alphab$ and is therefore impractical
for very large datasets. However, when one can afford this memory cost, it is preferable to use this function,
since it is faster than mexTrainDL.
Note that unlike mexTrainDL this function does not allow warm-restart.
\subsection{Function nmf}
This function is an example of how to use the function mexTrainDL for the
problem of non-negative matrix factorization formulated in \cite{lee2}. Note
that mexTrainDL can be replaced by mexTrainDL\_Memory in this function for
small or medium datasets.
% {\footnotesize
% \verbatiminput{../src_release/nmf.m}
% }
\lstinputlisting{../src_release/nmf.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_nmf.m}
\subsection{Function nnsc}
This function is an example of how to use the function mexTrainDL for the
problem of non-negative sparse coding as defined in \cite{hoyer}. Note that
mexTrainDL can be replaced by mexTrainDL\_Memory in this function for small or
medium datasets.
% {\footnotesize
% \verbatiminput{../src_release/nnsc.m}
% }
\lstinputlisting{../src_release/nnsc.m}
% The following piece of code contains usage examples:
% \lstinputlisting{../test_release/test_nnsc.m}
\section{Sparse Decomposition Toolbox}
This toolbox implements several algorithms for solving signal reconstruction problems. It is mostly adapted to solving a large number of small/medium-scale problems, but can also be efficient for large-scale ones.
\subsection{Function mexOMP}
This is a fast implementation of the Orthogonal Matching Pursuit algorithm (or forward selection)~\cite{mallat4,weisberg}. Given a matrix of signals $\X=[\x^1,\ldots,\x^n]$ in $\Real^{m \times n}$ and a dictionary $\D=[\d^1,\ldots,\d^p]$ in $\Real^{m \times p}$, the algorithm computes a matrix $\A=[\alphab^1,\ldots,\alphab^n]$ in $\Real^{p \times n}$,
where for each column $\x$ of $\X$, it returns a coefficient vector $\alphab$ which is an approximate solution of the following NP-hard problem
\begin{displaymath}
\min_{\alphab \in \Real^p} \|\x-\D\alphab\|_2^2 \quad \mbox{s.t.} \quad \|\alphab\|_0 \leq L,
\end{displaymath}
where $\|\alphab\|_0$ counts the number of nonzero coefficients of $\alphab$ and $L$ is a user-defined sparsity level (depending on the input parameters, a variant constraining the residual error instead is addressed).
For efficiency reasons, the method first computes the covariance matrix
$\D^T\D$, then for each signal, it computes $\D^T\x$ and performs the
decomposition with a Cholesky-based algorithm (see \cite{cotter} for instance).
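To illustrate this strategy, the following minimal MATLAB sketch performs a greedy decomposition using the precomputed matrix $\D^T\D$; for simplicity it re-solves the restricted least-squares system at each iteration with the backslash operator, whereas the actual implementation updates a Cholesky factorization incrementally. The helper name omp\_sketch is hypothetical and this is a didactic illustration only, not the code behind mexOMP.
\begin{lstlisting}
% Illustrative greedy (OMP-like) decomposition with a precomputed D'*D.
% Didactic sketch only (assumes L >= 1 and atoms with nonzero norm).
function alpha = omp_sketch(x, D, L)
  G = D' * D;                 % D'*D, computed once per dictionary
  c = D' * x;                 % correlations between the atoms and the signal
  S = [];                     % current support (indices of selected atoms)
  alpha = zeros(size(D, 2), 1);
  corr_res = c;               % D'*(x - D*alpha), updated at each iteration
  for k = 1:L
    [~, j] = max(abs(corr_res));          % select the most correlated atom
    S = [S, j];
    coeffs = G(S, S) \ c(S);              % least-squares fit on the support
                                          % (the library uses Cholesky updates here)
    corr_res = c - G(:, S) * coeffs;      % correlations with the new residual
  end
  alpha(S) = coeffs;
end
\end{lstlisting}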
Note that mexOMP can return the ``greedy'' regularization path if needed (see below):
% {\footnotesize
% \verbatiminput{../src_release/mexOMP.m}
% }
\lstinputlisting{../src_release/mexOMP.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_OMP.m}
\subsection{Function mexOMPMask}
This is a variant of mexOMP with the possibility of handling a binary mask.
Given a binary mask $\B=[\betab^1,\ldots,\betab^n]$ in $\{0,1\}^{m \times n}$, it returns a matrix $\A=[\alphab^1,\ldots,\alphab^n]$ such that for every column $\x$ of $\X$ and corresponding column $\betab$ of $\B$, it computes a column $\alphab$ of $\A$ by addressing the masked problem
\begin{displaymath}
\min_{\alphab \in \Real^p} \|\diag(\betab)(\x-\D\alphab)\|_2^2 \quad \mbox{s.t.} \quad \|\alphab\|_0 \leq L,
\end{displaymath}
where $\diag(\betab)$ is a diagonal matrix with the entries of $\betab$ on the diagonal.
% {\footnotesize
% \verbatiminput{../src_release/mexOMPMask.m}
% }
\lstinputlisting{../src_release/mexOMPMask.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_OMPMask.m}
\subsection{Function mexLasso}
This is a fast implementation of the LARS algorithm~\cite{efron} (variant for solving the Lasso) for solving the Lasso or Elastic-Net. Given a matrix of signals $\X=[\x^1,\ldots,\x^n]$ in $\Real^{m \times n}$ and a dictionary $\D$ in $\Real^{m \times p}$, the algorithm returns a matrix of coefficients $\A=[\alphab^1,\ldots,\alphab^n]$ in $\Real^{p \times n}$ such that for every column $\x$ of $\X$, the corresponding column $\alphab$ of $\A$ is the solution of a Lasso or Elastic-Net problem; depending on the input parameters, this is for instance the penalized formulation
\begin{displaymath}
\min_{\alphab \in \Real^p} \frac{1}{2}\|\x-\D\alphab\|_2^2 + \lambda\|\alphab\|_1 + \frac{\lambda_2}{2}\|\alphab\|_2^2,
\end{displaymath}
or a variant with an explicit constraint on the $\ell_1$-norm of $\alphab$ or on the residual.
For efficiency reasons, the method first computes the covariance matrix $\D^T\D$, then
for each signal, it computes $\D^T\x$ and performs the decomposition with a
Cholesky-based algorithm (see \cite{efron} for instance). The implementation
also has an option to add {\bf positivity constraints} on the solutions
$\alphab$. When the solution is very sparse and the problem size is
reasonable, this approach can be very efficient. Moreover, it provides the
solution with exact precision, and its performance does not depend on the
correlation of the dictionary elements, except when the solution is not unique
(the algorithm may break down in this case).
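As a simple sanity check for this formulation, when the dictionary has orthonormal columns (so that $\D^T\D$ is the identity) and $\lambda_2=0$, the problem admits the closed-form soft-thresholding solution
\begin{displaymath}
\alphab_j = \mathrm{sign}\big({\d^j}^T\x\big)\,\max\big(|{\d^j}^T\x|-\lambda,0\big), \quad j=1,\ldots,p;
\end{displaymath}
the interest of the LARS-based solver lies precisely in the general case where the atoms are correlated.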
Note that mexLasso can return the whole regularization path of the first signal $\x_1$
and can handle the matrix~$\D$ implicitly if the quantities~$\D^T\D$ and~$\D^T\x$ are passed
as arguments, see below:
% {\footnotesize
% \verbatiminput{../src_release/mexLasso.m}
% }
\lstinputlisting{../src_release/mexLasso.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_Lasso.m}
\subsection{Function mexLassoWeighted}
This is a fast implementation of a weighted version of LARS~\cite{efron}. Given a matrix of signals $\X=[\x^1,\ldots,\x^n]$ in $\Real^{m \times n}$, a matrix of weights $\W=[\w^1,\ldots,\w^n] \in \Real^{p \times n}$, and a dictionary $\D$ in $\Real^{m \times p}$, depending on the input parameters, the algorithm returns a matrix of coefficients $\A=[\alphab^1,\ldots,\alphab^n]$ in $\Real^{p \times n}$,
such that for every column $\x$ of $\X$ and corresponding column $\w$ of $\W$, it computes a column $\alphab$ of $\A$, which is the solution of a weighted Lasso problem, for instance (in its constrained form, with nonnegative weights $\w$)
\begin{displaymath}
\min_{\alphab \in \Real^p} \frac{1}{2}\|\x-\D\alphab\|_2^2 \quad \mbox{s.t.} \quad \|\diag(\w)\alphab\|_1 \leq \lambda.
\end{displaymath}
\subsection{Function mexCD}
This function implements a coordinate-descent approach for solving Eq.~(\ref{eq:lasso}) and
Eq.~(\ref{eq:lasso2}). Note that unlike mexLasso, it does not address the Elastic-Net formulation.
To solve Eq.~(\ref{eq:lasso2}), the algorithm solves a
sequence of problems of the form (\ref{eq:lasso}) using simple heuristics.
Coordinate descent is very simple and in practice very powerful. It performs
better when the correlation between the dictionary elements is small.
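To make this concrete, here is a didactic MATLAB sketch of coordinate descent for the $\ell_1$-penalized problem $\frac{1}{2}\|\x-\D\alphab\|_2^2+\lambda\|\alphab\|_1$. The helper name cd\_lasso\_sketch is hypothetical; mexCD relies on a more careful compiled implementation.
\begin{lstlisting}
% Illustrative coordinate descent for min 0.5*||x - D*alpha||_2^2 + lambda*||alpha||_1.
% Didactic sketch only (assumes atoms with nonzero norm).
function alpha = cd_lasso_sketch(x, D, lambda, nb_iter)
  p = size(D, 2);
  G = D' * D;                 % precomputed D'*D
  c = D' * x;
  alpha = zeros(p, 1);
  for it = 1:nb_iter
    for j = 1:p
      % partial correlation with the contribution of coordinate j removed
      z = c(j) - G(j, :) * alpha + G(j, j) * alpha(j);
      % closed-form update for one coordinate: soft-thresholding
      alpha(j) = sign(z) * max(abs(z) - lambda, 0) / G(j, j);
    end
  end
end
\end{lstlisting}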
% {\footnotesize
% \verbatiminput{../src_release/mexCD.m}
% }
\lstinputlisting{../src_release/mexCD.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_CD.m}
\subsection{Function mexSOMP}
This is a fast implementation of the Simultaneous Orthogonal Matching Pursuit algorithm. Given a set of matrices $\X=[\X^1,\ldots,\X^n]$ in $\Real^{m \times N}$, where the $\X^i$'s are in $\Real^{m \times n_i}$, and a dictionary $\D$ in $\Real^{m \times p}$, the algorithm returns a matrix of coefficients $\A=[\A^1,\ldots,\A^n]$ in $\Real^{p \times N}$ which is an approximate solution of the following NP-hard problem
To be efficient, the method first computes the covariance matrix $\D^T\D$, then for each signal, it computes $\D^T\X^i$ and performs the decomposition with a Cholesky-based algorithm.
% {\footnotesize
% \verbatiminput{../src_release/mexSOMP.m}
% }
\lstinputlisting{../src_release/mexSOMP.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_SOMP.m}
\subsection{Function mexL1L2BCD}
This is a fast implementation of a simultaneous signal decomposition formulation. Given a set of matrices $\X=[\X^1,\ldots,\X^n]$ in $\Real^{m \times N}$, where the $\X^i$'s are in $\Real^{m \times n_i}$, and a dictionary $\D$ in $\Real^{m \times p}$, the algorithm returns a matrix of coefficients $\A=[\A^1,\ldots,\A^n]$ in $\Real^{p \times N}$ which is an approximate solution of the following $\ell_1/\ell_2$-regularized optimization problem
To be efficient, the method first computes the covariance matrix $\D^T\D$, then for each signal, it computes $\D^T\X^i$ and performs the decomposition with a Cholesky-based algorithm.
% {\footnotesize
% \verbatiminput{../src_release/mexL1L2BCD.m}
% }
\lstinputlisting{../src_release/mexL1L2BCD.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_L1L2BCD.m}
\subsection{Function mexSparseProject}
This is a multi-purpose function, implementing fast algorithms for projecting
onto convex sets, but it also solves the Fused Lasso signal approximation
problem. The proposed method is detailed in \cite{mairal9}. The main problems
addressed by this function are the following: given a matrix
$\U=[\u_1,\ldots,\u_n]$ in $\Real^{m \times n}$, it finds a matrix
$\V=[\v_1,\ldots,\v_n]$ in $\Real^{m \times n}$ such that for every column $\u$ of $\U$,
\section{Proximal Toolbox}
The previous toolbox is well adapted for solving a large number of small and medium-scale sparse
decomposition problems with the square loss, which is typical of the
classical dictionary learning framework. We now present
a new software package that is adapted for solving a wide range of
possibly large-scale learning problems, with several combinations of losses and
regularization terms. It implements the proximal methods
of~\cite{beck}, and includes the proximal solvers for the tree-structured
regularization of \cite{jenatton3}, as well as the solver of~\cite{mairal10} for
general structured sparse regularization.
The solver for structured sparse regularization norms includes a C++ max-flow
implementation of the push-relabel algorithm of~\cite{goldberg}, with
heuristics proposed by~\cite{cherkassky}.
This implementation also provides robust stopping criteria based on
\emph{duality gaps}, which are presented in Appendix~\ref{appendix}. It can handle intercepts (unregularized variables). The general formulation that our software addresses is of the form
\begin{displaymath}
\min_{\w \in \Real^p} f(\w) + \lambda\psi(\w),
\end{displaymath}
where $f$ is a smooth loss function and $\psi$ is a regularization function. For vectors~$\w$ in~$\Real^p$, the implemented regularization functions include:
\begin{itemize}
\item \textbf{The group Lasso}: $\psi(\w) \defin \sum_{g \in \GG} \eta_g \|\w_g\|_2$, where $\GG$ are groups of variables.
\item \textbf{The group Lasso with~$\ell_\infty$-norm}: $\psi(\w) \defin \sum_{g \in \GG} \eta_g \|\w_g\|_\infty$, where $\GG$ are groups of variables.
\item \textbf{The sparse group Lasso}: same as above but with an additional $\ell_1$ term.
\item \textbf{The tree-structured sum of $\ell_2$-norms}: $\psi(\w) \defin \sum_{g \in \GG} \eta_g \|\w_g\|_2$, where $\GG$ is a tree-structured set of groups~\cite{jenatton3}, and the $\eta_g$ are positive weights.
\item \textbf{The tree-structured sum of $\ell_\infty$-norms}: $\psi(\w) \defin \sum_{g \in \GG} \eta_g \|\w_g\|_\infty$, where $\GG$ is a tree-structured set of groups; see \cite{jenatton3}.
\item \textbf{General sum of $\ell_\infty$-norms}: $\psi(\w) \defin \sum_{g \in \GG} \eta_g \|\w_g\|_\infty$, where no assumptions are made on the groups in $\GG$.
\end{itemize}
Our software also handles regularization functions~$\psi$ on matrices~$\W$ in~$\Real^{p \times r}$ (note that $\W$ can be transposed in these formulations). In particular,
\begin{itemize}
\item \textbf{The $\ell_1/\ell_2$-norm}: $\psi(\W) \defin \sum_{i=1}^p \|\W_i\|_2$, where $\W_i$ denotes the $i$-th row of $\W$.
\item \textbf{The $\ell_1/\ell_\infty$-norm on rows and columns}: $\psi(\W) \defin \sum_{i=1}^p \|\W_i\|_\infty+\lambda_2 \sum_{j=1}^r\|\W^j\|_\infty$, where $\W^j$ denotes the $j$-th column of $\W$.
\item \textbf{The multi-task tree-structured sum of $\ell_\infty$-norms}:
where the first double sum is in fact a sum of independent structured norms on the columns $\w^i$ of $\W$, and the second term is a tree-structured regularization norm applied to the $\ell_\infty$-norms of the rows of $\W$, thereby inducing tree-structured regularization at the row level. Here, $\GG$ is a tree-structured set of groups.
\item \textbf{The multi-task general sum of $\ell_\infty$-norms} is the same as Eq.~(\ref{software:eq:struct}), except that the groups $\GG$ are general overlapping groups.
\end{itemize}
All of these regularization terms for vectors or matrices can be coupled with
nonnegativity constraints. It is also possible to add an intercept, which one
does not wish to regularize; we include this possibility in the next
sections. A few additional undocumented options are available in the source code.
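To make the group notation above concrete, here is a tiny MATLAB sketch that evaluates the group-Lasso penalty $\sum_{g \in \GG} \eta_g \|\w_g\|_2$ for a hypothetical set of non-overlapping groups; the actual group encoding expected by the toolbox is described in the help files of the functions below.
\begin{lstlisting}
% Evaluate psi(w) = sum_g eta_g * ||w_g||_2 on a small, hypothetical example.
% Purely illustrative; see the function help files for the toolbox's own group encoding.
w      = [0.5; -1.2; 0; 0.3; 0];
groups = {[1 2], [3 4 5]};     % two non-overlapping groups of variable indices
eta    = [1, 1];               % positive weights eta_g
psi = 0;
for g = 1:numel(groups)
  psi = psi + eta(g) * norm(w(groups{g}), 2);
end
disp(psi);                     % value of the regularizer at w
\end{lstlisting}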
We now present three functions for computing the proximal operators associated with the previous regularization functions.
\subsection{Function mexProximalFlat}
This function computes the proximal operators associated with many regularization functions. For input signals $\U=[\u^1,\ldots,\u^n]$ in $\Real^{p \times n}$, it finds a matrix $\V=[\v^1,\ldots,\v^n]$ in $\Real^{p \times n}$ such that:
$\bullet$~ If one chooses a regularization function on vectors, for every column $\u$ of $\U$, it computes one column $\v$ of $\V$ solving
\begin{displaymath}
\min_{\v \in \Real^p} \frac{1}{2}\|\u-\v\|_2^2 + \lambda\psi(\v),
\end{displaymath}
where $\psi$ is one of the regularization functions presented above.
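For the simplest regularizers, this proximal problem has a well-known closed form; the following MATLAB sketch illustrates two such cases. It is not the toolbox's implementation, which covers many more regularizers and processes whole matrices of signals at once.
\begin{lstlisting}
% Closed-form proximal operators for two simple regularizers (didactic sketch).
u      = randn(10, 1);                    % one input column
lambda = 0.1;

% prox of lambda*||.||_1: elementwise soft-thresholding
v_l1 = sign(u) .* max(abs(u) - lambda, 0);

% prox of lambda*||.||_2 (a single group): block soft-thresholding
v_l2 = max(1 - lambda / max(norm(u, 2), eps), 0) * u;
\end{lstlisting}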
\subsection{Function mexProximalTree}
This function computes the proximal operators associated with tree-structured regularization functions. For input signals $\U=[\u^1,\ldots,\u^n]$ in $\Real^{p \times n}$ and a tree-structured set of groups~\cite{jenatton3}, it computes a matrix $\V=[\v^1,\ldots,\v^n]$ in $\Real^{p \times n}$. When one uses a regularization function on vectors, it computes a column $\v$ of $\V$ for every column $\u$ of $\U$:
which is a formulation presented in \cite{mairal10}.
This function can also be used for computing the proximal operators addressed by mexProximalFlat (it will simply not take the tree structure into account). The way the tree is encoded is presented below (examples are given in the file test\_ProximalTree.m, with more usage details):
\subsection{Function mexProximalGraph}
This function computes the proximal operators associated with structured sparse regularization. For input signals $\U=[\u^1,\ldots,\u^n]$ in $\Real^{p \times n}$ and a set of possibly overlapping groups~\cite{mairal10}, it returns a matrix $\V=[\v^1,\ldots,\v^n]$ in $\Real^{p \times n}$.
When one uses a regularization function on vectors, it computes a column $\v$ of $\V$ for every column $\u$ of $\U$:
This function can also be used for computing the proximal operators addressed by mexProximalFlat. The way the graph is encoded is presented below (see also the example file test\_ProximalGraph.m for more usage details):
where $\X=[\x^1,\ldots,\x^n]^T$ (the $\x^i$'s are here the rows of $\X$).
\subsubsection{Classification Problems with the Logistic Loss}
The next formulation that our software can solve is the regularized logistic regression formulation.
We are again given a training set $\{\x^i,y_i\}_{i=1}^n$, with $\x^i \in
\Real^p$, but the variables~$y_i$ are now in $\{-1,+1\}$ for all $i$ in
$\InSet{n}$. The optimization problem we address is
\begin{displaymath}
\min_{\w \in \Real^p, b \in \Real} \frac{1}{n}\sum_{i=1}^n \log\big(1+e^{-y_i(\w^\top\x^i+b)}\big) + \lambda\psi(\w),
\end{displaymath}
with $\psi$ again taken to be one of the regularization functions presented above, and $b$ an optional intercept.
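As an illustration of how such a formulation can be tackled with proximal methods, here is a minimal ISTA sketch for the $\ell_1$-regularized case without intercept. The helper name ista\_logistic\_sketch is hypothetical; it is a didactic example only, whereas the toolbox implements the accelerated (FISTA) variant of \cite{beck} with a duality-gap stopping criterion.
\begin{lstlisting}
% Minimal ISTA sketch for l1-regularized logistic regression (no intercept).
% Didactic illustration only, not the toolbox's solver.
function w = ista_logistic_sketch(X, y, lambda, nb_iter)
  % X: n x p design matrix (rows are observations), y: labels in {-1,+1}
  [n, p] = size(X);
  w = zeros(p, 1);
  t = 1 / (0.25 * norm(X, 2)^2 / n);    % step size from a Lipschitz bound of the gradient
  for it = 1:nb_iter
    s    = -y .* (X * w);                             % negative margins
    grad = X' * (-y .* (1 ./ (1 + exp(-s)))) / n;     % gradient of the logistic loss
    z    = w - t * grad;                              % gradient step
    w    = sign(z) .* max(abs(z) - t * lambda, 0);    % soft-thresholding (prox of l1)
  end
end
\end{lstlisting}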
\subsubsection{Multi-class Classification Problems with the Softmax Loss}
We have also implemented a multi-class logistic classifier (or softmax).
For a classification problem with $r$ classes, we are given a training set~$\{\x^i,y_i\}_{i=1}^n$, where the variables~$\x^i$ are still vectors in $\Real^p$, but the $y_i$'s have integer values in $\{1,2,\ldots,r\}$. The formulation we address is the following multi-class learning problem
where $\W = [\w^1,\ldots,\w^r]$ and the optional vector $\b$ in $\Real^r$ carries intercepts for each class.
\subsubsection{Multi-task Regression Problems with the Square Loss}
We now consider a problem with $r$ tasks and a training set
$\{\x^i,\y^i\}_{i=1}^n$, where the variables~$\x^i$ are still vectors in~$\Real^p$, and $\y^i$
is a vector in $\Real^r$. We are looking for $r$ regression vectors $\w^j$, for $j\in \InSet{r}$, or equivalently for a matrix $\W=[\w^1,\ldots,\w^r]$ in~$\Real^{p \times r}$. The formulation we address is the following
\subsubsection{Multi-task and Multi-class Classification Problems with the Softmax Loss}
The multi-task/multi-class version directly follows from the formulation of Eq.~(\ref{software:eq:class}), but associates with each class a task, and as a consequence, regularizes the matrix $\W$ in a particular way:
How duality gaps are computed for any of these formulations is presented in Appendix~\ref{appendix}.
We now present the main functions for solving these problems.
\subsection{Function mexFistaFlat}
Given a matrix $\X=[\x^1,\ldots,\x^m]^T$ in $\Real^{m \times p}$ and a matrix $\Y=[\y^1,\ldots,\y^n]$, it solves the optimization problems presented in the previous section, with the same regularization functions as mexProximalFlat. See usage details below:
\lstinputlisting{../src_release/mexFistaFlat.m}
The following piece of code contains usage examples:
\subsection{Function mexFistaTree}
Given a matrix $\X=[\x^1,\ldots,\x^m]^T$ in $\Real^{m \times p}$ and a matrix $\Y=[\y^1,\ldots,\y^n]$, it solves the optimization problems presented in the previous section, with the same regularization functions as mexProximalTree. See usage details below:
% {\footnotesize
% \verbatiminput{../src_release/mexFistaTree.m}
% }
\lstinputlisting{../src_release/mexFistaTree.m}
The following piece of code contains usage examples:
\subsection{Function mexFistaGraph}
Given a matrix $\X=[\x^1,\ldots,\x^m]^T$ in $\Real^{m \times p}$ and a matrix $\Y=[\y^1,\ldots,\y^n]$, it solves the optimization problems presented in the previous section, with the same regularization functions as mexProximalGraph. See usage details below:
% {\footnotesize
% \verbatiminput{../src_release/mexFistaGraph.m}
% }
\lstinputlisting{../src_release/mexFistaGraph.m}
The following piece of code contains usage examples:
\subsection{Function mexConjGrad}
Implementation of a conjugate gradient for solving a linear system $\A\x=\b$
when $\A$ is positive definite. In some cases, it is faster than the Matlab
function \verb|pcg|, especially when the library uses the Intel Math Kernel Library.
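For reference, the underlying recursion is the standard conjugate gradient; the following short MATLAB sketch (a hypothetical cg\_sketch helper, not the toolbox code) spells it out.
\begin{lstlisting}
% Standard conjugate gradient recursion for A*x = b with A positive definite.
% Didactic sketch only.
function x = cg_sketch(A, b, tol, max_iter)
  x  = zeros(size(b));
  r  = b - A * x;          % residual
  p  = r;                  % search direction
  rs = r' * r;
  for k = 1:max_iter
    Ap    = A * p;
    alpha = rs / (p' * Ap);
    x     = x + alpha * p;
    r     = r - alpha * Ap;
    rs_new = r' * r;
    if sqrt(rs_new) < tol, break; end
    p  = r + (rs_new / rs) * p;
    rs = rs_new;
  end
end
\end{lstlisting}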
% {\footnotesize
% \verbatiminput{../src_release/mexConjGrad.m}
% }
\lstinputlisting{../src_release/mexConjGrad.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_ConjGrad.m}
\subsection{Function mexBayer}
Applies a Bayer pattern to an input image.
% {\footnotesize
% \verbatiminput{../src_release/mexBayer.m}
% }
\lstinputlisting{../src_release/mexBayer.m}
\subsection{Function mexCalcAAt}
For an input sparse matrix $\A$, this function returns $\A\A^T$. When $\A$ has many more columns than rows, this function can be much faster than the equivalent Matlab command \verb|A*A'|.
% {\footnotesize
% \verbatiminput{../src_release/mexCalcAAt.m}
% }
\lstinputlisting{../src_release/mexCalcAAt.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_CalcAAt.m}
\subsection{Function mexCalcXAt}
For an input sparse matrix $\A$ and a matrix $\X$, this function returns $\X\A^T$. When $\A$ has many more columns than rows, this function can be much faster than the equivalent Matlab command \verb|X*A'|.
% {\footnotesize
% \verbatiminput{../src_release/mexCalcXAt.m}
% }
\lstinputlisting{../src_release/mexCalcXAt.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_CalcXAt.m}
\subsection{Function mexCalcXY}
For two input matrices $\X$ and $\Y$, this function returns $\X\Y$.
% {\footnotesize
% \verbatiminput{../src_release/mexCalcXY.m}
% }
\lstinputlisting{../src_release/mexCalcXY.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_CalcXY.m}
\subsection{Function mexCalcXYt}
For two input matrices $\X$ and $\Y$, this function returns $\X\Y^T$.
% {\footnotesize
% \verbatiminput{../src_release/mexCalcXYt.m}
% }
\lstinputlisting{../src_release/mexCalcXYt.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_CalcXYt.m}
\subsection{Function mexCalcXtY}
For two input matrices $\X$ and $\Y$, this function returns $\X^T\Y$.
% {\footnotesize
% \verbatiminput{../src_release/mexCalcXtY.m}
% }
\lstinputlisting{../src_release/mexCalcXtY.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_CalcXtY.m}
\subsection{Function mexInvSym}
For an input symmetric matrix $\A$ in $\Real^{n \times n}$, this function returns $\A^{-1}$.
% {\footnotesize
% \verbatiminput{../src_release/mexInvSym.m}
% }
\lstinputlisting{../src_release/mexInvSym.m}
The following piece of code contains usage examples:
\lstinputlisting{../test_release/test_InvSym.m}
\subsection{Function mexNormalize}
% {\footnotesize
% \verbatiminput{../src_release/mexNormalize.m}
% }
\lstinputlisting{../src_release/mexNormalize.m}
The following piece of code contains usage examples:
\item \textbf{The multiclass logistic loss (or softmax)}. The primal variable is now a matrix $\Z$, in $\Real^{n \times r}$, which represents the product $\X^\top \W$. We denote by $\K$ the dual variable in $\Real^{n \times r}$.
where $\1_{p+1}$ is a vector of size $p+1$ filled with ones. This step
ensures that $\sum_{i=1}^{p+1}\kappab''(\w)_i= 0$.
\item \textbf{For the logistic loss}, the situation is slightly more complicated since additional constraints are involved in the definition of $\tilde{f}^\ast$.