\section{Homotopy Linearized AL Algorithm \label{sec:holal}}
To solve the equivalent formulation of problem \eqref{prob:01} presented in \eqref{eq:minmax}, we propose a Homotopy Linearized Augmented Lagrangian algorithm (HoLAL), summarized in Algorithm~\ref{Algo:2}. At every iteration, Algorithm~\ref{Algo:2} takes a primal descent step followed by a dual ascent step. The increasing sequence of penalty weights $\{\b_k\}_k$ in Step 1 and the dual updates in Steps 5 and 6 are responsible for gradually enforcing the constraints of~\eqref{prob:01}.
As we will see in the convergence analysis, the particular choice of $\b_k$ in Algorithm~\ref{Algo:2} strikes a balance between reducing the objective of \eqref{prob:01} and enforcing its constraints. Moreover, the choice of the dual step size $\s_k$ in Algorithm~\ref{Algo:2} ensures that the dual variable $y_k$ remains bounded; see~\cite{bertsekas1976penalty} for a precedent in the augmented Lagrangian literature with a similar choice of dual step size.
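Indeed, Steps 5 and 6 of Algorithm~\ref{Algo:2} already imply the crude bound
\begin{align*}
\|y_{k+1}\|
\le \|y_1\| + \sum_{j=1}^{k} \s_{j+1}\|A(x_{j+1})\|
\le \|y_1\| + \s_1 \|A(x_1)\| \log^2 2 \sum_{j=1}^{\infty} \frac{1}{(j+1)\log^2(j+2)} < \infty,
\end{align*}
where the last series converges, for instance by the integral test.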
%\notea{this next paragraph is not good.}
%Step 4 of Algorithm~\ref{Algo:2} removes pathological cases with divergent iterates. The first function of Step 4 is to clip the iterates of Algorithm~\ref{Algo:2} to a ball of radius $\rho'$.
%As an example, suppose that $g=1_C$ in \eqref{prob:01} is the indicator function for a bounded convex set $C\subset \RR^d$ and take $\rho' > \max_{x\in C} \|x\|$. Then, for sufficiently large $k$, it is not difficult to verify, by Theorem~\ref{thm:main} below, all the iterates of Algorithm~\ref{Algo:2} automatically satisfy $\|x_k\|\le \rho'$. The second function of Step 4 is to control the primal step sizes. \notea{that's bad.}
%\notea{we should at least say something about the adaptive version, even if we don't include the proof. maybe we should include the algorithm, claim we have proved its convergence (which we did) and then use it in simulations alongside the nonadaptive version. }
\begin{algorithm}
Input: Parameters $\beta_1,\s_1,\rho,\tau> 0$, primal initialization $x_{1}\in \RR^d$ with $\|A(x_1)\|\le \rho$, dual initialization $y_1\in \RR^m$.
\\
For $k=1,2,\ldots$, execute\\
\begin{enumerate}
\item \textbf{(Update penalty weight)}
$
\beta_k \leftarrow \beta_{1}\sqrt{k}\log(k+1)/\log 2 .
$\\
\item \textbf{(Line search)} Use the line search in \eqref{eq:defn of gamma line search} with $x=x_k,y=y_k,\b=\b_k$ and let $\g_k \leftarrow \g$.\\
\item \textbf{(Primal descent step)} $ x_{k+1} \leftarrow P_{g }(x_{k} - \gamma_{k} \nabla \mathcal{L}_{\beta_{k}}(x_{k},y_{k}))$, where $\L_{\b_k}$ is the augmented Lagrangian and $P_g$ denotes the proximal operator, see (\ref{prob:01},\ref{eq:Lagrangian},\ref{eq:defn of prox}).\\
%\item \textbf{(Control)} If necessary, modify $x_{k+1}$ to ensure that $\|x_{k+1}\|\le \rho'$ and $\|x_{k+1}-x_k\|\le \rho''$. \notea{will this cause any problem?}\\
\item \textbf{(Stopping criterion)} If $\g_k \|G_{\b_k,\g_k}(x_k,y_k)\|^2 + \s_k \|A(x_k)\|^2 \le \tau$, then quit and return $x_{k+1}$. See \eqref{eq:grad map} for the definition of gradient mapping $G_{\b_k,\g_k}$. \\
\item \textbf{(Update dual step size)}
$
\s_{k+1} \leftarrow \s_{1} \min\left( {\frac{1}{\sqrt{k+1}}},\frac{\|A(x_1)\|}{\|A(x_{k+1})\|}\cdot \frac{ \log^2 2 }{(k+1)\log^2(k+2)} \right).
$\\
\item \textbf{(Dual ascent step)} $y_{k+1} \leftarrow y_{k} + \sigma_{k+1}A(x_{k+1})$.
\end{enumerate}
\caption{HoLAL algorithm for solving problem \eqref{prob:01}}\label{Algo:2}
\end{algorithm}
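For concreteness, the following Python sketch mirrors one possible implementation of Algorithm~\ref{Algo:2}. The callables \texttt{f}, \texttt{grad\_f}, \texttt{A}, \texttt{DA}, and \texttt{prox\_g} are hypothetical placeholders for the problem data in (\ref{prob:01},\ref{eq:Lagrangian},\ref{eq:defn of prox}), the gradient mapping is computed as $(x_k-x_{k+1})/\g_k$ as a stand-in for \eqref{eq:grad map}, and the backtracking rule below is only a generic substitute for the line search in \eqref{eq:defn of gamma line search}; the sketch assumes NumPy arrays and is meant for illustration rather than as a reference implementation.
\begin{verbatim}
import numpy as np

def grad_aug_lagrangian(x, y, beta, grad_f, A, DA):
    # Gradient in x of the smooth part of
    # L_beta(x, y) = f(x) + <A(x), y> + (beta/2) * ||A(x)||^2.
    return grad_f(x) + DA(x).T @ (y + beta * A(x))

def smooth_value(x, y, beta, f, A):
    # Smooth part of the augmented Lagrangian (g is handled by the prox).
    Ax = A(x)
    return f(x) + Ax @ y + 0.5 * beta * (Ax @ Ax)

def line_search(x, y, beta, f, grad_f, A, DA, prox_g,
                gamma0=1.0, shrink=0.5):
    # Generic backtracking: shrink gamma until the usual sufficient-decrease
    # condition for proximal gradient steps holds (a stand-in for Step 2).
    grad = grad_aug_lagrangian(x, y, beta, grad_f, A, DA)
    gamma = gamma0
    while True:
        x_new = prox_g(x - gamma * grad, gamma)
        d = x_new - x
        if smooth_value(x_new, y, beta, f, A) <= (
                smooth_value(x, y, beta, f, A)
                + grad @ d + (d @ d) / (2 * gamma)):
            return gamma, x_new
        gamma *= shrink

def holal(x1, y1, f, grad_f, A, DA, prox_g,
          beta1=1.0, sigma1=1.0, tau=1e-4, max_iter=1000):
    x, y, sigma = x1, y1, sigma1
    normA1 = np.linalg.norm(A(x1))
    for k in range(1, max_iter + 1):
        beta = beta1 * np.sqrt(k) * np.log(k + 1) / np.log(2)     # Step 1
        gamma, x_new = line_search(x, y, beta, f, grad_f,
                                   A, DA, prox_g)                 # Steps 2-3
        G = (x - x_new) / gamma        # gradient mapping at (x_k, y_k)
        Ax = A(x)
        if gamma * (G @ G) + sigma * (Ax @ Ax) <= tau:            # Step 4
            return x_new, y
        normA = np.linalg.norm(A(x_new))
        ratio = normA1 / normA if normA > 0 else np.inf
        sigma = sigma1 * min(1.0 / np.sqrt(k + 1),
                             ratio * np.log(2) ** 2
                             / ((k + 1) * np.log(k + 2) ** 2))    # Step 5
        y = y + sigma * A(x_new)                                  # Step 6
        x = x_new
    return x, y
\end{verbatim}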
%Slater's condition plays a key role in convex optimization and we require a similar condition for the success of our algorithm.
%\begin{definition}\label{defn:nonconvex slater} \textbf{(Nonconvex Slater's condition)}
%Consider an interval $K=[k_0:k_1]$ and a sequence $\{x_k\}_{k=k_0}^{k_1}\subset \RR^d$. Let us set
%\begin{align}
%\cone(x_{k+1}) := \text{cone}(\partial g(x_{k+1}) )^* \subset \RR^d,
%\qquad \forall k\in K,
%\end{align}
%for short, and let $S_K\subset \RR^d$ be a subspace such that
%\begin{align}
%S _{K} \supseteq \bigcup_{k\in K} P_{\cone(x_{k+1})}\left( DA(x_{k+1})^\top A(x_{k+1}) \right),
%\label{eq:defn of S defn}
%\end{align}
%and, to simplify the notation, let $S_{K}$ also denote an orthonormal basis for this subspace.
%For $\rho,\rho'>0$, we set
%\begin{align}
%\nu_A :=
%\begin{cases}
%\underset{v,x}{\min} \, \frac{\left\| S_{K}^\top P_{\cone(x)} ( DA(x)^\top v) \right\|}{\|v\|} \\
%\|v\|\le \rho\\
%\|x\|\le \rho',
%\end{cases}
%%\label{eq:new slater defn}
%\end{align}
%and say that Program (\ref{prob:01}) satisfies the nonconvex Slater's condition if $\nu_A >0$.
%\end{definition}
%To gain some insight about Definition \ref{defn:nonconvex slater}, suppose that $f:\RR^d\rightarrow\RR$ is convex. Consider a nonempty and bounded convex set $C\subseteq\RR^d$ and let us restrict ourselves to the choice of $g = 1_C:\RR^d\rightarrow\RR$, which takes $x\in \RR^d$ to
%\begin{align}
%g(x) = 1_C(x) =
%\begin{cases}
%0 & x \in C\\
%\infty & x\notin C.
%\end{cases}
%\label{eq:indicator}
%\end{align}
%We also let $T_C(x)$ denote the tangent cone to $C$ at $x$, and $P_{T_C(x)}:\RR^d\rightarrow\RR^d$ denotes the orthogonal projection onto this tangent cone.
%We also set $S_K = \RR^d$ to simplify the discussion. Then, it is not difficult to verify that \eqref{eq:new slater defn} is equivalent to
%\begin{align}
%\nu_A :=
%\begin{cases}
%\underset{v,x}{\min} \, \, \frac{\left\| P_{T_C(x)} A^\top v \right\|}{\|v\|} \\
%\|v\|\le \rho\\
%\|x\|\le \rho'
%\end{cases}
%= \begin{cases}
%\underset{x}{\min} \,\, \eta_{m}\left( P_{T_C(x)} A^\top \right) \\
%\|x\|\le \rho',
%\end{cases}
%%\label{eq:nu for cvx}
%\end{align}
%where $\eta_{m}$ returns the $m$th largest singular value of its input matrix. The nonconvex Slater condition here, namely, $\nu_A>0$, is closely related to the standard Slater's condition in convex optimization.
%\begin{proposition}%\label{prop:convex}
%In problem (\ref{prob:01}), suppose that $f$ is a convex function, $g$ is as in (\ref{eq:indicator}), and $A:\RR^d\rightarrow\RR^m$ a linear operator, represented with the matrix $A\in \RR^{m\times d}$. Assume also that Program (\ref{prob:01}) is feasible, namely, there exists $x\in C$ such that $Ax=0$. In (\ref{eq:nu for cvx}), let us take $S_K =\RR^d$. Then the standard Slater's condition holds if the nonconvex Slater's condition ($\nu_A>0$) holds.
%\end{proposition}
%\begin{proof}
%Suppose that the standard Slater's condition does not hold, namely, that
%\begin{equation}
%\relint(\Null(A) \cap C) = \Null(A)\cap \relint(C) = \emptyset,
%\label{eq:no feas}
%\end{equation}
%where $\Null(A)$ and $\relint(C)$ denote the null space of the matrix $A$ and the relative interior of $C$, respectively.
%By assumption, there exists $x_0\in C$ such that $Ax_0=0$. It follows from \eqref{eq:no feas} that $x_0\in \text{boundary}(C)$ and that $\Null(A)$ supports $C$ at $x_0$, namely,
%$A x_0\ge 0$, for every $x_0\in C$.
%Consequently, $\Null(A) \cap T_C(x) \ne \{0\}$, where $T_C(x_0)$ is the tangent cone of the set $C$ at $x_0$.
%Equivalently, it holds that
%$\row(A)\cap N_C(u) \ne \{0\}$, where $\row(A)$ is the row space of the matrix $A$. That is, there exists a unit-norm vector $v$ such that
%$P_{T_C(x_0)}A^\top v=0$,
%and, consequently,
%$
%\eta_{m}(P_{T_C(x_0)}A^\top) = 0
%$.
%Let us take $\rho'=\|x_0\|$ in \eqref{eq:nu for cvx}. We then conclude that $\nu_A=0$, namely, the nonconvex Slater's condition also does not hold, thereby completing the proof of Proposition \ref{prop:convex}.\hfill \qed
%\end{proof}
%
%The nonconvex Slater's condition is necessary for the success of Algorithm~\ref{Algo:2}.
%
%
%The following result, proved in Appendix \ref{sec:proof of bnd Ak}, uses the nonconvex Slater's condition to find a converse, thus bounding the feasibility gap with the gradient mapping.
%\begin{lemma}\label{lem:bnd bnd Ak} \textbf{\emph{(Nonconvex Slater's condition)}}
%For an integer interval $K=[k_0:k_1]$, let us set
%\begin{align}
%\cone(x_{k+1}) := \text{cone}(\partial g(x_{k+1}) )^* \subset \RR^d,
%\qquad \forall k\in K,
%\end{align}
%for short, and
%consider a subspace $S_K\subseteq \mathbb{R}^d$ such that
%\begin{align}
%S _{K} \supseteq \bigcup_{k\in K} P_{\cone(x_{k+1})}\left( DA(x_{k+1})^\top A(x_{k+1}) \right),
%\label{eq:defn of S lemma}
%\end{align}
%and, to simplify the notation, let $S_{K}$ also denote an orthonormal basis for this subspace.
%For $\rho,\rho',\rho''>0$, assume that
%\begin{align}
%\max_{k\in K}\|A(x_k)\| \le \rho,
%\qquad
%\max_{k\in K}\|x_k\| \le \rho',
%\qquad
%\max_{k\in K} \|x_{k+1}-x_k\|\le \rho''.
%\label{eq:good neighb}
%\end{align}
%Let us also set
%\begin{align}
%\nu_A :=
%\begin{cases}
%\min_{v,x} \, \frac{\left\| S_{K}^\top P_{\cone(x)} ( DA(x)^\top v) \right\|}{\|v\|} \\
%\|v\|\le \rho\\
%\|x\|\le \rho'.
%\end{cases}
%\label{eq:new slater lemma}
%\end{align}
%and assume that $\nu_A\ge 2\lambda_A\rho''$.
%Then it holds that
%\begin{align}
%\|A(x_k)\|
%& \le \frac{2}{\nu_A\b_k} \left( \|G_k\| + \lambda'_f + \lambda'_A \| y_k\| \right),
%\qquad \forall k\in K,
%\label{eq:bnd on Ak final}
%\end{align}
%where
%\begin{align}
%\lambda'_f := \max_{\|x\| \le \rho'} \|\nabla f(x)\|,
%\qquad
%\lambda'_A := \max_{\|x\|\le \rho'} \| DA(x)\|.
%\end{align}
%\end{lemma}
\noindent In the $k^{\text{th}}$ iteration, if the primal step size $\g_k$ is sufficiently small, it is natural to expect Step 3 of Algorithm~\ref{Algo:2} to reduce the objective of \eqref{eq:minmax}. Perhaps less obviously, the geometric regularity in Definition~\ref{defn:nonconvex slater} ensures that this primal update also reduces the feasibility gap of \eqref{prob:01}. This intuition is the key to the analysis of Algorithm~\ref{Algo:2}, as formalized below and proved in Appendix \ref{sec:proof of bnd Ak}.
\begin{lemma}\label{lem:bnd bnd Ak} %\textbf{\emph{(Nonconvex Slater's condition)}}
For integers $k_0< k_1$, consider the integer interval $K=[k_0:k_1]$.\footnote{If desired, one can set $k_1=\infty$ in Lemma \ref{lem:bnd bnd Ak}.}
% and, for $\rho'>0$, suppose that
%\begin{align}
%\rho' \ge \max_{k\in K} \|x_{k}\|.
%\label{eq:good neighb}
%\end{align}
Suppose that problem~(\ref{prob:01}) satisfies the geometric regularity in Definition~\ref{defn:nonconvex slater} with
\begin{align}
\nu(g,A,S,\rho,\rho')\ge 2\lambda_A \max_{k\in K} \g_k \|G_{\b_k,\g_k}(x_k,y_k)\|,
\label{eq:low-bnd-on-nu-lemma}
\end{align}
where $\lambda_A$ was defined in (\ref{eq:smoothness basic}) and
\begin{itemize}
\item $\rho \ge \max_{k\in K}\|A(x_k)\| $,
\item $\rho' \ge \max_{k\in K}\|x_k\| $,
\item $S \supseteq \bigcup_{k\in K} P_{\cone(\partial g (x_{k+1}) )^*}\left( \D A(x_{k+1})^\top A(x_{k+1}) \right).$
\end{itemize}
% let us set
%\begin{align}
%\cone(x_{k+1}) := \text{cone}(\partial g(x_{k+1}) )^* \subset \RR^d,
%\qquad \forall k\in K,
%\end{align}
%for short, and
%consider a subspace $S_K\subseteq \mathbb{R}^d$ such that
%\begin{align}
%S _{K} \supseteq \bigcup_{k\in K} P_{\cone(x_{k+1})}\left( DA(x_{k+1})^\top A(x_{k+1}) \right),
%\label{eq:defn of S lemma}
%\end{align}
%and, to simplify the notation, let $S_{K}$ also denote an orthonormal basis for this subspace.
%For $\rho,\rho',\rho''>0$, assume that
%\begin{align}
%\max_{k\in K}\|A(x_k)\| \le \rho,
%\qquad
%\max_{k\in K}\|x_k\| \le \rho',
%\qquad
%\max_{k\in K} \|x_{k+1}-x_k\|\le \rho''.
%\label{eq:good neighb}
%\end{align}
%Let us also set
%\begin{align}
%\nu_A :=
%\begin{cases}
%\min_{v,x} \, \frac{\left\| S_{K}^\top P_{\cone(x)} ( DA(x)^\top v) \right\|}{\|v\|} \\
%\|v\|\le \rho\\
%\|x\|\le \rho'.
%\end{cases}
%\label{eq:new slater lemma}
%\end{align}
%and assume that $\nu_A\ge 2\lambda_A\rho''$.
Then, for every $k\in K$, it holds that
\begin{align}
\|A(x_k)\|
& \le \frac{2}{\nu(g,A,S,\rho,\rho')\b_k} \left( \|G_{\b_k,\g_k}(x_k,y_k)\| + \lambda'_f + \lambda'_A \| y_k\| \right),
\label{eq:bnd on Ak final}
\end{align}
where
\begin{align}
\lambda'_f := \max_{\|x\| \le \rho'} \|\nabla f(x)\|,
\qquad
\lambda'_A := \max_{\|x\|\le \rho'} \| \D A(x)\|.
\end{align}
\end{lemma}
\noindent In words, Lemma~\ref{lem:bnd bnd Ak} relates the feasibility gap to the gradient mapping in problem~\eqref{prob:01}.
Not surprisingly, as the penalty weight $\b_k$ increases in Algorithm~\ref{Algo:2}, the feasibility gap in~\eqref{prob:01} shrinks, as indicated in~\eqref{eq:bnd on Ak final}. Note also that the larger $\nu$ is, the more regular problem~\eqref{prob:01} is, and the smaller the feasibility gap becomes, as again corroborated by \eqref{eq:bnd on Ak final}.
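For instance, substituting the penalty schedule of Step 1 of Algorithm~\ref{Algo:2} into \eqref{eq:bnd on Ak final} gives
\begin{align*}
\|A(x_k)\|
\le \frac{2\log 2}{\nu(g,A,S,\rho,\rho')\, \b_1 \sqrt{k}\log(k+1)}
\left( \|G_{\b_k,\g_k}(x_k,y_k)\| + \lambda'_f + \lambda'_A \| y_k\| \right),
\end{align*}
which already suggests a feasibility gap decaying at the rate $1/(\sqrt{k}\log(k+1))$, provided that the gradient mapping and the dual sequence remain bounded; Theorem~\ref{thm:main} below makes this precise.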
With the aid of Lemma~\ref{lem:bnd bnd Ak}, we can derive the convergence rate of Algorithm~\ref{Algo:2} to first-order stationarity, with the proof deferred to Appendix~\ref{sec:proof-of-main}. For the convergence metric, we will use a linear combination of the gradient mapping and the feasibility gap of problem~(\ref{prob:01}), as motivated earlier by the discussion after Definition~\ref{def:grad map}.
\begin{theorem}[Convergence rate of HoLAL]
\label{thm:main}
For sufficiently large integers $k_0< k_1$, consider the interval $K=[k_0:k_1]$,\footnote{The requirement for $k_0$ to be sufficiently large is merely to simplify the analysis. If desired, one can set $k_1=\infty$ in Theorem~\ref{thm:main}.} and let $\{x_k,y_k\}_{k\in K}$ denote the output sequence of Algorithm~\ref{Algo:2}. Suppose that
\begin{align*}
\mu := - \min(0, \inf_{k} f(x_k) + g(x_k) + \langle A(x_k) ,y_{k_0}\rangle )
< \infty.
\end{align*}
For $\rho'>0$, in addition to the strong smoothness of $f$ and $A$ quantified in (\ref{eq:smoothness basic}), let us define
\begin{align}
\lambda'_f = \max_{\|x\|\le \rho'} \|\nabla f(x)\|,
\qquad
\lambda'_A = \max_{\|x\| \le \rho'} \|\D A(x)\|,
\label{eq:local-lips}
\end{align}
to be the (restricted) Lipschitz constants of $f$ and $A$, respectively.\footnote{The restricted Lipschitz continuity assumption for $f,A$ in \eqref{eq:local-lips} is mild. Indeed, we have already assumed in \eqref{eq:smoothness basic} that $f,A$ are both continuously differentiable and, by compactness of the domain in \eqref{eq:local-lips}, $\lambda'_f,\lambda'_A <\infty$. }
Suppose also that problem~(\ref{prob:01}) satisfies the geometric regularity in Definition~\ref{defn:nonconvex slater} with
\begin{align}
\nu(g,A,S,\rho,\rho') \gtrsim
\max\left(\lambda_A \max_{k\in K} \sqrt{\g_k\mu} , \frac{{\lambda'_f+\lambda'_A}}{\sqrt{\mu}} \right),
\label{eq:low-bnd-on-regularity}
\end{align}
where
\begin{itemize}
\item $\rho \gtrsim \sqrt{\mu}$,
%\item $\rho \gtrsim -\sqrt{\mu}$,
\item $\rho'\ge \max_{k\in K} \|x_k\|$,
\item $S \supseteq \bigcup_{k\in K} P_{\cone(\partial g (x_{k+1}) )^*}\left( \D A(x_{k+1})^\top A(x_{k+1}) \right).$
\end{itemize}
Then the output sequence of Algorithm~\ref{Algo:2} satisfies
\begin{align}
\label{eq:rates-thm}
& \min_{k\in K} \frac{1}{\lambda_f+ \sqrt{m} \lambda_A \rho + d \lambda'^2_A} \cdot \frac{\|G_{\b_k,\g_k}(x_k,y_k)\|^2}{\sqrt{k_1}\log(k_1+1)} + \|A(x_k)\|^2 \nonumber\\
& \lesssim \frac{1}{k_1-k_0}\left( \frac{\lambda_f'^2+\lambda_A'^2}{\nu(g,A,S,\rho,\rho')^2} + \mu\right),
\end{align}
where $\lesssim,\gtrsim$ above suppress the dependence on constants and less important parameters, for the sake of clarity. The exact expressions
%in terms of $\lambda_f,\lambda'_f,\lambda_A,\lambda'_A, \b_{k_0},\s_{k_0},\rho, \rho', \nu(g,A,S,\rho,\rho'),y_0$
are found in (\ref{eq:exact dependences},\ref{eq:suff cnd on nu-1},\ref{eq:cnd-on-nu-proof}).
\end{theorem}
\noindent A few remarks about Theorem~\ref{thm:main} are in order.
\paragraph{\textbf{\emph{Convergence rates.}}} Loosely speaking, Theorem~\ref{thm:main} states that Algorithm~\ref{Algo:2} achieves first-order stationarity for \eqref{prob:01} by reducing the gradient mapping and the feasibility gap at the rates
\begin{align}
\|G_{\b_k,\g_k}(x_k,y_k) \|^2 = \frac{1}{\widetilde{O}(\sqrt{k})},
\qquad
\|A(x_k) \| = \frac{1}{\widetilde{O}(\sqrt{k})},
\label{eq:rates-summary}
\end{align}
where $\widetilde{O}$ suppresses logarithmic terms above.
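To unpack \eqref{eq:rates-summary} from \eqref{eq:rates-thm}, take for example $k_0=\lceil k_1/2 \rceil$, so that $k_1-k_0\gtrsim k_1$. The right-hand side of \eqref{eq:rates-thm} is then of order $1/k_1$ and, at the iteration $k\in K$ attaining the minimum on the left-hand side,
\begin{align*}
\|G_{\b_k,\g_k}(x_k,y_k)\|^2 \lesssim \frac{\log(k_1+1)}{\sqrt{k_1}},
\qquad
\|A(x_k)\|^2 \lesssim \frac{1}{k_1},
\end{align*}
where $\lesssim$ again suppresses the less important factors.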
For comparison, if $f$ were convex and $A$ were affine in \eqref{prob:01}, both rates in \eqref{eq:rates-summary} would improve to $1/k$; see \cite[Theorem 2.5]{xu2017accelerated} for the details.
In contrast, let us emphasize again that the only linearized primal-dual nonconvex convergence rate we are aware of is the result in \cite[Theorem 19]{bot2018proximal}, which comes to a standstill for the correct choice of $\theta=1/2$, even though problem~(1) there is less general than our problem~\eqref{prob:01}. Moreover, compared with the first-order inexact methods that solve \eqref{e:exac} to approximate stationarity (instead of linearizing it, as we do),
Theorem~\ref{thm:main} improves over the state-of-the-art total convergence rate of $1/k^{\frac{1}{3}}$ reported in~\cite{sahin2019inexact,cartis2018optimality}.
\paragraph{\emph{\textbf{Geometric regularity.}}}
Geometric regularity in Definition~\ref{defn:nonconvex slater} plays a key role in Theorem~\ref{thm:main} by, broadly speaking, ensuring that the primal updates of Algorithm~\ref{Algo:2} reduce the feasibility gap as the penalty weight $\b_k$ grows. We will verify this condition for several examples in Section~\ref{sec:experiments}.
As confirmed by \eqref{eq:rates-thm}, the larger $\nu(g,A,S,\rho,\rho')$ is, the more regular \eqref{prob:01} is, and the faster Algorithm~\ref{Algo:2} converges to stationarity. In fact, for Algorithm~\ref{Algo:2} to succeed, Theorem~\ref{thm:main} requires $\nu$ to be sufficiently large; see \eqref{eq:low-bnd-on-regularity}. We do not know whether \eqref{eq:low-bnd-on-regularity} is necessary or merely an artifact of our proof technique, but one naturally expects the convergence rate to slow down, at the very least, as $\nu$ decreases, as corroborated by \eqref{eq:rates-thm}.
The right-hand side of \eqref{eq:low-bnd-on-regularity} also depends on the largest primal step size $\max_{k\in K}\g_k$. Since $\g_k$ is found by line search in Algorithm~\ref{Algo:2}, we are unable to upper bound this quantity unless we make further assumptions on problem~(\ref{prob:01}) or slightly modify the algorithm to cap the primal step sizes. However, recall that the augmented Lagrangian $\L_{\b_k}(\cdot,y_k)$ has a $\lambda_{\b_k}$-Lipschitz gradient in its first argument, and thus typically $\g_k \propto 1/\lambda_{\b_k}$, namely, $\g_k \propto 1/\sqrt{k}$ by (\ref{eq:smoothness at k},\ref{eq:low bnd on gammas}).
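If indeed $\g_k\propto 1/\sqrt{k}$, then the first term inside the maximum in \eqref{eq:low-bnd-on-regularity} scales as
\begin{align*}
\lambda_A \max_{k\in K}\sqrt{\g_k \mu} \approx \lambda_A \sqrt{\g_{k_0}\mu} \propto \frac{\lambda_A\sqrt{\mu}}{k_0^{1/4}},
\end{align*}
so that this part of the requirement on $\nu(g,A,S,\rho,\rho')$ relaxes as $k_0$ grows.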
\paragraph{\emph{\textbf{Smoothness.}}}
The smoother $f$ and $A$ are, in the sense of (\ref{eq:smoothness basic},\ref{eq:local-lips}), the faster the convergence; see~(\ref{eq:low-bnd-on-regularity},\ref{eq:rates-thm}). Indeed, as $f$ and $A$ become smoother, problem~(\ref{prob:01}) increasingly resembles a convex program, at least locally.
Note, however, that it is often not straightforward to compute the local Lipschitz constants $\lambda'_f,\lambda'_A$ of \eqref{eq:local-lips} in practice, although it is in general possible to loosely upper bound them. For us, working with $\lambda'_f,\lambda'_A$ is necessary to translate any descent in the augmented Lagrangian (see Lemma~\ref{lem:11}) into a reduction of the feasibility gap (see Lemma~\ref{lem:bnd bnd Ak}).
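As an example of such a loose bound, if the smoothness in \eqref{eq:smoothness basic} holds on the ball $\{\|x\|\le \rho'\}$, the triangle inequality already gives
\begin{align*}
\lambda'_f \le \|\nabla f(0)\| + \lambda_f \rho',
\qquad
\lambda'_A \le \|\D A(0)\| + \lambda_A \rho',
\end{align*}
and any such upper bounds may safely be used in place of $\lambda'_f,\lambda'_A$: doing so only loosens \eqref{eq:rates-thm} and makes the requirement \eqref{eq:low-bnd-on-regularity} more demanding, without invalidating either.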
\paragraph{\emph{\textbf{Subspace $S$.}}}
The freedom over the choice of subspace $S$ in Theorem~\ref{thm:main} is meant to further strengthen the result in the spirit of Proposition~\ref{prop:convex}. One might simply set $S=\mathbb{R}^d$ in the first reading.
\paragraph{\textbf{\emph{Faster rates.}}} Assuming restricted strong convexity and smoothness for $f$ in~\eqref{prob:01} and near-isometry for $A$, (approximate) linear convergence of Algorithm~\ref{Algo:2} to a global minimizer of problem~\eqref{prob:01} can be established~\cite{gomez2019fast}.
