%% use the tnoteref command within \title for footnotes;
%% use the tnotetext command for the associated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for the associated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for the associated footnote;
%% use the ead command for the email address,
%% and the form \ead[url] for the home page:
%%
%% \title{Title\tnoteref{label1}}
%% \tnotetext[label1]{}
%% \author{Name\corref{cor1}\fnref{label2}}
%% \ead{email address}
%% \ead[url]{home page}
%% \fntext[label2]{}
%% \cortext[cor1]{}
%% \address{Address\fnref{label3}}
%% \fntext[label3]{}
%% use optional labels to link authors explicitly to addresses:
%% \author[label1,label2]{<author name>}
%% \address[label1]{<address>}
%% \address[label2]{<address>}
\author{Authors}
\address{LIONS, EPFL}
% \begin{abstract}
% %% Text of abstract
% \end{abstract}
% \begin{keyword}
% Science \sep Publication \sep Complicated
% %% keywords here, in the form: keyword \sep keyword
% %% MSC codes here, in the form: \MSC code \sep code
% %% or \MSC[2008] code \sep code (2000 is the default)
% \end{keyword}
\end{frontmatter}
%%
%% Start line numbering here if you want
%%
% \linenumbers
%% main text
\section{Problem Formulation}
\label{S:1}
We study the following non-convex optimization program
\begin{align}
\min_{u\in C} \, h(u) \quad \text{subject to} \quad A(u) = b,
\label{prob:01}
\end{align}
where $h:\mathbb{R}^d\rightarrow\mathbb{R}$ and $A:\mathbb{R}^d\rightarrow\mathbb{R}^m$ are continuously differentiable, $b\in\mathbb{R}^m$, and $C\subseteq\mathbb{R}^d$ is a closed convex set. We assume that $h$ and $A$ are smooth in the sense that there exist $\lambda_h,\lambda_A\ge 0$ such that
\begin{align}
\left\| \nabla h(u) - \nabla h(u') \right\| \le \lambda_h \|u-u'\|,
\qquad
\left\| DA(u) - DA(u') \right\| \le \lambda_A \|u-u'\|,
\label{eq:smoothness basic}
\end{align}
for every $u,u'\in \mathbb{R}^d$. Above, $\nabla h(u)\in \mathbb{R}^d$ is the gradient of $h$ and $DA(u)\in \mathbb{R}^{m\times d}$ is the Jacobian of $A$.
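For concreteness, a simple instance of \eqref{prob:01} satisfying these assumptions (an illustration, with all constants chosen for this example) is the non-convex quadratically-constrained program with $h(u) = \langle c, u\rangle$, $A(u) = u^\top Q u$ for a symmetric $Q\in\mathbb{R}^{d\times d}$, and $m=1$. Then
\begin{align*}
\nabla h(u) = c, \qquad DA(u) = 2 (Qu)^\top,
\end{align*}
so that \eqref{eq:smoothness basic} holds with $\lambda_h = 0$ and $\lambda_A = 2\|Q\|$, since $\|DA(u)-DA(u')\| = 2\|Q(u-u')\| \le 2\|Q\| \, \|u-u'\|$.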
We may equivalently write \eqref{prob:01} as the min-max program
\begin{align}
\min_{u\in C} \max_y \, \mathcal{L}_\beta(u,y),
\label{eq:minmax}
\end{align}
where the augmented Lagrangian is
\begin{align}
\label{eq:Lagrangian}
\mathcal{L}_\beta(u,y) := h(u) + \langle A(u)-b, y \rangle + \frac{1}{2\beta}\|A(u)-b\|^2,
\end{align}
for a penalty weight $\beta>0$ and dual variable $y\in\mathbb{R}^m$.
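To verify the equivalence, note that the inner maximization in \eqref{eq:minmax} acts as an indicator for the constraint: the quadratic penalty does not involve $y$, and the linear term is unbounded above in $y$ unless $A(u)=b$, so that
\begin{align*}
\max_y \, \mathcal{L}_\beta(u,y) =
\begin{cases}
h(u) & \text{if } A(u) = b,\\
+\infty & \text{otherwise}.
\end{cases}
\end{align*}
Minimizing this over $u\in C$ recovers \eqref{prob:01}.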
\section{Preliminaries}
\begin{definition} \label{def:grad map} Given $u$ and $\gamma >0$, define the gradient mapping
\begin{align}
G_{\beta,\gamma}(u,y) := \frac{u - u^+}{\gamma},
\label{eq:grad map}
\end{align}
where $u^+=P_{C}(u-\gamma \nabla \mathcal{L}_{\beta}(u,y))$ and $P_C$ denotes the orthogonal projection onto $C$.
\end{definition}
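For intuition, note that in the unconstrained case $C=\mathbb{R}^d$, the projection is the identity and the gradient mapping reduces to the gradient itself:
\begin{align*}
G_{\beta,\gamma}(u,y) = \frac{u - \left( u - \gamma \nabla \mathcal{L}_\beta(u,y) \right)}{\gamma} = \nabla_u \mathcal{L}_\beta(u,y).
\end{align*}
In particular, $G_{\beta,\gamma}(u,y)=0$ if and only if $u$ is a first-order stationary point of $\mathcal{L}_\beta(\cdot,y)$ over $C$.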
\begin{lemma}[Descent Lemma]\label{lem:11}
For fixed $y\in \RR^m$, suppose that $\nabla_u \mathcal{L}_{\beta}(\cdot, y)$ is $\lambda_\beta$-Lipschitz continuous. For $u\in C$ and $\gamma \in (0, 1/\lambda_\beta)$, it holds that
\begin{align}
\mathcal{L}_\beta(u^+,y) \le \mathcal{L}_\beta(u,y) - \frac{\gamma}{2} \left\| G_{\beta,\gamma}(u,y) \right\|^2,
\label{e:deslem}
\end{align}
where $u^+$ and $G_{\beta,\gamma}(u,y)$ are as in Definition \ref{def:grad map}.
\end{lemma}
\begin{proof}
Note that \eqref{e:deslem} follows immediately from an application of \cite[Lemma 3.2, Remark 3.2(i)]{bolte2014proximal}. It only remains to compute the smoothness parameter of $\mathcal{L}_\beta(\cdot,y)$, namely, $\lambda_\beta$. To that end, note that
\begin{align}
\nabla_u \mathcal{L}_\beta(u,y) = \nabla h(u) + DA(u)^\top \left( y + \frac{A(u)-b}{\beta} \right),
\label{eq:grad of AL}
\end{align}
so that, for every $u,u'\in C$, the smoothness assumptions in \eqref{eq:smoothness basic} yield
\begin{align}
\lambda_\beta \le \lambda_h + \lambda_A \|y\| + \frac{1}{\beta} \left( \lambda_A \sup_{u\in C} \|A(u)-b\| + \sup_{u\in C} \|DA(u)\|^2 \right),
\end{align}
provided that the suprema above are finite, for example when $C$ is compact.
\end{proof}
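As a sanity check on this estimate (a special case worked out here for illustration), suppose the constraint map is linear, say $A(u) = Mu$ for some $M\in\mathbb{R}^{m\times d}$. Then $DA(u) \equiv M$, so $\lambda_A = 0$, and the bound above reduces to
\begin{align*}
\lambda_\beta \le \lambda_h + \frac{\|M\|^2}{\beta},
\end{align*}
which is independent of the dual variable $y$, as expected since $\langle Mu - b, y\rangle$ is linear in $u$.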
In practice, the Lipschitz constant $\lambda_{\beta}$ is often hard to evaluate exactly, and we might resort to the classic line search technique, reviewed below for completeness.
\begin{lemma}[Line search] \label{lem:eval Lipsc cte} Fix $\theta \in (0,1)$ and ${\gamma}_0>0$. For $\gamma'>0$, let $u^+_{\gamma'} = P_C(u - \gamma' \nabla \mathcal{L}_\beta(u,y))$ and define
\begin{align}
\gamma := \max\left\{ \gamma' = \gamma_0 \theta^i :\,
\mathcal{L}_\beta(u^+_{\gamma'},y) \le \mathcal{L}_\beta(u,y) + \left\langle u^+_{\gamma'} - u , \nabla \mathcal{L}_\beta(u,y) \right\rangle + \frac{\| u^+_{\gamma'} - u \|^2}{2\gamma'} , \,\, i \in \mathbb{N} \right\}.
\end{align}
Then \eqref{e:deslem} holds with this choice of $\gamma$ and, moreover,
\begin{align}
\gamma \ge \min\left( \gamma_0, \frac{\theta}{\lambda_\beta} \right).
\label{eq:moreover}
\end{align}
\end{lemma}
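The computational overhead of this backtracking is modest: if the sufficient-decrease test above accepts any $\gamma' \le 1/\lambda_\beta$ (as guaranteed by the standard descent lemma), then the number of trial steps, and hence of extra projections onto $C$, is at most
\begin{align*}
1 + \left\lceil \log_{1/\theta} \left( \gamma_0 \lambda_\beta \right) \right\rceil
\end{align*}
whenever $\gamma_0 > 1/\lambda_\beta$, and a single trial suffices otherwise.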
and, with some abuse of notation, let $S_{K}$ also denote an orthonormal basis for this subspace. For $\rho>0$, suppose that there exists $\eta_{\min}$ such that
\begin{align}
0 < \eta_{\min} :=
\begin{cases}
\min\limits_{u,v} \, \left\| S_{K}^\top P_{T_C(u)} \left( DA(u)^\top v \right) \right\| \\
\| v \| = 1 \\
\| A(u) - b \| \le \rho, \quad u \in C.
\end{cases}
\label{eq:good neighb}
\end{align}

This completes the proof of Lemma \ref{lem:bnd bnd Ak}.
\end{proof}
\section{Algorithm and Convergence}
\subsection{Algorithm}
We propose the following method for solving problem \eqref{prob:01}. The main idea is to take a single projected gradient step over $u$, thereby avoiding the exact solution of the sub-problems generated by \eqref{eq:minmax}, which is intractable. The formal algorithm is presented as follows.
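As a minimal sketch of one iteration (the form of the dual update is an assumption here, chosen to be consistent with the step size rule \eqref{eq:yk bounded} below; the schedule for the penalty parameters $\{\beta_k\}_k$ is left unspecified):
\begin{align*}
u_{k+1} &= P_C\left( u_k - \gamma_k \nabla_u \mathcal{L}_{\beta_k}(u_k, y_k) \right),\\
y_{k+1} &= y_k + \frac{A(u_{k+1}) - b}{\sigma_{k+1}},
\end{align*}
where $\gamma_k$ is obtained by the line search subroutine of Lemma \ref{lem:eval Lipsc cte}.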
Since $\gamma_k$ is determined by the line search subroutine in Lemma \ref{lem:eval Lipsc cte}, we may now apply Lemma \ref{lem:11} for every iteration in this interval to find that
\begin{align}
\mathcal{L}_{\beta_k}(u_{k+1},y_k) \le \mathcal{L}_{\beta_k}(u_k,y_k) - \frac{\gamma_k}{2} \left\| G_{\beta_k,\gamma_k}(u_k,y_k) \right\|^2.
\end{align}
%That is, Volkan's assumption \eqref{eq:assumption} successfully bounds both the gradient mapping and the feasibility gap. One question is the interplay between $\{\beta_k,\gamma_k\}_k$ to ensure the validity of Volkan's assumption, which feels like some sort of \emph{uncertainty principle}.
Note that \eqref{eq:long chain} bounds the gradient mapping with the feasibility gap. We next find a converse, thus bounding the feasibility gap with the gradient mapping. To that end, let $T_C(u)$ and $P_{T_C(u)}$ be the tangent cone of $C$ at $u\in C$ and the orthogonal projection onto this cone, respectively. Likewise, let $N_C(u)$ and $P_{N_{C}(u)}$ be the normal cone of $C$ at $u$ and the corresponding orthogonal projection.
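For later use, recall that $T_C(u)$ and $N_C(u)$ form a polar pair of closed convex cones, so Moreau's decomposition gives, for every $z\in\mathbb{R}^d$,
\begin{align*}
z = P_{T_C(u)}(z) + P_{N_C(u)}(z),
\qquad
\left\langle P_{T_C(u)}(z) , P_{N_C(u)}(z) \right\rangle = 0.
\end{align*}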
Roughly speaking, \eqref{eq:bnd on Ak final} states that the feasibility gap is itself bounded by the gradient map. In order to apply Lemma \ref{lem:bnd bnd Ak}, let us assume that (\ref{eq:good neighb}) holds. Lemma \ref{lem:bnd bnd Ak} is then in force and we may now substitute \eqref{eq:bnd on Ak final} back into \eqref{eq:long chain} to find that
In order to interpret (\ref{eq:final bnd on G},\ref{eq:feas bnd semi final},\ref{eq:suff cnd on rho}), we next estimate $B_{K}$ in \eqref{eq:defn of B}. To that end, let us first control the growth of the dual sequence $\{y_k\}_k$ by recalling the dual update \eqref{eq:y update recall}.
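Assuming the dual update takes the form sketched earlier, $y_{k+1} = y_k + (A(u_{k+1})-b)/\sigma_{k+1}$, the triangle inequality yields, for every $k\in [k_0:k_1+1]$,
\begin{align*}
\|y_k\| \le \|y_{k_0}\| + \sum_{j=k_0}^{k-1} \frac{\| A(u_{j+1}) - b \|}{\sigma_{j+1}},
\end{align*}
so that the growth of the dual sequence is dictated by the feasibility gaps and the step sizes $\{\sigma_k\}_k$.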
In order to interpret (\ref{eq:final bnd on G},\ref{eq:feas bnd semi final},\ref{eq:suff cnd on rho}), it still remains to estimate $\mu'$ in \eqref{eq:defn of muprime}. To that end, we first derive a lower bound on the step sizes $\{\gamma_k\}_k$. To invoke \eqref{eq:moreover}, we in turn need to gauge how smooth the augmented Lagrangian $\mathcal{L}_{\beta_k}(\cdot,y_k)$ is. For every $k\in [k_0:k_1+1]$, note that
\begin{align}
\lambda_{\beta_k} \le \lambda_h + \lambda_A \|y_k\| + \frac{1}{\beta_k} \left( \lambda_A \sup_{u\in C} \|A(u)-b\| + \sup_{u\in C} \|DA(u)\|^2 \right),
\end{align}
which, combined with \eqref{eq:moreover}, yields the lower bound
\begin{align}
\gamma_k \ge \min\left( \gamma_0, \frac{\theta}{\lambda_{\beta_k}} \right).
\qquad \text{(see (\ref{eq:Bk evaluated},\ref{eq:mup to mupp}))}
\label{eq:low bnd on gammas}
\end{align}
Moreover, we can simplify the assumption in \eqref{eq:suff cnd on rho}. To be specific, thanks to (\ref{eq:Bk evaluated},\ref{eq:mup to mupp}), we can replace \eqref{eq:suff cnd on rho} with the assumption
The lower bound on the step sizes in \eqref{eq:low bnd on gammas} has two other consequences. First, we find that \eqref{eq:to be used later on} automatically holds if $k_0$ is sufficiently large. Second, it allows us to improve \eqref{eq:bnd on gamma G} by writing that
where the last line holds if there exists $c>0$ for which $k_1-k_0+1 \ge ck_1$; for instance, $c=1/2$ suffices whenever $k_0 \le k_1/2$.
\end{proof}
Note that the $\log$ factors in the convergence proof might be shaved off, up to a constant factor, by choosing a more conservative step size for the dual, which automatically bounds the norm of the dual sequence. A possible choice is
\begin{align}\label{eq:yk bounded}
\sigma_k \geq \max\left( \beta \sqrt{k}, \; \beta k \log^2(k+1) \, \| A(u_k) - b \| \right).
\end{align}
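To see why this choice keeps the dual bounded (under the dual update form assumed in the sketch above), note that applying \eqref{eq:yk bounded} at iteration $k+1$ gives
\begin{align*}
\| y_{k+1} - y_k \| = \frac{\| A(u_{k+1}) - b \|}{\sigma_{k+1}} \le \frac{1}{\beta (k+1) \log^2(k+2)},
\end{align*}
and since $\sum_{k\ge 1} \frac{1}{k \log^2(k+1)} < \infty$, the sequence $\{y_k\}_k$ remains bounded regardless of the feasibility gaps.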