%Denote $B_k = \dist( -\nabla_x \L_{\b_{k-1}}(x_k,y_k),\partial g(x_k)) + \| A(x_k)\|$ for every $k$. Using the convergence proof of the outer algorithm, we have the following bound
Let $K$ denote the number of (outer) iterations of Algorithm~\ref{Algo:2} and let $\epsilon_{f}$ denote the desired accuracy of Algorithm~\ref{Algo:2}, see~(\ref{eq:inclu3}). Recalling Theorem~\ref{thm:main}, we can then write that
\begin{equation}
\epsilon_{f} = \frac{Q}{\b_{K}},
\label{eq:acc_to_b}
\end{equation}
or, equivalently, $\b_{K} = Q/\epsilon_{f}$.
%where $K$ denotes the last outer iterate and $\epsilon$ is the final accuracy we would like to get for the optimality conditions given in~\eqref{eq:inclu3}.
We now count the number of total (inner) iterations $T$ of Algorithm~\ref{Algo:2} to reach the accuracy $\epsilon_{f}$. From \eqref{eq:smoothness of Lagrangian} and for sufficiently large $k$, recall that $\lambda_{\b_k}\le \lambda'' \b_k$ is the smoothness parameter of the augmented Lagrangian. Then, from \eqref{eq:iter_1storder} ad by summing over the outer iterations, we bound the total number of (inner) iterations of Algorithm~\ref{Algo:2} as
\begin{align}\label{eq: tk_bound}
T &= \sum_{k=1}^K\mathcal{O}\left ( \frac{\lambda_{\beta_{k-1}}^2 x_{\max}^2 }{\epsilon_k} \right) \nonumber\\
%Given that $b\leq e$, the bound suggests picking $e$ as small as possible and $b$ as big as possible.
%
%To minimize the total number of iterations in both cases with flexible $b$ and $e$, the bounds suggest to pick $b=e=\alpha$ and take this value to be as large as possible.
\section{Proof of Lemma \ref{lem:smoothness}\label{sec:proof of smoothness lemma}}
which completes the proof of Lemma \ref{lem:smoothness}.
%\section{Proof of Lemma \ref{lem:11}\label{sec:proof of descent lemma}}
%
%Throughout, let
%\begin{align}
%G = G_{\b,\g}(x,y) = \frac{x-x^+}{\g},
%\end{align}
%for short.
%Suppose that $\|A(x)\|\le \rho$, $\|x\|\le \rho$, and similarly $\|A(x^+)\|\le \rho$, $\|x^+\|\le \rho'$. An application of Lemma \ref{lem:smoothness} yields that
To bound the last line above, let $x_*$ be a solution of Program~\eqref{prob:01} and note that $\overline{B} x_*^{\circ 2} = b $ by definition. Let also $z_k,z_*\in \RR^d$ denote the vectors corresponding to $x_k,x_*$. Corresponding to $x_k$, also define $u_{k,1},u_{k,2}$ naturally and let $|z_k| = u_{k,1}^{\circ 2} + u_{k,2}^{\circ 2} \in \RR^d$ be the amplitudes of $z_k$. To simplify matters, let us assume also that $B$ is full-rank.
We then rewrite the last line of \eqref{eq:cnd-bp-pre} as
where $\eta_n(\cdot)$ returns the $n$th largest singular value of its argument. We can therefore ensure that \eqref{eq:regularity} holds by enforcing that
\begin{align}
z_k \in \left\{ z \in \RR^d: \eta_n( B \text{diag}(|z|)) > \nu \right\},
\end{align}
for every iteration $k$.
\section{Clustering \label{sec:verification2}}
We only verify the condition in~\eqref{eq:regularity}.
Note that
\begin{align}
A(x) = VV^\top \mathbf{1}- \mathbf{1},
\end{align}
\begin{align}
DA(x) & =
\left[
\begin{array}{cccc}
w_{1,1} x_1^\top & \cdots & w_{1,n} x_{1}^\top\\
%x_2^\top p& \cdots & x_{2}^\top\\
\vdots\\
w_{n,1}x_{n}^\top & \cdots & w_{n,n}1 x_{n}^\top
\end{array}
\right] \nonumber\\
& = \left[
\begin{array}{ccc}
V & \cdots & V
\end{array}
\right]
+
\left[
\begin{array}{ccc}
x_1^\top & \\
& \ddots & \\
& & x_n^\top
\end{array}
\right],
\label{eq:Jacobian clustering}
\end{align}
%where $I_n\in\RR^{n\times n}$ is the identity matrix,
where $w_{i.i}=2$ and $w_{i,j}=1$ for $i\ne j$. In the last line above, $n$ copies of $V$ appear and the last matrix above is block-diagonal. For $x_k$, define $V_k$ as in the example and let $x_{k,i}$ be the $i$th row of $V_k$.
where $I_n\in \RR^{n\times n}$ is the identity matrix.
Let us make a number of simplifying assumptions. First, we assume that $x_k\in \text{relint}(C)$ so that $\partial g(x_k)=\{0\}$, which can be easily enforced in the iterates. Second, we assume that $V_k$ has nearly orthonormal columns, namely, $V_k V_k^\top \approx I_n$. This can also be easily enforced in each iterate of Algorithm~\ref{Algo:2} and naturally corresponds to well-separated clusters. While a more fine-tuned argument can remove these assumptions, they will help us simplify the derivations. Under these assumptions, the squared right-hand side of \eqref{eq:regularity} becomes
We can enforce the iterates to satisfy $\|x_{k,i}\| \ge \nu$, which corresponds again to well-separated clusters, and guarantee \eqref{eq:regularity}. In practice, often $n$ exceeds the number of true clusters and a more fine-tuned analysis is required to establish \eqref{eq:regularity} by restricting the argument to a particular subspace of $\RR^n$.
\section{$\ell_\infty$ denoising with a generative prior}
The authors of \cite{Ilyas2017} have proposed to
project onto the range of a Generative Adversarial network (GAN)
\cite{Goodfellow2014}, as a way to defend against adversarial examples. For a
given noisy observation $x^* + \eta$, they consider a projection in the
$\ell_2$ norm. We instead propose to use our augmented Lagrangian method to
denoise in the $\ell_\infty$ norm, a much harder task: