Since $\gamma_k$ is determined by the line search subroutine in Lemma \ref{lem:eval Lipsc cte}, we may apply Lemma \ref{lem:11} for every iteration in this interval to find that
\begin{align}
\sigma_k =
\begin{cases}
\sigma_{k-1} & \|A(u_k)-b\| \le \frac{\sigma_{k_0}}{\sqrt{k\log^2(k+1)}},\\
\sigma_{\widetilde{k}-1} \frac{ \b_k-\b_{k-1}}{\b_{\widetilde{k}}-\b_{\widetilde{k}-1}} & \|A(u_k)-b\| > \frac{\sigma_{k_0}}{\sqrt{k\log^2(k+1)}} \text{ and } k \le k_{\max},\\
\min\left( \sigma_{\widetilde{k}-1} \frac{ \b_k-\b_{k-1}}{\b_{\widetilde{k}}-\b_{\widetilde{k}-1}} , \frac{\|A(u_k)-b\|}{k\log^2(k+1)} \right) & \|A(u_k)-b\| > \frac{\sigma_{k_0}}{\sqrt{k\log^2(k+1)}} \text{ and } k > k_{\max},
\end{cases}
\label{eq:sigma rule}
\end{align}
for a threshold $\sigma_{k_0}\ge \b_{k_0}$ and an integer $k_{\max}$. Above, $\widetilde{k}=\widetilde{k}(k)$ denotes the iteration at which the most recent transition into the third update regime occurred. The factor depending on $\widetilde{k}$ is introduced above to ensure the continuity of the sequence $\{\sigma_k\}_{k\ge 0}$.
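To verify this continuity, note that evaluating the $\widetilde{k}$-dependent factor at $k=\widetilde{k}$ gives
\begin{align*}
\sigma_{\widetilde{k}-1}\, \frac{\b_{\widetilde{k}}-\b_{\widetilde{k}-1}}{\b_{\widetilde{k}}-\b_{\widetilde{k}-1}} = \sigma_{\widetilde{k}-1},
\end{align*}
so $\{\sigma_k\}_{k\ge 0}$ does not jump when a new update regime begins.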
For the record, a consequence of the above updates is that $\{\s_k\}_{k\ge 0}$ is a nonincreasing sequence and, in particular,
More specifically, for every $k\in K_{\beta,l}$, the first update in \eqref{eq:beta rule} is in force whereas, for every $k\in K_{\beta,u}$, the second update in \eqref{eq:beta rule} is in force. Likewise, let us partition $K$ into disjoint subsets
For every $k\in K_{\sigma,l}$, the first update in \eqref{eq:sigma rule} is in force whereas, for every $k\in K_{\sigma,m}$, the second update in \eqref{eq:sigma rule} is in force and, for every $k\in K_{\sigma,u}$, the third update there is active. By their definitions in (\ref{eq:beta rule},\ref{eq:sigma rule}) and the assumption that $\s_{k_0}\ge \b_{k_0}$, we observe that $K_{\beta,l}\cap K_{\s,u}=\emptyset$, namely,
\begin{equation}
\sigma_k >0, \qquad \forall k\in K.
\end{equation}
The partitioning in \eqref{eq:partition sigma} allows us to break the sum in the last line of \eqref{eq:long chain broken} into three sums.
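Before evaluating these sums, it is worth recording the elementary telescoping identity that the interval partitions below exploit (a sketch, assuming the factor $\sigma_{\widetilde{k}-1}(\b_k-\b_{k-1})/(\b_{\widetilde{k}}-\b_{\widetilde{k}-1})$ from \eqref{eq:sigma rule} is in force for every $k$ in an interval $[\widetilde{k}:k']$):
\begin{align*}
\sum_{k=\widetilde{k}}^{k'} \sigma_k
= \frac{\sigma_{\widetilde{k}-1}}{\b_{\widetilde{k}}-\b_{\widetilde{k}-1}} \sum_{k=\widetilde{k}}^{k'} \left( \b_k - \b_{k-1} \right)
= \sigma_{\widetilde{k}-1}\cdot \frac{\b_{k'}-\b_{\widetilde{k}-1}}{\b_{\widetilde{k}}-\b_{\widetilde{k}-1}},
\end{align*}
so each such sum is controlled by the growth of $\{\b_k\}_k$ on that interval.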
To evaluate the sum over $K_{\sigma,l}$, we write that
where the second-to-last line assumes that $k_0\gg 1$ and the last line uses \eqref{eq:sigma rule}.
To control the last sum above, let us further write that
\begin{align}
K_{\s,m}=\cup_{j=1}^{J_{\s,m}} K_{\s,m,j},
\label{eq:another partition}
\end{align}
where $\{K_{\s,m,j}\}_{j=1}^{J_{\s,m}}$ are disjoint intervals, with the starting points $\{k_{\s,m,j}\}_{j=1}^{J_{\s,m}}$. With the partitioning in \eqref{eq:another partition}, we write that
where $\{K_{\s,u,j}\}_{j=1}^{J_{\s,u}}$ are disjoint intervals, with the starting points $\{\widetilde{k}_j\}_{j=1}^{J_{\s,u}}$; recall \eqref{eq:sigma rule}. To evaluate the sum in the last line of \eqref{eq:long chain broken} over $K_{\s,u}$,
But the last line above is bounded. If, to the contrary, the last line above were unbounded, then, for a sufficiently large $k_2$, the subsequences $\{\b_k,\s_k\}_{k\ge k_2}$ would never enter the second and third regimes in, respectively, (\ref{eq:beta rule},\ref{eq:sigma rule}); in that case, we must have $\widetilde{k}_j \le k_2$ for every $j\le J_{\s,u}$. Therefore, $J_{\s,u}$ would be finite, which contradicts the unboundedness of the last line of \eqref{eq:boundedness}. Let us then set
By combining (\ref{eq:long chain broke 2},\ref{eq:long chain broke 3},\ref{eq:long chain broke 0}), we can finally simplify \eqref{eq:long chain broken}: When $k_0$ is sufficiently large, there exists a sufficiently large $\delta_{k_0}$, which depends only on $k_0$, such that
thanks to the continuity of $A$ and compactness of $C$.
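In particular, since $u\mapsto \|A(u)-b\|$ is continuous on the compact set $C$, it attains a finite maximum there, namely,
\begin{align*}
\max_{u\in C} \|A(u)-b\| < \infty.
\end{align*}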
Note that \eqref{eq:long chain} bounds the gradient mapping with the feasibility gap. We next find a converse, thus bounding the feasibility gap with the gradient mapping. To that end, let $T_C(u)$ and $P_{T_C(u)}$ be the tangent cone of $C$ at $u\in C$ and the orthogonal projection onto this cone, respectively. Likewise, let $N_C(u)$ and $P_{N_{C}(u)}$ be the normal cone of $C$ at $u$ and the corresponding orthogonal projection.
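For intuition in what follows, recall also Moreau's decomposition: assuming $T_C(u)$ is a closed convex cone whose polar is $N_C(u)$ (as is the case when $C$ is convex), every $z\in\mathbb{R}^d$ splits orthogonally into its tangent and normal components,
\begin{align*}
z = P_{T_C(u)}(z) + P_{N_C(u)}(z),
\qquad
\left\langle P_{T_C(u)}(z) , P_{N_C(u)}(z) \right\rangle = 0.
\end{align*}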
The update rule for $u_k$ in \eqref{eq:update uk recall} immediately implies that
In the second line above, we used the identity $P_T(a+b) = P_T(a)+P_T(b)$, which holds for points $a,b$ when $T$ is a linear subspace; note that the projection onto a general convex cone need not be additive. The last line above uses \eqref{eq:y update recall}.
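As a quick sanity check of this additivity (a sketch, using the orthonormal-basis convention later adopted for $S_K$ in Theorem \ref{thm:main}): if the columns of $S$ form an orthonormal basis for a subspace, then $P_S(z)=SS^\top z$, and hence
\begin{align*}
P_S(a+b) = SS^\top (a+b) = SS^\top a + SS^\top b = P_S(a) + P_S(b).
\end{align*}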
After rearranging and applying the triangle inequality to the last line above, we reach
Roughly speaking, \eqref{eq:bnd on Ak final} states that the feasibility gap is itself bounded by the gradient mapping. In order to apply Lemma \ref{lem:bnd bnd Ak}, let us assume for now that (\ref{eq:good neighb}) holds. Recall also the partitioning of $K$ in \eqref{eq:partition beta}.
We may now substitute \eqref{eq:bnd on Ak final} back into \eqref{eq:long chain} to find that
where the last line above uses (\ref{eq:defn of c},\ref{eq:bnd on Ak final}).
Consider the partitioning
\begin{align}
K_{\beta,u} = \cup_{j=1}^{J_\b} K_{\beta,u,j},
\label{eq:partition on beta}
\end{align}
where $\{K_{\beta,u,j}\}_{j=1}^{J_{\b}}$ are disjoint intervals, with the starting points $\{k_{\beta,u,j}\}_{j=1}^{J_{\b}}$. With the partitioning in \eqref{eq:partition on beta} at hand and after recalling \eqref{eq:beta rule}, we simplify the last line of \eqref{eq:adaptive 1} as
Indeed, $\b'_{k_0}$ above is finite (and depends only on $k_0$). If, to the contrary, the series on the right-hand side of \eqref{eq:defn of kappa} diverged, then there would exist a large enough $k_2$ such that the subsequence $\{\b_{k} \}_{k\ge k_2}$ remains in the first regime of \eqref{eq:beta rule}. Therefore, $J_\b$ would be finite and so would the right-hand side of \eqref{eq:defn of kappa}, contradicting the assumed unboundedness of $\b'_{k_0}$.
To simplify the last line of \eqref{eq:to be used for feas}, let us assume that
\begin{equation}
\frac{12 \beta_{k_0}'\delta_{k_0}}{\eta_{\min}^2 k \log^2(k+1)} \le \frac{\gamma_k}{2},
\qquad \forall k\in K.
\label{eq:to be used later on}
\end{equation}
We also let $|K|=k_1-k_0+1$ denote the size of the interval $K$.
After rearranging \eqref{eq:to be used for feas} and applying \eqref{eq:to be used later on}, we arrive at
In order to interpret (\ref{eq:final bnd on G},\ref{eq:feas bnd semi final}), we next estimate $B_{K}$, as defined in \eqref{eq:defn of B}. To that end, let us first control the growth of the dual sequence, namely, $\{y_k\}_k$. Recalling \eqref{eq:y update recall}, for every $k\in K$, we write that
To evaluate the sum over $K_{\s,m}$ in the last line above, recall the partitioning $K_{\s,m}=\cup_{j=1}^{J_{\s,m}} K_{\s,m,j}$ in \eqref{eq:another partition}. Recall also that each $K_{\s,m,j}$ is an interval with the starting point $k_{\s,m,j}$. This allows us to write that
which ensures that \eqref{eq:good neighb} holds. We emphasize that the left-hand side above is bounded, as proved in (\ref{eq:long chain},\ref{eq:defn of kappa}).
\begin{remark}
Note that \eqref{eq:suff cnd on rho} controls the initial feasibility $\|A(u_{k_0})-b\|$, which indeed appears in (\ref{eq:defn mu},\ref{eq:defn of mup}).
\end{remark}
By combining (\ref{eq:final bnd on G},\ref{eq:feas bnd semi final},\ref{eq:Bk evaluated}), we reach
where the last line above uses \eqref{eq:Bk evaluated}.
To better interpret \eqref{eq:raw bnd on sum}, we next find a lower bound on the primal step sizes $\{\gamma_k\}_k$ by invoking \eqref{eq:moreover}. To do so, we first need to gauge how smooth the augmented Lagrangian $\mathcal{L}_{\beta_k}(\cdot,y_k)$ is (as a function of its first argument). For every $k\in K$ and after recalling Lemma \ref{lem:11}, we note that
\qquad \text{(see \eqref{eq:smoothness at k})} \nonumber\\
& =: \gamma' \sqrt{\frac{k_0 \log^2 (k_0+1) }{ k \log^2(k+1)}},
\label{eq:low bnd on gammas}
\end{align}
for every $k\in K$.
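In words, up to logarithmic factors, the step sizes returned by the line search decay no faster than $1/\sqrt{k}$; indeed, \eqref{eq:low bnd on gammas} can be rewritten as
\begin{align*}
\gamma_k \ge \gamma'\, \frac{\sqrt{k_0}\,\log(k_0+1)}{\sqrt{k}\,\log(k+1)},
\qquad \forall k\in K.
\end{align*}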
The lower bound on the step sizes in \eqref{eq:low bnd on gammas} has two immediate consequences. First, we find that \eqref{eq:to be used later on} automatically holds if $k_0$ is sufficiently large, because both $\delta_{k_0}$ and $\b'_{k_0}$ are finite, see (\ref{eq:defn of delta},\ref{eq:defn of kappa}), respectively. Second, \eqref{eq:low bnd on gammas} allows us to better interpret \eqref{eq:raw bnd on sum}, namely,
where $\gamma'$ is independent of $k_1$ and $\lesssim$ suppresses the dependence on all parameters except $k_1$.
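For a rough sense of the resulting rate (a sketch that tracks only the dependence on $k_1$): the lower bound above accumulates, by comparison with an integral, as
\begin{align*}
\sum_{k=k_0}^{k_1} \frac{1}{\sqrt{k}\,\log(k+1)}
\ge \frac{1}{\log(k_1+1)} \sum_{k=k_0}^{k_1} \frac{1}{\sqrt{k}}
\ge \frac{2\left(\sqrt{k_1+1}-\sqrt{k_0}\right)}{\log(k_1+1)},
\end{align*}
so the accumulated step size $\sum_{k\in K}\gamma_k$ grows at least like $\sqrt{k_1}$, up to logarithmic factors; this is the mechanism behind the $k_1$-dependence suppressed by $\lesssim$ above.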
\begin{theorem}
\label{thm:main}
Let $\gamma_k$ be the output of the line search subroutine in our algorithm in iteration $k$.
For integers $k_0\le k_1$, consider the interval $K=[k_0:k_1]$ and the choice of parameters in (\ref{eq:beta rule},\ref{eq:sigma rule}) for $\s_{k_0}\ge \b_{k_0}>0$ and an integer $k_{\max}$. We impose the following geometric requirements on the constraints. Let $P_{T_C(u)}$ and $P_{N_C(u)}$ denote the orthogonal projection onto the tangent and normal cones at $u\in C$, respectively.
Consider a subspace $S_{K}\subseteq \mathbb{R}^d$ such that
\begin{align}
S_{K} \supseteq \bigcup_{k\in K} T_C(u_k),
\end{align}
and, with some abuse of notation, let $S_{K}$ also denote an orthonormal basis for this subspace. For $\rho>0$, we assume that the nonconvex Slater's condition holds. That is, assume that $\eta_{\min}>0$, where
and that $\rho$ is large enough so that (\ref{eq:suff cnd on rho}) holds, where the left-hand side is independent of $k_1$. Assuming that $k_0\gg 1$, it then holds that
where $\gamma'$ is independent of $k_1$ and $\lesssim$ suppresses the dependence on all parameters except $k_1$. The exact expression for the right-hand side of (\ref{eq:final thm}) is given in (\ref{eq:detailed final bnd}). The factor $\gamma'$ is defined in (\ref{eq:low bnd on gammas}).