which completes the proof of Lemma \ref{lem:smoothness}.
\section{Proof of Lemma \ref{lem:11}\label{sec:proof of descent lemma}}
Throughout this section, we use the shorthand
\begin{align}
G = G_{\b,\g}(x,y) = \frac{x-x^+}{\g}.
\end{align}
Suppose that $\|A(x)\|\le\rho$ and $\|x\|\le\rho'$ and, similarly, that $\|A(x^+)\|\le\rho$ and $\|x^+\|\le\rho'$. An application of Lemma \ref{lem:smoothness} then yields that
To lower bound the first line of \eqref{eq:slater proof 0}, we invoke the restricted injectivity condition in Section~\ref{sec:slater}.
where the second line above again uses the non-expansiveness of $P_{S}$ and $P_{\cone(\partial g(x_{k+1}))^*}$. The remaining term in \eqref{eq:slater proof 1} is bounded as
which completes the proof of Lemma \ref{lem:bnd bnd Ak}.
\section{Proof of Proposition \ref{prop:convex}\label{sec:proof of prop}}
To prove the first claim of the proposition, suppose that Slater's condition does not hold, namely, that
\begin{equation}
\Null(A)\cap\relint(C) = \emptyset,
\label{eq:no feas}
\end{equation}
where $\Null(A)$ and $\relint(C)$ denote the null space of the matrix $A$ and the relative interior of $C$, respectively.
We have assumed that problem \eqref{prob:01} is feasible, namely, that there exists $x\in C$ such that $Ax=0$. It follows from \eqref{eq:no feas} that $x\in\text{boundary}(C)$ and that $\Null(A)$ supports $C$ at $x$, namely, $Ax'\ge0$ for every $x'\in C$, where the inequality is understood entrywise.
Consequently, $\Null(A)\cap T_C(x)\ne\{0\}$, where $T_C(x)$ is the tangent cone of the set $C$ at $x$.
Equivalently, it holds that
$\row(A)\cap N_C(x)\ne\{0\}$, where $\row(A)$ is the row space of the matrix $A$ and $N_C(x)$ is the normal cone to $C$ at $x$. That is, there exists a unit-norm vector $v$ such that
$P_{T_C(x)}A^\top v=0$
and, consequently, $P_S P_{T_C(x)}A^\top v=0$. Let us take $\rho'=\|x\|$ in~\eqref{eq:nu for cvx}. We then conclude that
namely, geometric regularity likewise fails for every $\rho\ge0$ and $\rho'=\|x\|$. The first identity above follows from the homogeneity of the right-hand side of \eqref{eq:nu for cvx}.
Because increasing $\rho'$ cannot increase the right-hand side of \eqref{eq:nu for cvx}, we find that $\widetilde{\nu}(g,A,S,\infty,\max_{x\in C} \|x\|)=0$, which proves the first claim in Proposition~\ref{prop:convex}.
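To illustrate the first claim with a minimal example, take $C=\{x\in\RR^2 : x_1\ge 0\}$ and $A = [\,1 \;\; 0\,]$, and let $S=\text{aff}(C)=\RR^2$. Then $\Null(A)=\{(0,t): t\in\RR\}$ and $\relint(C)=\{x : x_1>0\}$ are disjoint, so Slater's condition fails, and every feasible point $x=(0,t)$ lies on the boundary of $C$. Accordingly, the unit-norm vector $v=-1$ satisfies
\begin{align}
P_S P_{T_C(x)}A^\top v = P_{T_C(x)}\left[
\begin{array}{c}
-1\\
0
\end{array}
\right] = 0,
\end{align}
since $(-1,0)\in N_C(x)$, and the right-hand side of \eqref{eq:nu for cvx} therefore vanishes for every $\rho,\rho'\ge 0$, in agreement with the first claim.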
For the converse, one can verify that it suffices to take $\row(A)\subseteq S$.
Next, recall \eqref{eq:nu for cvx} and suppose that there exists $x\in\RR^d$ such that
\begin{align}
\eta_{\min} ( P_S P_{T_C(x)} A^\top ) =0.
\label{eq:counter-assump}
\end{align}
Throughout, we assume without loss of generality that $x\in C$; otherwise, $x$ would not be feasible for problem \eqref{prob:01} with $g=1_C$ and could not be used to study Slater's condition for \eqref{prob:01}. Note that, by the assumption in Proposition~\ref{prop:convex}, \eqref{eq:counter-assump} can be rewritten as
\begin{align}
\eta_{\min}(P_S P_{T_C(x)} A^\top)
& = \eta_{\min}( P_{T_C(x)} A^\top) =0,
\qquad (S = \text{aff}(C))
\label{eq:counter-assump-rewritten}
\end{align}
where $\text{aff}(C)$ is the affine hull of $C$.
Then, since $\|x\|\le\rho'<\infty$ in \eqref{eq:nu for cvx}, we can assume throughout that $\text{boundary}(C)\cap B_{\rho'}\ne\emptyset$ and, moreover, that $x\in\text{boundary}(C)$, where $B_{\rho'}=\{z : \|z\|\le\rho' \}$ is the ball of radius $\rho'$ centered at the origin. Indeed, if instead $x\in\relint(C)$, then $T_C(x)=S$ and thus
\begin{align}
\eta_{\min}(P_S P_{T_C(x)} A^\top) = \eta_{\min}(P_S A^\top) = \eta_{\min}(A^\top) > 0,
\end{align}
which contradicts \eqref{eq:counter-assump}. Here, the second identity uses $\row(A)\subseteq S$, and the final inequality holds because, by assumption in Proposition~\ref{prop:convex}, $A$ is full-rank. To reiterate, we may therefore assume that $x\in\text{boundary}(C)$. By \eqref{eq:counter-assump-rewritten}, there then exists a unit-norm vector $u\in\row(A)$ such that $u\in N_C(x)$. In turn, this implies that $\Null(A)\cap\relint(C)=\emptyset$: otherwise, any vector $v\in\Null(A)\cap\relint(C)$ would satisfy $\langle u, v \rangle <0$, which is impossible because $u\in\row(A)$ and $v\in\Null(A)$ are orthogonal. That is, Slater's condition does not hold, which proves the second claim of Proposition~\ref{prop:convex}.
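The example above also illustrates the converse: with $C=\{x\in\RR^2 : x_1\ge0\}$, $A=[\,1\;\;0\,]$, and $S=\RR^2$, we have $\eta_{\min}(P_S P_{T_C(x)}A^\top)=0$ at the boundary point $x=(0,0)$, the unit-norm vector $u=(-1,0)^\top\in\row(A)$ lies in $N_C(x)$, and indeed $\Null(A)\cap\relint(C)=\emptyset$, namely, Slater's condition fails.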