Augmented Lagrangian based methods were first proposed in~\cite{hestenes1969multiplier, powell1978fast}.
In the convex setting, standard, inexact, and linearized versions of ALM have been studied extensively~\cite{bertsekas1999nonlinear, bertsekas2014constrained, lan2016iteration, nedelcu2014computational}.
Several works have also considered the application of ALM/ADMM to nonconvex problems~\cite{wang2015global, liu2017linearized}.
These works assume that the operator in~\eqref{prob:01} is linear and therefore do not apply to our setting, which involves a nonlinear constraint in addition to a nonconvex objective.
A series of influential papers by Burer and Monteiro~\cite{burer2003nonlinear, burer2005local} proposed the splitting $X=UU^\ast$ and suggested solving the resulting factorized problem with ALM.
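For concreteness, here is a sketch of this splitting in illustrative notation (the symbols $C$, $\mathcal{A}$, $b$, and the rank $r$ are generic and need not match our template~\eqref{prob:01}): the factorization replaces the convex SDP on the left by the nonconvex but lower-dimensional problem on the right,
%
\begin{equation*}
\min_{X \succeq 0} \; \langle C, X \rangle
\;\;\text{s.t.}\;\; \mathcal{A}(X) = b
\qquad \longrightarrow \qquad
\min_{U \in \mathbb{R}^{n \times r}} \; \langle C, U U^\ast \rangle
\;\;\text{s.t.}\;\; \mathcal{A}(U U^\ast) = b,
\end{equation*}
%
to which Burer and Monteiro then apply ALM.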
However, this approach has several limitations. First, their analysis provides no inexactness guarantees: it requires the primal subproblems to be solved exactly, which is not practical.
Second, they must impose an artificial bound on the primal domain, which is impossible to choose appropriately without knowing the norm of the solution and is therefore ineffective in practice.
Finally, their results establish convergence only, without any rate guarantees.
The special case of SDPs without linear constraints was studied in~\cite{BKS15:Dropping-Convexity} and~\cite{park2016provable}, where the authors prove the convergence of gradient descent on the Burer-Monteiro factorized formulation.
Their results do not extend to linear constraints or general convex objectives, and hence do not apply to general SDPs.
Another line of work focuses on solving a specific class of SDPs by applying gradient descent or trust-region methods on manifolds~\cite{boumal2014manopt, boumal2016global}.
The authors show that gradient descent on the manifold reaches first-order stationarity in $\mathcal{O}(1/\epsilon^2)$ iterations, and that trust-region methods on the manifold reach second-order stationarity in $\mathcal{O}(1/\epsilon^3)$ iterations.
First, these methods require the feasible set to be a smooth manifold, which holds for the Maximum Cut and generalized eigenvalue problems but is not satisfied for other important SDPs such as the quadratic assignment problem (QAP) and optimal power flow.
Second, as noted in~\cite{boumal2016global}, the per-iteration cost of their method for the Max-Cut problem is $\mathcal{O}(n^6)$ for solving~\eqref{eqn:nonconvex-formulation1}, which is far larger than our per-iteration cost of $\mathcal{O}(n^2 r)$ with $r \ll n$.
Another recent line of work~\cite{clason2018acceleration} focuses on solving the nonlinearly constrained, nonconvex problem template~\eqref{eqn:nonconvex-formulation1} by adapting the primal-dual method of Chambolle and Pock~\cite{chambolle2011first}.
The authors prove convergence with rate guarantees under error bound conditions on the objective function, which are not necessarily satisfied for general SDPs.
\cite{bhojanapalli2018smoothed} focuses on the penalty formulation of~\eqref{eqn:nonconvex-formulation1} and studies the optimality of second-order stationary points of this formulation.
However, their results connect the stationary points of the penalty formulation of the nonconvex problem to those of the penalty formulation of the convex problem, rather than to the constrained problem itself.
The algorithm in~\cite{doi:10.1137/15M1031631} can handle the same problem template, but it is considerably more involved than ours.