\newpage
\section{Introduction}
\label{intro}
We study the nonconvex optimization program
\begin{equation}
\label{prob:01}
\begin{cases}
\underset{x}{\min}\,\, f(x)+g(x)\\
A(x) = 0,
\end{cases}
\end{equation}
where (possibly nonconvex) $f:\mathbb{R}^d\rightarrow\mathbb{R}$ and (possibly nonlinear) $A:\mathbb{R}^d\rightarrow\mathbb{R}^m$ satisfy
\begin{align*}
\| \nabla f(x) - \nabla f(x')\| \le \lambda_f \|x-x'\|,
\end{align*}
\begin{align}
\| DA(x) - DA(x') \| \le \lambda_A \|x-x'\|,
\label{eq:smoothness basic}
\end{align}
for every $x,x'\in \mathbb{R}^d$. Above, $\nabla f(x)\in \mathbb{R}^d$ is the gradient of $f$ and $DA(x)\in \mathbb{R}^{m\times d}$ is the Jacobian of $A$. Moreover, we assume that $g:\mathbb{R}^d\rightarrow\mathbb{R}$ is convex (but not necessarily smooth). For instance, for a convex set $C\subset\RR^d$, letting $g=1_C$ be the convex indicator function on $C$ turns Program~\eqref{prob:01} into a nonconvex optimization problem with the convex constraint $x\in C$. Throughout, we assume only that the proximal operator of $g$ can be evaluated efficiently; see Section~\ref{sec:preliminaries}.
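To make this assumption concrete, recall that the proximal operator of $g$ is
\begin{align*}
\text{prox}_{g}(z) := \underset{x}{\argmin} \,\, g(x) + \frac{1}{2}\left\| x - z \right\|^2,
\end{align*}
so that, in the example above with $g=1_C$, evaluating $\text{prox}_{g}$ amounts to computing the Euclidean projection onto $C$, which is inexpensive for many sets of interest, such as affine subspaces, norm balls, or the simplex.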
A broad range of problems in computer science \cite{khot2011grothendieck, lovasz2003semidefinite}, machine learning \cite{mossel2015consistency, song2007dependence}, and signal processing \cite{singer2011angular, singer2011three} naturally falls under this template, including maximum cut, clustering, the generalized eigenvalue problem, and community detection; see Section~? for several examples.

A notable instance of the template in \eqref{prob:01} is semidefinite programming, which provides powerful relaxations of the problems above. Denoting the space of $d'\times d'$ symmetric matrices by $\mathbb{S}^{d'\times d'}$, consider
\begin{equation}
\label{sdp}
\min_x \{h'(x): A'(x) = b', ~~x \in C'~~\text{and}~~x\succeq0 \},
\end{equation}
where $h': \mathbb{S}^{d'\times d'} \to \RR$, $A'\colon\mathbb{S}^{d'\times d'}\to\RR^m$, $b'\in\RR^m$, and $C' \subseteq \mathbb{S}^{d'\times d'}$ is convex. This template can be cast in the form of \eqref{prob:01} via the \emph{Burer-Monteiro factorization} \cite{burer2003nonlinear, burer2005local}.
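For illustration, consider the standard Max-Cut relaxation (stated here only as an example): given a graph Laplacian $L\in\mathbb{S}^{d'\times d'}$, the relaxation minimizes $-\frac{1}{4}\langle L, X \rangle$ over $X\succeq 0$ with $\mathrm{diag}(X) = \mathbf{1}$. Substituting the factorization $X = VV^\top$ with $V\in\RR^{d'\times r}$ removes the conic constraint and yields
\begin{align*}
\underset{V}{\min}\, -\frac{1}{4}\langle L, VV^\top \rangle
\quad \text{subject to} \quad \mathrm{diag}(VV^\top) - \mathbf{1} = 0,
\end{align*}
which fits Program~\eqref{prob:01} with $f(V) = -\frac{1}{4}\langle L, VV^\top \rangle$, $A(V) = \mathrm{diag}(VV^\top) - \mathbf{1}$, and $g\equiv 0$; here both $\nabla f$ and $DA$ are linear in $V$ and hence Lipschitz continuous, as required above.
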
The \emph{augmented Lagrangian method} \cite{luenberger1984linear} is a powerful approach for solving Program \eqref{prob:01}; see Section \ref{sec:related work} for a review of the related literature and of alternative approaches. Indeed, for any penalty weight $\beta>0$, it is easy to verify that Program \eqref{prob:01} is equivalent to
\begin{align}
\min_{x} \max_y \,\,\mathcal{L}_\beta(x,y) + g(x),
\label{eq:minmax}
\end{align}
where
\begin{align}
\label{eq:Lagrangian}
\mathcal{L}_\beta(x,y) := f(x) + \langle A(x), y \rangle + \frac{\beta}{2}\|A(x)\|^2,
\end{align}
is the augmented Lagrangian corresponding to Program~\eqref{prob:01}. Indeed, for fixed $x$, the inner maximization in \eqref{eq:minmax} returns $f(x)+g(x)$ if $A(x)=0$ and $+\infty$ otherwise, since $\sup_y \langle A(x), y \rangle = \infty$ whenever $A(x)\neq 0$; this verifies the claimed equivalence. The formulation in \eqref{eq:minmax} naturally suggests the following algorithm for solving Program \eqref{prob:01}:
\begin{equation}\label{e:exac}
x_{k+1} \in \underset{x}{\argmin} \, \mathcal{L}_{\beta}(x,y_k)+g(x),
\end{equation}
\begin{equation*}
y_{k+1} = y_k+\s_k A(x_{k+1}),
\end{equation*}
for dual step sizes $\{\s_k\}_k$.
However, the $x$ update in the augmented Lagrangian method requires solving the nonconvex Program~\eqref{e:exac} to global optimality, which is typically intractable. It is often far easier to find an approximate first-order stationary point of Program~\eqref{e:exac} using, for example, the proximal gradient algorithm. We therefore consider an inexact variant that requires only approximate stationarity in every iteration. By gradually decreasing the stationarity tolerance and increasing the penalty weight $\b$, this variant reaches an approximate first-order stationary point of Program~\eqref{prob:01}, as detailed in Section~\ref{sec:AL algorithm}.
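
To make the inexact scheme concrete, the snippet below is a minimal numerical sketch, not the algorithm analyzed in Section~\ref{sec:AL algorithm}: it instantiates the template with a toy indefinite quadratic $f$, a linear map $A$ (so $DA$ is constant), and $g$ the indicator of the unit ball, whose proximal operator is the projection onto that ball. The schedules for the tolerance, dual step size, and penalty weight are hypothetical choices made only for illustration.
\begin{verbatim}
import numpy as np

# Toy instance: f(x) = 0.5 x'Qx with Q indefinite (nonconvex f),
# A(x) = Mx - b (linear), g = indicator of the unit ball, so
# prox_g is the Euclidean projection onto that ball.
rng = np.random.default_rng(0)
d, m = 10, 3
Q = rng.standard_normal((d, d)); Q = (Q + Q.T) / 2
M = rng.standard_normal((m, d))
b = np.zeros(m)

grad_f = lambda x: Q @ x
A = lambda x: M @ x - b
prox_g = lambda z: z / max(1.0, np.linalg.norm(z))

def grad_L(x, y, beta):
    # Gradient of L_beta(x, y) = f(x) + <A(x), y> + (beta/2)||A(x)||^2;
    # here DA(x) = M for every x.
    return grad_f(x) + M.T @ (y + beta * A(x))

x, y, beta = rng.standard_normal(d), np.zeros(m), 1.0
for k in range(1, 51):
    eps_k = 1.0 / k                       # shrinking stationarity tolerance
    sigma_k = 1.0                         # dual step size (hypothetical)
    lips = np.linalg.norm(Q, 2) + beta * np.linalg.norm(M, 2) ** 2
    step = 1.0 / lips                     # safe proximal gradient step
    for _ in range(10000):                # inexact primal (x) update
        x_new = prox_g(x - step * grad_L(x, y, beta))
        done = np.linalg.norm(x - x_new) / step <= eps_k
        x = x_new
        if done:
            break
    y = y + sigma_k * A(x)                # dual update
    beta *= 1.05                          # gradually increase the penalty

print("feasibility ||A(x)||:", np.linalg.norm(A(x)))
\end{verbatim}
The inner loop is the standard proximal gradient method with a step size of one over the current smoothness constant of $\mathcal{L}_\beta(\cdot,y)$, and the quantity $\|x - x_{\mathrm{new}}\|/\mathrm{step}$ is the usual proximal gradient mapping, used here as the approximate stationarity certificate.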
\paragraph{Contributions.}
\textbf{To be written...}
