Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization
Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh
TL;DR
The paper tackles constrained nonsmooth nonconvex optimization in which both the objective $f$ and the constraint map $c$ may be nonconvex and nonsmooth. It proposes a unified Lagrangian-based framework that, in each iteration, takes a single primal subgradient step and a dual ascent step, built on a modified penalty function $H_{\rho,\beta}$. Under mild conditions, the iterates converge to $(\mathcal{D}_f,\mathcal{D}_c)$-KKT points, and under a regularity condition the penalty is exact. Together these results give convergence guarantees even for problems that are not Clarke regular. The framework also extends to expectation-constrained formulations. Its key advance is the embedding mechanism: proximal SGD, proximal SGDM, and proximal ADAM can be plugged in as black-box primal updates, and the resulting methods inherit global convergence from the underlying stochastic subgradient methods. The outcome is a family of efficient, convergent Lagrangian-based methods for training nonsmooth neural networks under constraints and for other large-scale problems, offering a principled way to combine proximal stochastic optimizers with constrained nonsmooth nonconvex objectives.
Abstract
In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, subject to additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for Lagrangian-based methods that, in each iteration, updates the primal variables with a single step of a subgradient method. These subgradient methods are "embedded" into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that, under mild conditions, our framework inherits the global convergence guarantees of the embedded subgradient methods. We also show that the framework extends to constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including proximal SGD, proximal momentum SGD, and proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that the proposed framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonconvex nonsmooth constrained optimization problems.
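To make the "single primal step plus dual ascent, with a black-box primal update" structure concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm. It uses the standard augmented Lagrangian $f(x) + \lambda c(x) + \frac{\rho}{2}c(x)^2$ in place of the paper's modified penalty $H_{\rho,\beta}$, whose exact form is not given here. It also assumes a scalar constraint, a box $\mathcal{X} = [-2,2]^2$ (so the proximal step reduces to a projection), and deterministic subgradients. The names `PrimalStep`, `ProximalSGD`, `ProximalMomentumSGD`, `lagrangian_method`, the exponential-moving-average momentum form, and the step sizes `eta`, `tau`, `rho` are all hypothetical choices for illustration.

```python
from typing import Callable, List, Optional, Protocol, Tuple

Vector = List[float]


class PrimalStep(Protocol):
    """Hypothetical interface of an embedded black-box primal update."""

    def step(self, x: Vector, g: Vector, eta: float) -> Vector: ...


def project_box(x: Vector, lo: float, hi: float) -> Vector:
    """Projection onto X = [lo, hi]^n, i.e. the prox of the indicator of X."""
    return [min(max(xi, lo), hi) for xi in x]


class ProximalSGD:
    """x <- P_X(x - eta * g)."""

    def __init__(self, lo: float, hi: float) -> None:
        self.lo, self.hi = lo, hi

    def step(self, x: Vector, g: Vector, eta: float) -> Vector:
        return project_box([xi - eta * gi for xi, gi in zip(x, g)], self.lo, self.hi)


class ProximalMomentumSGD:
    """Assumed EMA momentum: m <- beta*m + (1 - beta)*g, then x <- P_X(x - eta*m)."""

    def __init__(self, lo: float, hi: float, beta: float = 0.9) -> None:
        self.lo, self.hi, self.beta = lo, hi, beta
        self.m: Optional[Vector] = None

    def step(self, x: Vector, g: Vector, eta: float) -> Vector:
        if self.m is None:
            self.m = [0.0] * len(x)
        self.m = [self.beta * mi + (1.0 - self.beta) * gi for mi, gi in zip(self.m, g)]
        return project_box([xi - eta * mi for xi, mi in zip(x, self.m)], self.lo, self.hi)


def lagrangian_method(
    x: Vector,
    lam: float,
    subgrad_f: Callable[[Vector], Vector],
    c: Callable[[Vector], float],
    grad_c: Callable[[Vector], Vector],
    optimizer: PrimalStep,
    rho: float = 1.0,
    eta: float = 0.05,
    tau: float = 0.05,
    iters: int = 1000,
) -> Tuple[Vector, float]:
    """One primal step on f(x) + lam*c(x) + (rho/2)*c(x)^2, then one dual ascent step."""
    for _ in range(iters):
        weight = lam + rho * c(x)  # multiplier on the constraint subgradient
        g = [gf + weight * gc for gf, gc in zip(subgrad_f(x), grad_c(x))]
        x = optimizer.step(x, g, eta)  # embedded black-box primal update
        lam = lam + tau * c(x)  # dual ascent on the new constraint residual
    return x, lam


# Toy instance: minimize |x0| + |x1| s.t. x0 + x1 - 1 = 0, x in [-2, 2]^2.
def subgrad_l1(x: Vector) -> Vector:
    # sign(t) is an element of the Clarke subdifferential of |t| (0 chosen at t = 0).
    return [float((xi > 0) - (xi < 0)) for xi in x]


def c_lin(x: Vector) -> float:
    return x[0] + x[1] - 1.0


def grad_c_lin(x: Vector) -> Vector:
    return [1.0, 1.0]


x_sgd, lam_sgd = lagrangian_method(
    [0.2, 0.2], 0.0, subgrad_l1, c_lin, grad_c_lin, ProximalSGD(-2.0, 2.0)
)
print(x_sgd, lam_sgd)
```

On this toy instance the iterates approach $x = (0.5, 0.5)$ with $\lambda = -1$, which satisfies $0 \in \partial f(x) + \lambda \nabla c(x)$ and $c(x) = 0$. Swapping in `ProximalMomentumSGD(-2.0, 2.0)` changes only the `optimizer` argument and leaves the loop untouched. This mirrors the black-box embedding described in the abstract, although this sketch makes no claim about that variant's convergence.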
