
Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization

Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh

TL;DR

The paper tackles constrained nonsmooth nonconvex optimization in which both the objective $f$ and the constraint map $c$ may be nonconvex and nonsmooth. It proposes a unified Lagrangian-based framework that, in each iteration, performs a single primal subgradient step followed by a dual ascent step, using a modified penalty function $H_{\rho,\beta}$. Under mild conditions, the iterates converge to $(\mathcal{D}_f,\mathcal{D}_c)$-KKT points, and the penalty is exact under a regularity condition; together these yield convergence guarantees even for problems that are not Clarke regular. The framework also extends to expectation-constrained formulations. A key advance is the embedding framework, which allows proximal SGD, proximal SGDM, and proximal ADAM to be plugged in as black-box primal updates, with global convergence inherited from the underlying stochastic subgradient methods. The result is a family of efficient, convergent Lagrangian-based methods applicable to training constrained nonsmooth neural networks and other large-scale problems, and a principled way to combine proximal stochastic optimizers with constrained nonconvex nonsmooth objectives.
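The single primal step followed by dual ascent can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the exact form of $H_{\rho,\beta}$ is not given in this summary, so the sketch substitutes the textbook augmented Lagrangian $f(x) + \lambda^\top c(x) + \frac{\rho}{2}\|c(x)\|^2$, takes $\mathcal{X}$ to be a box, and uses a constant step size for simplicity. The function name `lagrangian_subgradient_method`, its parameters `eta`, `beta`, `rho`, and the toy problem are hypothetical.

```python
import numpy as np


def lagrangian_subgradient_method(subgrad_f, c, jac_c, x0, lam0, *, eta=0.05,
                                  beta=0.05, rho=1.0, lo=-5.0, hi=5.0,
                                  iters=5000):
    """One projected subgradient step in x, then one dual ascent step in lam.

    ASSUMPTION: the textbook augmented Lagrangian
        L(x, lam) = f(x) + lam^T c(x) + (rho / 2) * ||c(x)||^2
    stands in for the paper's penalty H_{rho,beta}; X is the box [lo, hi]^n.
    """
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for _ in range(iters):
        # Subgradient of L(., lam) for smooth c: a subgradient of f plus
        # J_c(x)^T (lam + rho * c(x)).
        g = subgrad_f(x) + jac_c(x).T @ (lam + rho * c(x))
        # Single primal step, projected back onto X (clipping = box projection).
        x = np.clip(x - eta * g, lo, hi)
        # Dual ascent on the constraint residual at the new primal iterate.
        lam = lam + beta * c(x)
    return x, lam


# Hypothetical toy problem: min (x1 - 2)^2 + |x2|  s.t.  x1 + x2 - 1 = 0,
# x in [-5, 5]^2. Substituting x2 = 1 - x1 gives x* = (1.5, -0.5), lam* = 1.
def subgrad_f(x):
    # np.sign(0) = 0 is a valid subgradient of |.| at 0.
    return np.array([2.0 * (x[0] - 2.0), np.sign(x[1])])


def c(x):
    return np.array([x[0] + x[1] - 1.0])


def jac_c(x):
    return np.array([[1.0, 1.0]])


x, lam = lagrangian_subgradient_method(subgrad_f, c, jac_c,
                                       np.zeros(2), np.zeros(1))
print(np.round(x, 4), np.round(lam, 4))
```

On this convex toy instance the iterates approach $x^* = (1.5, -0.5)$ and $\lambda^* = 1$; the guarantees described above concern far more general nonsmooth, nonconvex $f$ and $c$.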

Abstract

In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, subject to additional nonsmooth nonconvex constraints $c(x) = 0$. We propose a unified framework for developing Lagrangian-based methods that, in each iteration, updates the primal variables with a single step of a subgradient method. These subgradient methods are "embedded" into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that our proposed framework inherits the global convergence guarantees of these embedded subgradient methods under mild conditions. In addition, we show that our framework can be extended to constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including proximal SGD, proximal momentum SGD, and proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that our framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonconvex nonsmooth constrained optimization problems.
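The "embedding" idea amounts to treating the primal update as a pluggable component that the Lagrangian loop calls exactly once per iteration. The sketch below is our own illustration, not the paper's code, and rests on several assumptions: $\mathcal{X}$ is a box, so the proximal map of its indicator is coordinatewise clipping (which is also the proximal map in any diagonal metric, as an ADAM-type preconditioned step would use); the momentum variant follows one common heavy-ball convention; and the standard augmented Lagrangian again stands in for $H_{\rho,\beta}$. The names `ProxSGD`, `ProxSGDM`, `ProxAdam`, and `lagrangian_method` are hypothetical.

```python
import numpy as np


class ProxSGD:
    """x <- prox_X(x - lr * g); for a box X the prox of its indicator is clip."""
    def __init__(self, lr, lo, hi):
        self.lr, self.lo, self.hi = lr, lo, hi

    def step(self, x, g):
        return np.clip(x - self.lr * g, self.lo, self.hi)


class ProxSGDM(ProxSGD):
    """Heavy-ball buffer m <- mu * m + g, then a projected step along m."""
    def __init__(self, lr, lo, hi, mu=0.9):
        super().__init__(lr, lo, hi)
        self.mu, self.m = mu, None

    def step(self, x, g):
        self.m = g.copy() if self.m is None else self.mu * self.m + g
        return np.clip(x - self.lr * self.m, self.lo, self.hi)


class ProxAdam(ProxSGD):
    """Bias-corrected Adam direction, then projection onto the box X."""
    def __init__(self, lr, lo, hi, b1=0.9, b2=0.999, eps=1e-8):
        super().__init__(lr, lo, hi)
        self.b1, self.b2, self.eps, self.t = b1, b2, eps, 0
        self.m = self.v = None

    def step(self, x, g):
        if self.m is None:
            self.m, self.v = np.zeros_like(g), np.zeros_like(g)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        d = m_hat / (np.sqrt(v_hat) + self.eps)
        return np.clip(x - self.lr * d, self.lo, self.hi)


def lagrangian_method(opt, subgrad_f, c, jac_c, x0, lam0, beta=0.05, rho=1.0,
                      iters=5000):
    """Each iteration: ONE call to the black-box primal optimizer, then dual ascent."""
    x, lam = np.asarray(x0, dtype=float), np.asarray(lam0, dtype=float)
    for _ in range(iters):
        # Subgradient of the assumed augmented Lagrangian in x.
        g = subgrad_f(x) + jac_c(x).T @ (lam + rho * c(x))
        x = opt.step(x, g)
        lam = lam + beta * c(x)
    return x, lam


# Hypothetical toy problem: min (x1 - 2)^2 + |x2|  s.t.  x1 + x2 - 1 = 0.
def subgrad_f(x):
    return np.array([2.0 * (x[0] - 2.0), np.sign(x[1])])


def c(x):
    return np.array([x[0] + x[1] - 1.0])


def jac_c(x):
    return np.array([[1.0, 1.0]])


x, lam = lagrangian_method(ProxSGD(0.05, -5.0, 5.0), subgrad_f, c, jac_c,
                           np.zeros(2), np.zeros(1))
print(np.round(x, 4), np.round(lam, 4))
```

Switching to `ProxSGDM(...)` or `ProxAdam(...)` changes only the first argument of `lagrangian_method`; their step sizes would need separate tuning on any given problem.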

Paper Structure

This paper contains 21 sections, 23 theorems, 90 equations, 5 figures.

Key Result

Proposition 2.8

Let $h: \mathbb{R}^n \to \mathbb{R}$ be a path-differentiable function that admits $\mathcal{D}_h$ as its conservative field. Suppose $h$, $\mathcal{D}_h$, and $\mathcal{X}$ are definable over $\mathbb{R}^n$. Then the set $\{h(x): x \in \mathcal{X},~ 0\in \mathrm{conv}(\mathcal{D}_h(x)) + \mathcal{N}_{\mathcal{X}}(x)\}$ is finite.
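As a worked illustration of the set of values in Proposition 2.8 (our own one-dimensional example, not taken from the paper), take $h(x) = -|x|$, $\mathcal{X} = [-1, 1]$, and let $\mathcal{D}_h$ be the Clarke subdifferential, which is a conservative field for this semialgebraic (hence definable) Lipschitz function:

```latex
\mathcal{D}_h(x) =
\begin{cases}
\{1\}, & -1 \le x < 0,\\
[-1, 1], & x = 0,\\
\{-1\}, & 0 < x \le 1,
\end{cases}
\qquad
\mathcal{N}_{\mathcal{X}}(x) =
\begin{cases}
(-\infty, 0], & x = -1,\\
\{0\}, & -1 < x < 1,\\
[0, \infty), & x = 1.
\end{cases}

% x = 0:  0 \in [-1, 1];   x = 1:  0 \in -1 + [0, \infty);   x = -1:  0 \in 1 + (-\infty, 0];
% every other x gives \pm 1 + \{0\}, which excludes 0.
0 \in \mathrm{conv}(\mathcal{D}_h(x)) + \mathcal{N}_{\mathcal{X}}(x)
\iff x \in \{-1, 0, 1\},
\quad\text{so}\quad
\{h(x) : x \in \mathcal{X},~ 0 \in \mathrm{conv}(\mathcal{D}_h(x)) + \mathcal{N}_{\mathcal{X}}(x)\} = \{-1, 0\}.
```

The resulting set of values is finite, even though $h$ is nonsmooth and not Clarke regular at $x = 0$.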

Figures (5)

  • Figure 1: A brief illustration of our results on embedding stochastic subgradient methods, which are designed to solve problem (Prob_simple_min_X), into linearized Lagrangian-based methods through the framework (Eq_Framework).
  • Figure 2: Numerical results on training LeNet10 on MNIST with constraints.
  • Figure 3: Numerical results on training ResNet14 on CIFAR10 with constraints.
  • Figure 4: SGDM-LALM with different stepsizes $\alpha$ to train LeNet10 on MNIST. Other parameters are fixed.
  • Figure 5: ADAM-LALM with different $\rho$ to train LeNet10 on MNIST. Other parameters are fixed.

Theorems & Definitions (50)

  • Definition 2.1: clarke1990optimization
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4: Aumann’s integral
  • Definition 2.5
  • Definition 2.6
  • Definition 2.7
  • Proposition 2.8: Corollary 5 in bolte2007clarke
  • proof
  • Definition 2.9
  • ...and 40 more