Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization

Nachuan Xiao; Kuangyu Ding; Xiaoyin Hu; Kim-Chuan Toh

TL;DR

The paper tackles constrained optimization in which both the objective $f$ and the constraint mapping $c$ may be nonconvex and nonsmooth. It proposes a unified Lagrangian-based framework that, in each iteration, performs a single primal subgradient step and a dual ascent step using a modified penalty $H_{\rho,\beta}$. Under mild conditions, the iterates converge to $(\mathcal{D}_f,\mathcal{D}_c)$-KKT points, and the penalty is exact under a regularity condition, which yields convergence guarantees even for problems that are not Clarke regular; the framework also extends to expectation-constrained formulations. A key advance is the embedding framework, which allows proximal SGD, proximal SGDM, and proximal ADAM to be plugged in as black-box primal updates, with global convergence inherited from the underlying stochastic subgradient methods. The approach yields efficient, convergent variants of Lagrangian-based methods that apply to training constrained nonsmooth neural networks and other large-scale problems, giving a principled way to combine proximal stochastic optimizers with constrained nonconvex nonsmooth objectives.

Abstract

In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for developing Lagrangian-based methods, which takes a single-step update to the primal variables by some subgradient methods in each iteration. These subgradient methods are "embedded" into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that our proposed framework inherits the global convergence guarantees from these embedded subgradient methods under mild conditions. In addition, we show that our framework can be extended to solve constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including the proximal SGD, proximal momentum SGD, and proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that our proposed framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonconvex nonsmooth constrained optimization problems.
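To make the "single primal subgradient step plus dual ascent" pattern concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the toy objective $f(x) = |x_1| + |x_2 - 2|$, the constraint $c(x) = x_1 + x_2 - 1$, the box $\mathcal{X} = [-3, 3]^2$, the stepsize schedule, and the function names are not taken from the paper. The sketch uses the classical augmented Lagrangian rather than the paper's modified penalty $H_{\rho,\beta}$, and the toy problem is convex, so it only illustrates the shape of the update, not the paper's nonconvex analysis.

```python
import numpy as np

# Toy problem (illustrative assumption, not from the paper):
#   minimize |x0| + |x1 - 2|  subject to  x0 + x1 - 1 = 0,  x in [-3, 3]^2.

def f(x):
    return abs(x[0]) + abs(x[1] - 2.0)

def subgrad_f(x):
    # np.sign returns 0 at a kink, which is a valid subgradient of |.| there.
    return np.array([np.sign(x[0]), np.sign(x[1] - 2.0)])

def c(x):
    return np.array([x[0] + x[1] - 1.0])

def jac_c(x):
    return np.array([[1.0, 1.0]])

def project_box(x, lo=-3.0, hi=3.0):
    # Euclidean projection onto the box X = [lo, hi]^n.
    return np.clip(x, lo, hi)

def single_step_lagrangian(x0, rho=1.0, base_step=0.5, iters=2000):
    """One projected subgradient step on the classical augmented Lagrangian
    in x, followed by one dual ascent step in lam, per iteration."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros(1)
    for k in range(iters):
        eta = base_step / np.sqrt(k + 1.0)    # diminishing stepsize (assumed schedule)
        mult = lam + rho * c(x)               # multiplier estimate used in the primal step
        g = subgrad_f(x) + jac_c(x).T @ mult  # subgradient of f + <lam, c> + (rho/2)||c||^2
        x = project_box(x - eta * g)          # single primal step, then projection onto X
        lam = lam + eta * c(x)                # dual ascent on the constraint residual
    return x, lam

x_final, lam_final = single_step_lagrangian([2.0, 2.0])
residual = float(c(x_final)[0])
objective = float(f(x_final))
print(x_final, lam_final, residual, objective)
```

In the paper's framework the plain projected subgradient step above would be replaced by a black-box update such as proximal SGD, proximal SGDM, or proximal ADAM; the sketch keeps the simplest choice so that the primal and dual roles stay visible.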

Paper Structure (21 sections, 23 theorems, 90 equations, 5 figures)

Introduction
Existing works on nonsmooth optimization
Embedding stochastic subgradient methods into Lagrangian-based methods
Motivations
Contributions
Organization
Preliminaries
Notations
Nonsmooth analysis
Differential inclusion and stochastic subgradient methods
Global Convergence
Embeddable stochastic subgradient methods
Basic assumptions and main results
Application in expectation-constrained optimization
Applications
...and 6 more sections

Key Result

Proposition 2.8

Let $h: \mathbb{R}^n \to \mathbb{R}$ be a path-differentiable function that admits $\mathcal{D}_h$ as its conservative field. Suppose $h$, $\mathcal{D}_h$ and $\mathcal{X}$ are definable over $\mathbb{R}^n$. Then $\{h(x): x \in \mathcal{X},~ 0\in \mathrm{conv}(\mathcal{D}_h(x)) + \mathcal{N} \dots$

Figures (5)

Figure 1: A brief illustration of our results on embedding stochastic subgradient methods, which are designed to solve \ref{Prob_simple_min_X}, into linearized Lagrangian-based methods through the framework \ref{Eq_Framework}.
Figure 2: Numerical results on training LeNet10 on MNIST with constraints.
Figure 3: Numerical results on training ResNet14 on CIFAR10 with constraints.
Figure 4: SGDM-LALM with different stepsizes $\alpha$ to train LeNet10 on MNIST. Other parameters are fixed.
Figure 5: ADAM-LALM with different $\rho$ to train LeNet10 on MNIST. Other parameters are fixed.

Theorems & Definitions (50)

Definition 2.1: clarke1990optimization
Definition 2.2
Definition 2.3
Definition 2.4: Aumann’s integral
Definition 2.5
Definition 2.6
Definition 2.7
Proposition 2.8: Corollary 5 in bolte2007clarke
proof
Definition 2.9
...and 40 more
