A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

Songtao Lu

TL;DR

The paper tackles nonconvex optimization with nonconvex inequality constraints by introducing GDPA (gradient descent and perturbed ascent), a single-loop primal-dual gradient method built on a perturbed augmented Lagrangian $F_{\beta}$. With only two step-sizes to tune and under a mild regularity condition, GDPA finds $\varepsilon$-approximate KKT points with an iteration complexity of $\mathcal{O}(1/\varepsilon^3)$, matching the best known rates for multi-loop methods while requiring no inner subproblem solvers. Numerical experiments on multi-class Neyman-Pearson classification (mNPC), constrained Markov decision processes (CMDP), and energy-budgeted DNNs complement the theory: in these tests GDPA outperforms state-of-the-art double- and triple-loop algorithms in both stationarity and feasibility metrics. The work thus provides a practical, scalable alternative for constrained nonconvex learning tasks, with potential impact across ML safety, interpretability, and efficiency.
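
For orientation, the problem class and the KKT conditions mentioned above can be written in standard textbook form. This is a sketch under assumed notation: the symbols $f$, $g_i$, $m$, $d$, and $\lambda_i$, and the unconstrained domain $\mathbb{R}^d$, are chosen here for illustration and may differ from the paper's own formulation.

$$
\min_{\mathbf x \in \mathbb{R}^d} \; f(\mathbf x) \quad \text{s.t.} \quad g_i(\mathbf x) \le 0, \quad i = 1,\dots,m,
$$

$$
\nabla f(\mathbf x^\ast) + \sum_{i=1}^{m} \lambda_i^\ast \nabla g_i(\mathbf x^\ast) = \mathbf 0, \qquad
g_i(\mathbf x^\ast) \le 0, \qquad
\lambda_i^\ast \ge 0, \qquad
\lambda_i^\ast \, g_i(\mathbf x^\ast) = 0 .
$$

An $\varepsilon$-approximate KKT point satisfies these conditions only up to a tolerance $\varepsilon$, in a sense the paper makes precise.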

Abstract

Nonconvex constrained optimization problems can be used to model a number of machine learning problems, such as multi-class Neyman-Pearson classification and constrained Markov decision processes. However, such kinds of problems are challenging because both the objective and constraints are possibly nonconvex, so it is difficult to balance the reduction of the loss value and reduction of constraint violation. Although there are a few methods that solve this class of problems, all of them are double-loop or triple-loop algorithms, and they require oracles to solve some subproblems up to certain accuracy by tuning multiple hyperparameters at each iteration. In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way. The key feature of the proposed algorithm is that it is a single-loop algorithm, where only two step-sizes need to be tuned. We show that under a mild regularity condition GDPA is able to find Karush-Kuhn-Tucker (KKT) points of nonconvex functional constrained problems with convergence rate guarantees. To the best of our knowledge, it is the first single-loop algorithm that can solve the general nonconvex smooth problems with nonconvex inequality constraints. Numerical results also showcase the superiority of GDPA compared with the best-known algorithms (in terms of both stationarity measure and feasibility of the obtained solutions).
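
The alternating primal-dual update described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's exact GDPA update: it assumes a damped (perturbed) projected dual ascent step with constant, hypothetical parameters `alpha`, `beta`, and `tau`, and a simple two-dimensional test problem (a quadratic objective with a unit-ball constraint) chosen only for illustration. All function names are hypothetical.

```python
import math

# Toy problem (an assumption for illustration, not from the paper):
#   minimize   f(x) = (x0 - 2)^2 + (x1 - 1)^2
#   subject to g(x) = x0^2 + x1^2 - 1 <= 0

def f_grad(x):
    """Gradient of the toy objective f."""
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]

def g(x):
    """Toy inequality constraint; feasible when g(x) <= 0."""
    return x[0] ** 2 + x[1] ** 2 - 1.0

def g_grad(x):
    """Gradient of the toy constraint g."""
    return [2.0 * x[0], 2.0 * x[1]]

def single_loop_primal_dual(x0, lam0=0.0, alpha=0.05, beta=1.0,
                            tau=0.01, iters=2000):
    """One gradient step on x, then one damped projected ascent step on
    the multiplier, per iteration. No inner loop or subproblem solver."""
    x, lam = list(x0), lam0
    for _ in range(iters):
        # Multiplier estimate used in the primal gradient: damped dual
        # variable plus scaled constraint value, projected onto [0, inf).
        mu = max(0.0, (1.0 - tau) * lam + beta * g(x))
        gf, gg = f_grad(x), g_grad(x)
        # Primal step: gradient descent on f(x) + mu * g(x).
        x = [xi - alpha * (gfi + mu * ggi) for xi, gfi, ggi in zip(x, gf, gg)]
        # Dual step: perturbed (damped by 1 - tau) projected ascent,
        # evaluated at the new primal iterate.
        lam = max(0.0, (1.0 - tau) * lam + beta * g(x))
    return x, lam

def kkt_residual(x, lam):
    """Norm of grad f(x) + lam * grad g(x), a simple stationarity measure."""
    gf, gg = f_grad(x), g_grad(x)
    return math.sqrt(sum((a + lam * b) ** 2 for a, b in zip(gf, gg)))
```

Calling `single_loop_primal_dual([0.0, 0.0])` should return a point close to the projection of $(2, 1)$ onto the unit ball, $(2,1)/\sqrt{5}$. Because `tau` is held constant in this sketch, the dual fixed point is biased and the constraint is satisfied only approximately; the trade-off between this bias and stability is the kind of issue a step-size schedule has to address.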

Paper Structure (35 sections, 12 theorems, 198 equations, 3 figures, 1 table)

Introduction
Motivating Examples
Related Work
Main Contributions of This Work
Gradient Descent and Perturbed Ascent Algorithm
Theoretical Guarantees
Assumptions
Theoretical Guarantees
Convergence Analysis
Discussion
Regularity Condition
Comparison with Existing Works
Numerical Experiments
Concluding Remark
Acknowledgement
...and 20 more sections

Key Result

Theorem 1

Suppose that Assumption ass.lif-Assumption ass.bd (or Assumption ass.compact) and Assumption ass.re hold and the iterates $\{\mathbf x_r,\boldsymbol \lambda_r,\forall r\ge0\}$ are generated by GDPA. When the step-sizes are chosen as […] and $\max\{1-\sigma/\sqrt{66U^2_J+\sigma^2},1/2\}<\tau<1$, there exist constants $K_1,K_2,K_3$ such that […] where […] and outputs $\overline{\mathbf x}$…

Figures (3)

Figure 1: Computational time comparison of GDPA, IALM, IQRC, IPPP.
Figure 2: Computational time comparison of GDPA, IALM, IQRC, IPPP.
Figure 3: Objective reward vs. constrained reward achieved by GDPA and PG.

Theorems & Definitions (24)

Definition 1
Theorem 1
Definition 2
Proposition 1
Lemma 1
Corollary 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
...and 14 more
