Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints

Ming Yang, Gang Li, Quanqi Hu, Qihang Lin, Tianbao Yang

TL;DR

This work tackles stochastic non-convex optimization with weakly convex objectives and multiple inequality constraints by introducing a hinge-based exact penalty, forming the unconstrained objective $\Phi(x)=F(x)+\frac{\beta}{m}\sum_k [h_k(x)]_+$. By analyzing the Moreau envelope $\Phi_{\theta}$ and proving a nearly $\epsilon$-KKT guarantee under a regularity condition, the authors justify a single-loop stochastic algorithm that attains the state-of-the-art $\mathcal{O}(\epsilon^{-6})$ complexity. The framework extends to finite-sum coupled compositional objectives (FCCO), with two settings: Setting I (unbiased stochastic gradient for $F$) and Setting II (FCCO structure), both yielding convergence guarantees and matching $\mathcal{O}(\epsilon^{-6})$ rates via variance-reduced estimators (MSVR). Empirical validation on fair learning with ROC constraints and continual learning with non-forgetting constraints demonstrates effective constraint satisfaction with modest penalty parameters and competitive objective performance relative to squared-hinge penalties and double-loop methods. Overall, the hinge penalty approach provides a scalable, theoretically grounded, single-loop alternative for challenging stochastic constrained optimization problems in ML applications.
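To make the penalized objective concrete, the following is a minimal sketch of $\Phi(x)=F(x)+\frac{\beta}{m}\sum_k [h_k(x)]_+$ on a one-dimensional toy problem. The helper name `hinge_penalty_objective`, the toy objective $F(x)=(x-2)^2$, the single constraint $h(x)=x-1\le 0$, and the value $\beta=4$ are all illustrative assumptions, not taken from the paper. A grid search stands in for the paper's stochastic algorithm purely to show where the penalized minimizer lands.

```python
from typing import Callable, Sequence


def hinge_penalty_objective(
    F: Callable[[float], float],
    constraints: Sequence[Callable[[float], float]],
    beta: float,
) -> Callable[[float], float]:
    """Build Phi(x) = F(x) + (beta / m) * sum_k max(h_k(x), 0)."""
    m = len(constraints)

    def Phi(x: float) -> float:
        # [h]_+ = max(h, 0): only violated constraints contribute.
        penalty = sum(max(h(x), 0.0) for h in constraints)
        return F(x) + beta / m * penalty

    return Phi


# Hypothetical toy problem: minimize (x - 2)^2 subject to x - 1 <= 0.
# The constrained minimizer is x = 1.
F = lambda x: (x - 2.0) ** 2
constraints = [lambda x: x - 1.0]
Phi = hinge_penalty_objective(F, constraints, beta=4.0)

# Coarse grid search over [0, 3], used here only for illustration.
grid = [i / 100 for i in range(301)]
best = min(grid, key=Phi)
print(best, Phi(best))
```

In this toy case the unconstrained minimizer of $\Phi$ coincides with the constrained minimizer $x=1$ for a finite $\beta$, which is the behavior an exact penalty is meant to deliver. A squared-hinge penalty with the same finite $\beta$ would instead leave a small residual violation on this example.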

Abstract

Constrained optimization with multiple functional inequality constraints has significant applications in machine learning. This paper examines a crucial subset of such problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop algorithmic designs. To overcome these challenges, we introduce a novel single-loop penalty-based stochastic algorithm. Following the classical exact penalty method, our approach employs a hinge-based penalty, which permits the use of a constant penalty parameter, enabling us to achieve a state-of-the-art complexity for finding an approximate Karush-Kuhn-Tucker (KKT) solution. We further extend our algorithm to address finite-sum coupled compositional objectives, which are prevalent in artificial intelligence applications, establishing improved complexity over existing approaches. Finally, we validate our method through experiments on fair learning with receiver operating characteristic (ROC) fairness constraints and continual learning with non-forgetting constraints.

Paper Structure

This paper contains 18 sections, 10 theorems, 86 equations, 7 figures, 2 tables, 2 algorithms.

Key Result

Lemma 3.4

Given a $\rho$-weakly convex function $\phi$ and $\theta < \rho^{-1}$, the envelope $\phi_\theta$ is smooth with gradient given by $\nabla \phi_{\theta}(\mathbf{x}) = \theta^{-1}(\mathbf{x}-\text{prox}_{\theta\phi}(\mathbf{x}))$. The smoothness constant of $\phi_\theta$ is $\frac{2-\theta\rho}{\theta(1-\theta\rho)}$. In addition, $\text{dist}(0, \partial \phi(\bar{\mathbf{x}}))\leq \
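The gradient formula in the lemma can be checked numerically on a simple case. The sketch below assumes the standard definition of the Moreau envelope, $\phi_\theta(\mathbf{x})=\min_{\mathbf{y}}\{\phi(\mathbf{y})+\frac{1}{2\theta}\|\mathbf{y}-\mathbf{x}\|^2\}$, and uses $\phi(x)=|x|$ as an illustrative choice that does not come from the paper. This function is convex, so any $\theta>0$ is admissible, and its proximal map is soft-thresholding. The function names are hypothetical.

```python
import math


def prox_abs(x: float, theta: float) -> float:
    """Proximal map of theta * |.|, i.e. soft-thresholding."""
    return math.copysign(max(abs(x) - theta, 0.0), x)


def envelope_abs(x: float, theta: float) -> float:
    """Moreau envelope of |.|, evaluated at the proximal point."""
    y = prox_abs(x, theta)
    return abs(y) + (y - x) ** 2 / (2.0 * theta)


def grad_envelope_abs(x: float, theta: float) -> float:
    """Gradient from the lemma: (x - prox(x)) / theta."""
    return (x - prox_abs(x, theta)) / theta


theta = 0.5
eps = 1e-6
for x in (-2.0, -0.3, 0.1, 1.7):
    # Central finite difference of the envelope, away from the kinks at +/- theta.
    fd = (envelope_abs(x + eps, theta) - envelope_abs(x - eps, theta)) / (2 * eps)
    assert abs(fd - grad_envelope_abs(x, theta)) < 1e-5
```

For this choice the envelope is the Huber function and its gradient is $x/\theta$ clipped to $[-1, 1]$, so it is $\theta^{-1}$-Lipschitz. That is consistent with the lemma's constant, which evaluates to $2/\theta$ when $\rho=0$.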

Figures (7)

  • Figure 1: Training curves of 15 constraint functions of different methods for fair learning on the Adult dataset. Top: hinge-based penalty method with different $\beta$; Bottom: squared-hinge-based penalty method with different $\beta$. The legend $\texttt{train\_tpr\_th\_-3.0}$ denotes the constraint function $h_{\tau^{+}}(\mathbf{w})$ with $\tau=3$; similarly, $\texttt{train\_fpr\_th\_-3.0}$ represents the constraint function $h_{\tau^{-}}(\mathbf{w})$ with $\tau=3$.
  • Figure 2: Training curves of objective and constraint violation of different methods under a parameter setting when they satisfy the constraints in the end. Dashed lines correspond to objective AUC values (y-axis on the right) and solid lines correspond to constraint violations (y-axis on the left).
  • Figure 3: Training curves of 5 constraint values in zero-one loss of different methods for continual learning with non-forgetting constraints when targeting the foggy class. Top: hinge penalty method with different $\beta$; Bottom: squared-hinge penalty method with different $\beta$.
  • Figure 5: Training curves of 5 constraint values in zero-one loss of different methods for continual learning with non-forgetting constraints when targeting the overcast class. Top: hinge penalty method with different $\beta$; Bottom: squared-hinge penalty method with different $\beta$.
  • Figure 6: Training curves of 5 constraint values in zero-one loss of different methods for continual learning with non-forgetting constraints when targeting the tunnel class. Top: hinge penalty method with different $\beta$; Bottom: squared-hinge penalty method with different $\beta$.
  • ...and 2 more figures

Theorems & Definitions (18)

  • Definition 3.2
  • Definition 3.3
  • Lemma 3.4
  • Lemma 4.1
  • Theorem 4.2
  • proof
  • Lemma 4.3
  • Lemma 4.4
  • Theorem 5.3
  • Theorem 5.5
  • ...and 8 more