Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints
Ming Yang, Gang Li, Quanqi Hu, Qihang Lin, Tianbao Yang
TL;DR
This work addresses stochastic non-convex optimization in which a weakly convex objective is minimized subject to multiple inequality constraints. It introduces a hinge-based exact penalty, which yields the unconstrained objective $\Phi(x)=F(x)+\frac{\beta}{m}\sum_k [h_k(x)]_+$, where $[\cdot]_+=\max\{\cdot,0\}$ is the hinge function and $\beta$ is the penalty parameter. The authors analyze the Moreau envelope $\Phi_{\theta}$ and prove a nearly $\epsilon$-KKT guarantee under a regularity condition, which justifies a single-loop stochastic algorithm with state-of-the-art $\mathcal{O}(\epsilon^{-6})$ complexity. Two settings are covered: Setting I, in which an unbiased stochastic gradient of $F$ is available, and Setting II, in which $F$ has a finite-sum coupled compositional (FCCO) structure. Both come with convergence guarantees and the same $\mathcal{O}(\epsilon^{-6})$ rate, obtained with variance-reduced estimators (MSVR). Experiments on fair learning with ROC constraints and on continual learning with non-forgetting constraints show effective constraint satisfaction with modest penalty parameters, and objective performance competitive with squared-hinge penalties and double-loop methods. Overall, the hinge penalty provides a scalable, theoretically grounded, single-loop alternative for stochastic constrained optimization problems in ML applications.
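For readers unfamiliar with the Moreau envelope $\Phi_{\theta}$ mentioned above, the standard definition is recalled here as a reminder; it is not a statement taken from the paper. The weak-convexity modulus $\rho$ and the proximal point $\hat{x}$ are notation assumed for this sketch. If $\Phi$ is $\rho$-weakly convex and $0<\theta<1/\rho$, then

\[
\Phi_{\theta}(x)=\min_{y}\Big\{\Phi(y)+\frac{1}{2\theta}\|y-x\|^{2}\Big\},
\qquad
\nabla\Phi_{\theta}(x)=\frac{1}{\theta}\,(x-\hat{x}),
\qquad
\hat{x}=\arg\min_{y}\Big\{\Phi(y)+\frac{1}{2\theta}\|y-x\|^{2}\Big\}.
\]

Under these assumptions, $\|\nabla\Phi_{\theta}(x)\|\le\epsilon$ implies that $x$ lies within distance $\theta\epsilon$ of the point $\hat{x}$, and that $\partial\Phi(\hat{x})$ contains a subgradient of norm at most $\epsilon$. This is the usual sense in which a small envelope gradient certifies near-stationarity for a non-smooth weakly convex function.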
Abstract
Constrained optimization with multiple functional inequality constraints has significant applications in machine learning. This paper examines a crucial subset of such problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop algorithmic designs. To overcome these challenges, we introduce a novel single-loop penalty-based stochastic algorithm. Following the classical exact penalty method, our approach employs a {\bf hinge-based penalty}, which permits the use of a constant penalty parameter, enabling us to achieve a {\bf state-of-the-art complexity} for finding an approximate Karush-Kuhn-Tucker (KKT) solution. We further extend our algorithm to address finite-sum coupled compositional objectives, which are prevalent in artificial intelligence applications, establishing improved complexity over existing approaches. Finally, we validate our method through experiments on fair learning with receiver operating characteristic (ROC) fairness constraints and continual learning with non-forgetting constraints.
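The hinge-penalized objective $\Phi(x)=F(x)+\frac{\beta}{m}\sum_k [h_k(x)]_+$ can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain deterministic subgradient method on a one-dimensional toy problem, with no stochastic sampling and no variance-reduced estimators. All function names, the toy objective, the constraint, and the values of `beta`, `step`, and `iters` are assumptions chosen for illustration.

```python
import numpy as np

def hinge_penalty_objective(x, F, constraints, beta):
    """Phi(x) = F(x) + (beta / m) * sum_k max(h_k(x), 0)."""
    m = len(constraints)
    violation = sum(max(float(h(x)), 0.0) for h in constraints)
    return float(F(x)) + beta / m * violation

def hinge_penalty_subgradient(x, grad_F, constraints, constraint_grads, beta):
    """Return one subgradient of Phi at x.

    Only strictly violated constraints (h_k(x) > 0) contribute; at
    h_k(x) <= 0 the subgradient 0 of the hinge is used.
    """
    m = len(constraints)
    g = np.array(grad_F(x), dtype=float)
    for h, grad_h in zip(constraints, constraint_grads):
        if h(x) > 0.0:
            g = g + (beta / m) * np.asarray(grad_h(x), dtype=float)
    return g

def subgradient_descent(x0, grad_F, constraints, constraint_grads, beta,
                        step=0.01, iters=2000):
    """Single loop: one subgradient step on Phi per iteration, constant beta."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * hinge_penalty_subgradient(
            x, grad_F, constraints, constraint_grads, beta)
    return x

# Hypothetical toy problem: minimize 0.5 * (x - 2)^2 subject to x - 1 <= 0.
F = lambda x: 0.5 * (x[0] - 2.0) ** 2
grad_F = lambda x: np.array([x[0] - 2.0])
h = lambda x: x[0] - 1.0
grad_h = lambda x: np.array([1.0])

x_final = subgradient_descent([0.0], grad_F, [h], [grad_h], beta=5.0)
print(x_final, hinge_penalty_objective(x_final, F, [h], beta=5.0))
```

In this toy problem the constrained minimizer is $x=1$ with Lagrange multiplier $1$, so the constant value `beta=5.0` exceeds the multiplier and the iterates stay close to $x=1$ without increasing the penalty parameter. A squared-hinge penalty would, in general, need a growing penalty parameter to drive the violation to zero, which is the contrast the abstract draws.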
