Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints
Ruichuan Huang, Jiawei Zhang, Ahmet Alacaoglu
TL;DR
This work introduces a stochastic, smoothed primal–dual framework for nonconvex optimization with linear inequality constraints. A Moreau-envelope-based analysis allows each iteration to use a single stochastic gradient computed from one sample. The proposed stochastic smoothed linearized augmented Lagrangian method (ALM) achieves the optimal O(ε^{-4}) oracle complexity for obtaining an ε-near stationary point. It uses a postprocessing step to guarantee ε-stationarity and extends to stochastic linear constraints with safeguarding. By incorporating variance reduction (STORM), the authors obtain O(ε^{-3}) complexity under the stronger expected-smoothness assumption that variance reduction requires, matching known lower bounds in the unconstrained case. The framework is demonstrated on distributed optimization, discrete relaxations, and fairness-constrained classification, illustrating its broad applicability to structured stochastic nonconvex problems.
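To make the single-sample iteration concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' algorithm.

The sketch assumes the following:
- The problem is min_x E[F(x, ξ)] subject to Ax ≤ b.
- It uses the standard augmented Lagrangian for inequality constraints plus a proximal term: K(x, z; y) = f(x) + (1/(2ρ))(‖[y + ρ(Ax − b)]_+‖² − ‖y‖²) + (p/2)‖x − z‖².
- The step sizes tau, alpha, beta and the parameters rho, p are hypothetical values picked for the toy problem, not taken from the paper's theory.
- The helper names `matvec`, `rmatvec`, `smoothed_primal_dual` and `noisy_grad` are hypothetical.

Each iteration draws one sample and performs three updates:
- one gradient step on K in x;
- one projected dual ascent step;
- a smoothing step z ← z + β(x − z). This can be read as an inexact gradient step on the Moreau envelope, whose gradient at z is p(z − x*(z)).

```python
import random


def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, v):
    """Return A^T v for a matrix stored as a list of rows."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def smoothed_primal_dual(stoch_grad, A, b, x0, steps, tau=0.1, alpha=0.1,
                         beta=0.1, rho=1.0, p=1.0, seed=0):
    """Hypothetical single-sample smoothed primal-dual sketch for
    min_x E[F(x, xi)]  s.t.  A x <= b  (not the paper's exact method)."""
    rng = random.Random(seed)
    x, z = list(x0), list(x0)
    y = [0.0] * len(b)
    for _ in range(steps):
        g = stoch_grad(x, rng)  # ONE stochastic gradient from ONE sample
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # residual A x - b
        # Constraint part of grad_x K: A^T [y + rho (A x - b)]_+
        lam = [max(0.0, yi + rho * ri) for yi, ri in zip(y, r)]
        at_lam = rmatvec(A, lam)
        # Primal step: one gradient step on K(x, z; y) in x (no subproblem solve).
        x = [xj - tau * (gj + aj + p * (xj - zj))
             for xj, gj, aj, zj in zip(x, g, at_lam, z)]
        # Projected dual ascent on the new residual keeps y >= 0.
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        y = [max(0.0, yi + alpha * ri) for yi, ri in zip(y, r)]
        # Smoothing step z <- z + beta (x - z): an inexact gradient step on the
        # Moreau envelope, whose gradient at z is p (z - x*(z)); x approximates x*(z).
        z = [zj + beta * (xj - zj) for zj, xj in zip(z, x)]
    return x, y, z


def noisy_grad(x, rng, c=(1.0, 1.0), sigma=0.1):
    """Toy oracle: F(x, xi) = 0.5 ||x - xi||^2 with xi ~ N(c, sigma^2 I)."""
    return [xj - (cj + rng.gauss(0.0, sigma)) for xj, cj in zip(x, c)]


# Toy problem: minimize E[F(x, xi)] subject to x1 + x2 <= 1.
# Its solution is x* = (0.5, 0.5) with multiplier y* = 0.5.
x, y, z = smoothed_primal_dual(noisy_grad, A=[[1.0, 1.0]], b=[1.0],
                               x0=[0.0, 0.0], steps=3000)
print("x =", x, "y =", y)
```

In this toy problem the constraint is active at the solution, so the run exercises the dual variable. Feasibility is approached through the y updates while rho stays fixed.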
Abstract
We propose smoothed primal-dual algorithms for solving stochastic and smooth nonconvex optimization problems with linear inequality constraints. Our algorithms are single-loop and require only a single stochastic gradient, based on one sample, at each iteration. A distinguishing feature of our algorithms is that they are based on an inexact gradient descent framework for the Moreau envelope. The gradient of the Moreau envelope is estimated using one step of a stochastic primal-dual augmented Lagrangian method. To handle inequality constraints and stochasticity, we combine recently established global error bounds in constrained optimization with a Moreau-envelope-based analysis of stochastic proximal algorithms. For obtaining $\varepsilon$-stationary points, we establish the optimal $O(\varepsilon^{-4})$ sample complexity guarantee for our algorithms and provide extensions to stochastic linear constraints. We also show how to improve this complexity to $O(\varepsilon^{-3})$ by using variance reduction and the expected smoothness assumption. Unlike existing methods, our algorithms require no subproblem solves, large batch sizes, or increasing penalty parameters; instead, they use dual variable updates to ensure feasibility.
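The $O(\varepsilon^{-3})$ result relies on variance reduction. Below is a minimal sketch of the standard STORM (recursive momentum) estimator of Cutkosky and Orabona, shown in isolation on a toy oracle. It does not reproduce how the paper couples the estimator with its primal, dual, and smoothing updates, or how it schedules the weight a. The class name `StormEstimator` and the oracle `noisy_grad` are hypothetical.

Each step still draws one sample but evaluates it at two points, the current and the previous iterate. Controlling the difference of those two gradients is typically where an expected-smoothness assumption on F(·, ξ) enters.

```python
import random


class StormEstimator:
    """STORM recursive-momentum gradient estimator (standard form):

        d_0 = g(x_0; xi_0)
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t))

    The same sample xi_t is evaluated at the current and previous iterate.
    """

    def __init__(self, stoch_grad, a):
        self.stoch_grad = stoch_grad  # stoch_grad(x, xi) -> list of floats
        self.a = a                    # weight in (0, 1]; a = 1 gives plain SGD
        self.d = None
        self.x_prev = None

    def update(self, x, xi):
        g = self.stoch_grad(x, xi)
        if self.d is None:
            self.d = list(g)
        else:
            g_prev = self.stoch_grad(self.x_prev, xi)  # same sample, old point
            self.d = [gj + (1.0 - self.a) * (dj - pj)
                      for gj, dj, pj in zip(g, self.d, g_prev)]
        self.x_prev = list(x)
        return self.d


def noisy_grad(x, xi):
    """Hypothetical toy oracle: true gradient x corrupted by additive noise xi."""
    return [xj + xi for xj in x]


rng = random.Random(0)
storm = StormEstimator(noisy_grad, a=0.1)
err_storm, err_sgd = 0.0, 0.0
steps = 2000
for t in range(steps):
    x = [1.0 + 0.001 * t]      # slowly moving iterate; true gradient is x
    xi = rng.gauss(0.0, 1.0)   # one sample per iteration
    d = storm.update(x, xi)
    err_storm += (d[0] - x[0]) ** 2
    err_sgd += (noisy_grad(x, xi)[0] - x[0]) ** 2
print("mean sq. error  STORM:", err_storm / steps, " single-sample:", err_sgd / steps)
```

With a = 1 the recursion reduces to the plain single-sample gradient. On this toy oracle the additive noise cancels in the correction term, so the error follows e_t = (1 − a)e_{t−1} + aξ_t. Its stationary variance is therefore a/(2 − a) times the sample-noise variance.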
