Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization
Xingyu Chen, Bokun Wang, Ming Yang, Qihang Lin, Tianbao Yang
TL;DR
This work tackles non-smooth, non-convex finite-sum Coupled Compositional Optimization (FCCO) by introducing stochastic momentum methods that use outer (and nested) Moreau envelope smoothing to produce tractable surrogates. The authors propose two algorithms, SONEX for smooth inner functions and ALEXR2 for smooth or weakly convex inner functions, achieving a new state-of-the-art iteration complexity of $O(1/\epsilon^5)$. They further apply the smoothing technique to non-convex inequality-constrained problems via smoothed hinge penalties, finding a (nearly) $\epsilon$-level KKT solution with the same $O(1/\epsilon^5)$ complexity. Empirical results on group DRO, AUC ROC fairness, and continual learning tasks show that the proposed methods outperform existing baselines in both optimization efficiency and constraint satisfaction.
Abstract
Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, has emerged as an important optimization paradigm for a wide range of machine learning problems. In this paper, we focus on a challenging class of non-convex, non-smooth FCCO, where the outer functions are non-smooth and either weakly convex or convex, and the inner functions are smooth or weakly convex. Existing state-of-the-art results face two key limitations: (1) a high iteration complexity of $O(1/\epsilon^6)$ under the assumption that the stochastic inner functions are Lipschitz continuous in expectation; (2) reliance on vanilla SGD-type updates, which are not suitable for deep learning applications. Our main contributions are twofold: (i) we propose stochastic momentum methods tailored for non-smooth FCCO that come with provable convergence guarantees; (ii) we establish a new state-of-the-art iteration complexity of $O(1/\epsilon^5)$. Moreover, we apply our algorithms to non-convex optimization problems with multiple inequality constraints, where the constraint functions are smooth or weakly convex. By optimizing a formulation based on a smoothed hinge penalty, we achieve a new state-of-the-art complexity of $O(1/\epsilon^5)$ for finding a (nearly) $\epsilon$-level KKT solution. Experiments on three tasks demonstrate the effectiveness of the proposed algorithms.
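To make the smoothed hinge penalty concrete, the sketch below computes the textbook Moreau envelope of the hinge function $\max(x, 0)$ with smoothing parameter $\lambda$. It is an illustration under stated assumptions, not the paper's implementation: the function names, the parameter name `lam`, and the brute-force check are all hypothetical, and the paper's exact penalty scaling may differ.

```python
def hinge(x: float) -> float:
    """Non-smooth hinge function max(x, 0)."""
    return max(x, 0.0)


def moreau_hinge(x: float, lam: float) -> float:
    """Moreau envelope of the hinge: min_y max(y, 0) + (y - x)^2 / (2 * lam).

    Closed form (assumes lam > 0):
      0               if x <= 0
      x^2 / (2 lam)   if 0 < x <= lam
      x - lam / 2     if x > lam
    """
    if x <= 0.0:
        return 0.0
    if x <= lam:
        return x * x / (2.0 * lam)
    return x - lam / 2.0


def moreau_hinge_grad(x: float, lam: float) -> float:
    """Gradient of the envelope; it is (1 / lam)-Lipschitz and lies in [0, 1]."""
    return min(max(x / lam, 0.0), 1.0)


def moreau_brute_force(x: float, lam: float, lo: float = -5.0,
                       hi: float = 5.0, steps: int = 100001) -> float:
    """Grid-search evaluation of the envelope definition, for checking only."""
    best = float("inf")
    for k in range(steps):
        y = lo + (hi - lo) * k / (steps - 1)
        best = min(best, hinge(y) + (y - x) ** 2 / (2.0 * lam))
    return best


if __name__ == "__main__":
    lam = 0.5  # illustrative smoothing parameter
    for x in (-1.0, 0.25, 2.0):
        print(x, hinge(x), moreau_hinge(x, lam), moreau_hinge_grad(x, lam))
```

The envelope agrees with the hinge for $x \le 0$, is quadratic near the kink, and lies at most $\lambda/2$ below the hinge elsewhere, so a smaller $\lambda$ gives a tighter but less smooth surrogate.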
