A Novel Unified Parametric Assumption for Nonconvex Optimization
Artem Riabinin, Ahmed Khaled, Peter Richtárik
TL;DR
The paper introduces a novel unified parametric assumption that links the gradient to a projection onto a subset of minimizers through a nonnegative progress function, enabling unified convergence analysis for nonconvex optimization. It develops deterministic and stochastic convergence theorems for gradient-based methods, and shows that classical convex guarantees emerge as a special case while subsuming several nonconvex conditions such as weak quasi-convexity and aiming. The framework is demonstrated across multiple function classes and extended to stochastic settings, with problem formulations and proofs complemented by experiments on half-space learning, Fashion-MNIST with an MLP, and CIFAR-10 with ResNet to validate practical relevance. Overall, the work provides a flexible, theoretically grounded bridge between convex optimization and nonconvex practice, guiding algorithm design and analysis in complex landscapes.
Abstract
Nonconvex optimization is central to modern machine learning, but the general framework of nonconvex optimization yields weak convergence guarantees that are too pessimistic compared to practice. On the other hand, while convexity enables efficient optimization, it is of limited applicability to many practical problems. To bridge this gap and better understand the practical success of optimization algorithms in nonconvex settings, we introduce a novel unified parametric assumption. Our assumption is general enough to encompass a broad class of nonconvex functions while also being specific enough to enable the derivation of a unified convergence theorem for gradient-based methods. Notably, by tuning the parameters of our assumption, we demonstrate its versatility in recovering several existing function classes as special cases and in identifying functions amenable to efficient optimization. We derive our convergence theorem for both deterministic and stochastic optimization, and conduct experiments to verify that our assumption can hold practically over optimization trajectories.
