Lower Bounds for Non-Convex Stochastic Optimization
Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth
TL;DR
This work establishes tight lower bounds for stochastic first-order methods in non-convex optimization, showing that finding an $\epsilon$-stationary point requires at least $\Omega(\Delta L \sigma^{2} / \epsilon^{4})$ queries under bounded-variance oracles and at least $\Omega(\Delta \bar{L} \sigma / \epsilon^{3} + \sigma^{2}/\epsilon^{2})$ queries under mean-squared smoothness (MSS), on hard instances whose dimension scales polynomially in $1/\epsilon$. Using probabilistic zero-chains and random rotations, the authors prove that SGD is minimax-optimal in the bounded-variance setting and that variance-reduction methods are optimal under MSS, clarifying the fundamental limits of each regime and the separation between the MSS and non-MSS settings. The results extend to learning-type and active oracle models, as well as finite-sum structures, and imply a separation between non-convex and convex stochastic optimization: $\epsilon^{-4}$ versus $\epsilon^{-2}$ scaling. The paper also outlines several open questions, including the MSS bound with a single query ($K=1$), stronger oracle assumptions, and extensions to higher-order algorithms.
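To make the two rates concrete, the sketch below evaluates their leading terms for given problem parameters. It is only an illustration: the function names are hypothetical, and the absolute constants hidden by the $\Omega(\cdot)$ notation are dropped, so the numbers indicate scaling rather than exact query counts.

```python
def bounded_variance_rate(delta, L, sigma, eps):
    """Leading term Delta * L * sigma^2 / eps^4 (constants omitted)."""
    return delta * L * sigma**2 / eps**4


def mss_rate(delta, L_bar, sigma, eps):
    """Leading terms Delta * L_bar * sigma / eps^3 + sigma^2 / eps^2 (constants omitted)."""
    return delta * L_bar * sigma / eps**3 + sigma**2 / eps**2


if __name__ == "__main__":
    # Unit problem parameters (an arbitrary choice for illustration).
    delta, L, L_bar, sigma = 1.0, 1.0, 1.0, 1.0
    for eps in (0.1, 0.05, 0.025):
        bv = bounded_variance_rate(delta, L, sigma, eps)
        ms = mss_rate(delta, L_bar, sigma, eps)
        print(f"eps={eps}: bounded-variance ~ {bv:.3g}, MSS ~ {ms:.3g}")
```

With unit parameters and $\epsilon = 0.1$, the bounded-variance term is about $10^{4}$ while the MSS expression is about $1.1 \times 10^{3}$. Halving $\epsilon$ multiplies the first by 16 but the dominant MSS term by only 8.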
Abstract
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of $\epsilon^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques.
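The conditions named in the abstract can be written out as follows. This is a sketch in commonly used notation, not a quotation from the paper: the symbols $g(x, z)$ for the stochastic gradient estimate, $z$ for the oracle's random seed, and $x^{(0)}$ for the starting point are assumptions made here for illustration, while $\Delta$, $L$, $\bar{L}$, and $\sigma$ match the parameters in the bounds above.

```latex
\begin{align*}
&\text{$\epsilon$-stationarity:} && \|\nabla F(x)\| \le \epsilon, \\
&\text{Function class:} && F(x^{(0)}) - \inf_{x} F(x) \le \Delta, \qquad
   \|\nabla F(x) - \nabla F(y)\| \le L \|x - y\|, \\
&\text{Unbiased oracle:} && \mathbb{E}_{z}\,[g(x, z)] = \nabla F(x), \\
&\text{Bounded variance:} && \mathbb{E}_{z}\,\|g(x, z) - \nabla F(x)\|^{2} \le \sigma^{2}, \\
&\text{Mean-squared smoothness:} && \mathbb{E}_{z}\,\|g(x, z) - g(y, z)\|^{2} \le \bar{L}^{2} \|x - y\|^{2}.
\end{align*}
```

Under this reading, mean-squared smoothness is the extra assumption that separates the $\epsilon^{-3}$ regime from the $\epsilon^{-4}$ regime: it constrains how the noisy estimates at two points relate when they share the same seed $z$, which is what variance-reduction methods exploit.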
