The Nesterov-Spokoiny Acceleration Achieves Strict $o(1/k^2)$ Convergence
Weibin Peng, Yu Liu, Tianyu Wang
TL;DR
The paper introduces the Nesterov-Spokoiny Acceleration (NSA), a momentum-based scheme that preserves monotone descent while achieving fast convergence for smooth convex objectives: $o(1/k^2)$ in function value and $o(1/(k^3 \log k))$ in squared gradient norm. It extends NSA to inexact-gradient (zeroth-order) oracles and to nonsmooth/composite objectives with proximal updates, preserving the same accelerated rate in function value and giving descent guarantees even in nonconvex settings. A continuous-time analysis connects NSA to high-resolution ODEs, producing a system that explains the acceleration phenomenon and attains an $O(1/t^2)$ rate in the convex setting. The paper also reports experiments comparing NSA variants to standard accelerated methods, demonstrating practical speedups and the effectiveness of the zeroth-order and composite extensions. Overall, NSA offers a unified framework that combines acceleration with guaranteed descent across smooth, nonsmooth, and zeroth-order optimization problems.
Abstract
This paper studies the Nesterov-Spokoiny Acceleration (NSA), a variant of the accelerated gradient method by Nesterov and Spokoiny. For smooth convex optimization, NSA achieves a strict $o(1/k^2)$ convergence rate in function value and an $o(1/(k^3 \log k))$ rate in squared gradient norm, while ensuring monotonic descent of the objective. We further study a zeroth-order version of NSA that handles inexact gradients, and extend NSA to composite optimization problems, in each case establishing $o(1/k^2)$ convergence in function value. A continuous-time analysis reveals connections to high-resolution ODEs known to underlie acceleration phenomena.
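For reference, the function-value claim can be read as a limit statement. The symbols $x_k$ (the $k$-th iterate), $f$ (the objective), and $f^\star$ (its minimum value) are notation assumed here for illustration and do not appear in the abstract itself. Under that notation, a strict $o(1/k^2)$ rate together with monotonic descent would mean

$$\lim_{k \to \infty} k^2 \bigl( f(x_k) - f^\star \bigr) = 0, \qquad f(x_{k+1}) \le f(x_k) \ \text{ for all } k.$$

By contrast, the classical $O(1/k^2)$ guarantee for accelerated gradient methods only asserts that $k^2 \bigl( f(x_k) - f^\star \bigr)$ stays bounded by a constant, not that it vanishes.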
