Continuized Nesterov Momentum Achieves the $O(\varepsilon^{-7/4})$ Complexity without Additional Mechanisms
Julien Hermant, Jean-François Aujol, Charles Dossal, Lorick Huang, Aude Rondepierre
TL;DR
The paper proves that a continuized Nesterov momentum method, whose parameters are stochastic but independent of the objective function and which uses no safeguard mechanism, attains the same $O(\varepsilon^{-7/4})$ complexity for finding an $\varepsilon$-stationary point as prior safeguarded methods. The method combines continuous-time momentum dynamics with gradient updates triggered at the jump times of a Poisson process. By analyzing a Poisson-averaged trajectory, the authors derive convergence in expectation under Lipschitz gradient and Hessian assumptions. They establish an $O(n^{-4/7})$ rate for the gradient norm in a suitably weighted sense, which yields the target complexity when expressed in terms of gradient evaluations. The result hinges on a careful transfer from the continuous-time analysis to a discrete algorithm, and it suggests that safeguards may not be fundamentally necessary for accelerated first-order non-convex optimization in this setting. The guarantee has two limitations: it holds only on a random event $\mathcal{A}_n$, and only up to a multiplicative factor $\Delta_n/\mathbb{E}[\Delta_n]$ with unit expectation. Both are independent of the objective function, and empirical evidence suggests they are mild and may be removable in future work.
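The step from the rate to the complexity can be spelled out in one line. The symbols $g_n$ (the weighted gradient-norm quantity after $n$ iterations) and $C$ (a constant independent of $n$) are notation introduced here for illustration, and the conversion assumes one gradient evaluation per iteration:

$$
g_n \le C\, n^{-4/7},
\qquad
C\, n^{-4/7} \le \varepsilon
\iff
n \ge \left(\frac{C}{\varepsilon}\right)^{7/4},
\qquad\text{hence}\qquad
n = O\!\left(\varepsilon^{-7/4}\right).
$$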
Abstract
For first-order optimization of non-convex functions with Lipschitz continuous gradient and Hessian, the best known complexity for reaching an $\varepsilon$-approximation of a stationary point is $O(\varepsilon^{-7/4})$. Existing algorithms achieving this bound are based on momentum, but are always complemented with safeguard mechanisms, such as restarts or negative-curvature exploitation steps. Whether such mechanisms are fundamentally necessary has remained an open question. Leveraging the continuized method, we show that a Nesterov momentum algorithm with stochastic parameters alone achieves the same complexity in expectation. This result holds up to a multiplicative stochastic factor with unit expectation and a restriction to a subset of the realizations, both of which are independent of the objective function. We empirically verify that these constitute mild limitations.
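As a rough illustration of the mechanism the abstract refers to, the sketch below simulates a generic continuized momentum scheme: two sequences `x` and `z` are mixed continuously in time, and a gradient step is applied at each jump of a rate-1 Poisson process. The two-sequence form, the constant parameters `eta`, `eta_p`, `gamma`, `gamma_p`, the helper names, and the test function are all assumptions made for illustration. They are not the parameter schedule analyzed in the paper.

```python
import numpy as np


def mix(x, z, tau, eta, eta_p):
    """Exact flow of dx = eta*(z - x) dt, dz = eta_p*(x - z) dt over time tau."""
    s = eta + eta_p
    m = (eta_p * x + eta * z) / s      # this weighted mean is conserved by the flow
    d = (z - x) * np.exp(-s * tau)     # the gap z - x decays exponentially
    return m - (eta / s) * d, m + (eta_p / s) * d


def continuized_momentum(grad, x0, n_jumps, eta=1.0, eta_p=1.0,
                         gamma=0.1, gamma_p=0.2, seed=0):
    """Toy continuized momentum with hand-picked constants (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    z = x.copy()
    grad_norms = []
    for _ in range(n_jumps):
        tau = rng.exponential(1.0)          # waiting time of a rate-1 Poisson process
        x, z = mix(x, z, tau, eta, eta_p)   # continuous momentum phase between jumps
        g = grad(x)                         # one gradient evaluation per jump
        grad_norms.append(np.linalg.norm(g))
        x, z = x - gamma * g, z - gamma_p * g   # both sequences step along -g
    return x, np.array(grad_norms)


# Hypothetical non-convex test function: f(x) = sum(x**4 / 4 - x**2 / 2).
def grad_f(x):
    return x**3 - x


x_final, norms = continuized_momentum(grad_f, x0=[2.0, -1.5], n_jumps=200)
print(norms[0], norms[-1])
```

Because the waiting times are exponential, the mixing weights applied between gradient steps are random but do not depend on the objective, which is one way to read "stochastic parameters" in the abstract. No restart or curvature check appears anywhere in the loop.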
