Adaptive Acceleration Without Strong Convexity Priors Or Restarts
Joao V. Cavalcanti, Laurent Lessard, Ashia C. Wilson
TL;DR
The paper tackles optimization when the strong convexity parameter $m$ is unknown by introducing NAG-free, a restart-free extension of Nesterov's accelerated gradient (NAG) that estimates $m$ online while estimating $L$ by backtracking. The method acts as a convex interpolation between gradient descent (GD) and NAG. It converges globally at least as fast as GD, and, given an upper bound on $L$ and local smoothness of the Hessian, it converges at an accelerated rate near the optimum. The theory rests on Lyapunov-function analyses of these two regimes (global and local). Experiments on logistic regression, log-sum-exp, and SVM problems show performance competitive with restart-based approaches and natural adaptation to favorable local curvature. Overall, NAG-free offers a practical way to obtain accelerated first-order convergence without restarts or a prior on $m$, while remaining stable under varying curvature conditions.
Abstract
A longstanding challenge in optimization is achieving optimal performance when the strong convexity parameter $m$ is unknown. In this paper, we propose NAG-free, a simple extension of Nesterov's accelerated gradient (NAG) and the first method capable of estimating $m$ directly, without priors or restarts. Our estimator is inexpensive: it requires no additional function or gradient evaluations, only the storage of one extra iterate and one extra gradient, both already computed by NAG. We prove that, when the smoothness parameter $L$ is estimated via backtracking, NAG-free converges globally at least as fast as gradient descent. We also prove that, given an upper bound on $L$, NAG-free achieves accelerated convergence locally near the minimum under local smoothness of the Hessian and some mild additional assumptions. Finally, we present experiments with smooth and nonsmooth Hessians on both synthetic and real-world data, which demonstrate that NAG-free is competitive with restart methods and naturally adapts to favorable local curvature conditions.
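The abstract names two ingredients: an online estimate of $m$ built only from one stored iterate and gradient, and backtracking on $L$. The Python sketch below shows one way such pieces can fit into a constant-momentum NAG loop. It is not the authors' algorithm. The secant-style estimator (a running minimum of $\|\nabla f(y_k)-\nabla f(y_{k-1})\|/\|y_k-y_{k-1}\|$), the gradient-based backtracking test, and the names `nag_free_sketch`, `backtrack_L`, and `momentum` are all assumptions made for illustration. The momentum is the standard strongly convex NAG coefficient $\beta = (1-\sqrt{\hat m/L})/(1+\sqrt{\hat m/L})$. It vanishes when $\hat m = L$, so the loop reduces to gradient descent with step $1/L$, and it approaches NAG's momentum as $\hat m$ decreases toward $m$. Unlike the estimator described in the abstract, this backtracking rule spends an extra gradient evaluation per trial.

```python
import numpy as np


def momentum(m_hat, L):
    """Constant-momentum NAG coefficient (1 - q) / (1 + q) with q = sqrt(m_hat / L)."""
    q = np.sqrt(m_hat / L)
    return (1.0 - q) / (1.0 + q)  # equals 0 (plain GD) when m_hat == L


def backtrack_L(grad, y, g, L, growth=2.0, max_doublings=60):
    """Assumed rule: double L until ||grad(y - g/L) - g|| <= L * ||g / L||."""
    g_norm = np.linalg.norm(g)
    for _ in range(max_doublings):
        # L * ||g / L|| simplifies to ||g||; each trial costs one extra gradient.
        if np.linalg.norm(grad(y - g / L) - g) <= g_norm:
            break
        L *= growth
    return L


def nag_free_sketch(grad, x0, L0=1.0, max_iter=5000, tol=1e-10):
    """Hypothetical NAG loop with backtracked L and an online secant estimate of m."""
    x = np.asarray(x0, dtype=float).copy()
    y = x.copy()
    L, m_hat = L0, None          # no prior on m
    y_prev = g_prev = None       # the one extra iterate and gradient we store
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) <= tol:
            break
        L = backtrack_L(grad, y, g, L)
        if y_prev is not None:
            dy_norm = np.linalg.norm(y - y_prev)
            if dy_norm > 0.0:
                # Secant curvature along the last step. For a C^2, m-strongly convex,
                # L_true-smooth f it lies in [m, L_true], so in exact arithmetic the
                # running minimum can only overestimate m.
                c = np.linalg.norm(g - g_prev) / dy_norm
                m_hat = c if m_hat is None else min(m_hat, c)
        y_prev, g_prev = y, g
        # beta = 0 (gradient descent) until m has been estimated or if m_hat >= L.
        beta = momentum(L if m_hat is None else min(m_hat, L), L)
        x_next = y - g / L                  # gradient step from the extrapolated point
        y = x_next + beta * (x_next - x)    # momentum extrapolation
        x = x_next
    # y is the point at which gradients are evaluated.
    return y, L, m_hat


# Usage on a diagonal quadratic f(x) = 0.5 x^T A x - b^T x with m = 1 and L = 100.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 2.0, 3.0])
x_out, L_out, m_out = nag_free_sketch(lambda x: A @ x - b, np.zeros(3))
print(x_out, L_out, m_out)
```

The design choice here is deliberately conservative. Because each secant ratio is bounded below by $m$, the running minimum errs toward overestimating $m$. That shrinks the momentum toward the GD end of the family rather than overshooting with too much momentum.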
