
Adaptive Acceleration Without Strong Convexity Priors Or Restarts

Joao V. Cavalcanti, Laurent Lessard, Ashia C. Wilson

TL;DR

The paper addresses optimization when the strong convexity parameter $m$ is unknown. It introduces NAG-free, a restart-free extension of Nesterov's accelerated gradient (NAG) that estimates $m$ online and estimates the smoothness parameter $L$ by backtracking for robustness. The method behaves as a convex interpolation between gradient descent (GD) and NAG. It converges globally at least as fast as GD and, given an upper bound on $L$ and local smoothness of the Hessian, converges at an accelerated rate near the optimum. The analysis is based on Lyapunov functions and covers two regimes, global and local. Experiments on logistic regression, log-sum-exp, and SVM problems show performance competitive with restart-based approaches and natural adaptation to favorable local curvature. Overall, NAG-free offers a practical route to near-optimal first-order convergence without restarts or a prior on $m$, while remaining stable under varying curvature.
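To make "backtracking $L$" concrete, here is a minimal Python sketch of a standard backtracking estimate of the smoothness parameter using the sufficient-decrease test of a gradient step. The function name `backtrack_lipschitz`, the growth factor `gamma_L`, and the example quadratic are illustrative assumptions; the summary does not give the paper's exact backtracking rule (for instance, whether the estimate is ever decreased between iterations). By the descent lemma, the test always passes once the estimate is at least the true smoothness constant, so the accepted value never exceeds $\max(L_0, \gamma_L L)$.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def backtrack_lipschitz(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x: Vector,
    L: float,
    gamma_L: float = 2.0,
    max_tries: int = 60,
) -> Tuple[float, Vector]:
    """Hypothetical helper (standard backtracking, assumed, not taken from the paper).

    Grow the estimate L by gamma_L until the gradient step x - grad(x)/L satisfies
    the sufficient-decrease test f(x+) <= f(x) - ||grad(x)||^2 / (2L).
    """
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    fx = f(x)
    for _ in range(max_tries):
        x_plus = [xi - gi / L for xi, gi in zip(x, g)]
        if f(x_plus) <= fx - g_sq / (2.0 * L):
            return L, x_plus  # test passed: accept the current estimate
        L *= gamma_L  # test failed: L underestimates the local smoothness
    raise RuntimeError("backtracking did not terminate; is f L-smooth?")


# Illustrative quadratic f(x) = (4*x0^2 + x1^2) / 2, whose true smoothness constant is 4.
def quad_f(x: Vector) -> float:
    return 0.5 * (4.0 * x[0] ** 2 + x[1] ** 2)


def quad_grad(x: Vector) -> Vector:
    return [4.0 * x[0], x[1]]


L_hat, x_next = backtrack_lipschitz(quad_f, quad_grad, [1.0, 1.0], L=1.0)
print(L_hat, x_next)  # 4.0 [0.0, 0.75]
```

Starting from $L_0 = 1$, the estimates 1 and 2 fail the test and 4 is accepted, which here equals the true constant.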

Abstract

A longstanding challenge in optimization is achieving optimal performance when the strong convexity parameter $m$ is unknown. In this paper, we propose NAG-free, a simple extension of Nesterov's accelerated gradient (NAG) and the first method capable of estimating $m$ directly, without priors or restarts. Our estimator is inexpensive: it requires no additional function or gradient evaluations, only the storage of one extra iterate and one extra gradient, both already computed by NAG. We prove that, by estimating the smoothness parameter $L$ via backtracking, NAG-free converges globally at least as fast as gradient descent. We also prove that, given an upper bound on $L$, NAG-free achieves accelerated convergence locally near the minimum under local smoothness of the Hessian and some mild additional assumptions. Finally, we present experiments on problems with smooth and nonsmooth Hessians, using both synthetic and real-world data, which demonstrate that NAG-free is competitive with restart methods and naturally adapts to favorable local curvature conditions.
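The cost claim above (one extra stored iterate and gradient, no extra evaluations) can be illustrated with a hedged sketch. The code below is not the paper's estimator: it swaps in a simple secant-curvature heuristic, taking the running minimum of $\langle \nabla f(y_t)-\nabla f(y_{t-1}),\, y_t-y_{t-1}\rangle / \|y_t-y_{t-1}\|^2$ as $m_t$ and plugging it into the usual strongly convex NAG momentum $\beta_t = (\sqrt{L/m_t}-1)/(\sqrt{L/m_t}+1)$ with a known $L$. The names `nag_with_running_m`, `dot`, and `quad_grad` are hypothetical. For $f\in\mathcal{S}(L,m)$ every secant curvature lies in $[m, L]$, so this running minimum can only overestimate $m$, and the sketch carries none of NAG-free's convergence guarantees.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def nag_with_running_m(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    L: float,
    iters: int = 200,
) -> Tuple[Vector, float]:
    """Hypothetical sketch, NOT the paper's NAG-free estimator.

    Constant-step NAG whose momentum uses a running estimate m_t of the strong
    convexity parameter, computed from the previous iterate and gradient only.
    """
    x_prev = list(x0)
    y = list(x0)
    y_old = g_old = None  # the single extra iterate/gradient pair kept in memory
    m_t = L  # kappa_t = 1 gives zero momentum, so the first step is plain GD
    for _ in range(iters):
        g = grad(y)
        if y_old is not None:
            s = [a - b for a, b in zip(y, y_old)]
            r = [a - b for a, b in zip(g, g_old)]
            s_sq = dot(s, s)
            if s_sq > 1e-30:
                # Secant curvature along the last step (assumed heuristic, lies in [m, L]).
                curvature = dot(r, s) / s_sq
                m_t = min(m_t, max(curvature, 1e-12 * L))
        y_old, g_old = y, g
        x_next = [yi - gi / L for yi, gi in zip(y, g)]  # gradient step, size 1/L
        sqrt_kappa = math.sqrt(L / m_t)
        beta = (sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)
        y = [xn + beta * (xn - xp) for xn, xp in zip(x_next, x_prev)]  # momentum
        x_prev = x_next
    return x_prev, m_t


# Illustrative quadratic f(x) = (4*x0^2 + x1^2) / 2, so L = 4 and m = 1.
def quad_grad(x: Vector) -> Vector:
    return [4.0 * x[0], x[1]]


x_final, m_hat = nag_with_running_m(quad_grad, [1.0, 1.0], L=4.0)
print(m_hat)  # 1.0
```

On this quadratic the first gradient step with step size $1/L$ zeroes the stiff coordinate, so from the second step on every secant lies along the flat direction and the estimate settles at the true $m = 1$. On non-quadratic problems the overestimation noted above can hurt, which is exactly the gap a method with proven guarantees has to close.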

Paper Structure

This paper contains 24 sections, 31 theorems, 399 equations, 11 figures, 2 tables.

Key Result

Theorem 4.3

Let $f\in\mathcal{S}(L,m)$ and suppose that $\kappa=L/m \geq 2$. If $y_{t}$ are generated by the NAG-free algorithm for given $L_{0}\geq m$, $\gamma \leq 2$ and $\gamma_{L}\leq 2$, then letting $\bar{\kappa}=\max(L_{0},2L)/m$, we have that

Figures (11)

  • Figure 1: Suboptimality gap for logistic regression on three datasets.
  • Figure 2: Left: $c_{t}$ and $m_{t}$ produced by $\text{NAG-free } (\bar{L})$ for logistic regression on the phishing dataset, when $x_{0}=0$ and $x_{0}\sim 10^{6}\times\mathcal{U}[0,10^{-6}]^{d}$. Right: the left $y$-axis shows the normalized suboptimality gap of NAG and $\text{NAG-free } (\bar{L})$ on a quadratic problem, together with the reference gaps $r_{\textup{1}}$ and $r_{\textup{5}}$ of accelerated rates for $m=1$ and $m=5$; the right $y$-axis shows $m_{t}$.
  • Figure 3: Suboptimality gap $f(x_{t})-f(x^{\star})$ for log-sum-exp under different $(\eta,\theta)$ settings.
  • Figure 4: Suboptimality gap $f(x_{t})-f(x^{\star})$ for SVM on three datasets.
  • Figure 5: Suboptimality gap $f(x_{t})-f(x^{\star})$ for logistic regression on three datasets.
  • ...and 6 more figures

Theorems & Definitions (39)

  • Definition 2.1: Lipschitz-Smooth and Strongly Convex Functions
  • Definition 2.2: Locally Hölder-smooth Hessian
  • Theorem 4.3
  • Corollary 4.1
  • Theorem 4.4
  • Remark A.5
  • Remark A.6
  • Remark A.7
  • Lemma A.8
  • Lemma A.9
  • ...and 29 more