Restarts subject to approximate sharpness: A parameter-free and optimal scheme for first-order methods

Ben Adcock; Matthew J. Colbrook; Maksym Neyra-Nesterenko

TL;DR

This work addresses speeding up first-order methods under an approximate sharpness condition without knowing the associated constants. It develops a parameter-free restart framework that leverages a grid search over potential sharpness parameters and a schedule criterion to order restarts, ensuring convergence rates matching the optimal rates for a wide range of convex problems, even when iterates need not be feasible. The method applies to diverse first-order schemes, including Nesterov's methods, universal fast gradient methods, and primal-dual iterations, and remains robust in the presence of noise and model mismatch. Numerical experiments on sparse recovery (QCBP), TV-based image reconstruction, and SR-LASSO demonstrate substantial practical gains over non-restarted schemes and existing restart approaches, with grid-search-based variants offering significant parameter-insensitivity advantages.
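The grid search mentioned above can be pictured with a small sketch. It follows the indexing described in the Figure 1 caption: index $i$ selects the trial value $a^i\alpha_0$ for $\alpha$, index $j$ selects $b^j\beta_0$ for $\beta$, and $k$ counts inner iterations. The function `h` below is a hypothetical schedule criterion chosen only for illustration; it is an assumption and not the paper's formula, and the helper names `candidate_params` and `search_order` are likewise invented here.

```python
from itertools import product


def candidate_params(alpha0, beta0, a, b, i, j):
    """Trial sharpness constants for grid indices (i, j)."""
    return (a ** i) * alpha0, (b ** j) * beta0


def h(i, j, k, c1=2, c2=2):
    """Hypothetical schedule criterion (illustrative assumption only).

    Larger values mean the triple (i, j, k) is visited later.
    """
    return (abs(i) + 1) ** c1 * (j + 1) ** c2 * k


def search_order(level, i_max, j_max, k_max):
    """Index triples with h(i, j, k) <= level, sorted by increasing h."""
    triples = [
        (i, j, k)
        for i, j, k in product(
            range(-i_max, i_max + 1),  # alpha may be below or above alpha0
            range(j_max + 1),          # beta is searched upward from beta0
            range(1, k_max + 1),       # inner-iteration budget
        )
        if h(i, j, k) <= level
    ]
    # Ties in h are broken by the index triple itself.
    return sorted(triples, key=lambda t: (h(*t), t))


order = search_order(level=50, i_max=3, j_max=3, k_max=50)
first_alpha, first_beta = candidate_params(1.0, 1.0, 2.0, 2.0, *order[0][:2])
```

Under this assumed criterion, grid points far from $(\alpha_0, \beta_0)$ are only visited at small iteration budgets, which is the qualitative behaviour the level curves in Figure 1 describe.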

Abstract

Sharpness is an almost generic assumption in continuous optimization that bounds the distance from minima by objective function suboptimality. It facilitates the acceleration of first-order methods through restarts. However, sharpness involves problem-specific constants that are typically unknown, and restart schemes typically reduce convergence rates. Moreover, these schemes are challenging to apply in the presence of noise or with approximate model classes (e.g., in compressive imaging or learning problems), and they generally assume that the first-order method used produces feasible iterates. We consider the assumption of approximate sharpness, a generalization of sharpness that incorporates an unknown constant perturbation to the objective function error. This constant offers greater robustness (e.g., with respect to noise or relaxation of model classes) for finding approximate minimizers. By employing a new type of search over the unknown constants, we design a restart scheme that applies to general first-order methods and does not require the first-order method to produce feasible iterates. Our scheme maintains the same convergence rate as when the constants are known. The convergence rates we achieve for various first-order methods match the optimal rates or improve on previously established rates for a wide range of problems. We showcase our restart scheme in several examples and highlight potential future applications and developments of our framework and theory.
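To illustrate why sharpness makes restarts effective, here is a minimal sketch of a textbook-style restarted subgradient method with known constants. It is not the paper's algorithm. The toy objective $f(x)=\|x\|_1$ is an assumption chosen because it is sharp with $\alpha = 1$, $\beta = 1$ with respect to the Euclidean distance, has minimum value $0$, and has subgradients of norm at most $G=\sqrt{n}$. All function names are hypothetical.

```python
import math


def f(x):
    """Toy objective: the l1 norm, minimized at the origin with value 0."""
    return sum(abs(v) for v in x)


def subgradient(x):
    """A subgradient of the l1 norm (componentwise sign, 0 at 0)."""
    return [(v > 0) - (v < 0) for v in x]


def subgradient_run(x0, delta, G, t):
    """t fixed-step subgradient steps, assuming dist(x0, minimizers) <= delta.

    Returns the best point seen, whose error is at most delta * G / sqrt(t).
    """
    step = delta / (G * math.sqrt(t))
    x, best = list(x0), list(x0)
    for _ in range(t):
        g = subgradient(x)
        x = [v - step * gv for v, gv in zip(x, g)]
        if f(x) < f(best):
            best = list(x)
    return best


def restarted_subgradient(x0, alpha, G, restarts, r=0.5):
    """Restart scheme for beta = 1 with known alpha and G."""
    # Enough inner steps that delta * G / sqrt(t) <= r * eps.
    t = math.ceil((G / (r * alpha)) ** 2)
    eps = f(x0)  # valid error bound because the minimum value is 0 here
    x = list(x0)
    history = [f(x)]
    for _ in range(restarts):
        # Sharpness converts the error bound eps into a distance bound.
        x = subgradient_run(x, eps / alpha, G, t)
        eps *= r
        history.append(f(x))
    return x, history


x0 = [3.0, -1.0, 2.0, 0.5]
x_final, history = restarted_subgradient(x0, alpha=1.0, G=2.0, restarts=20)
```

Each restart uses the same number of inner iterations yet shrinks the error bound by the factor `r`, so the toy scheme converges linearly. The sketch relies on knowing $\alpha$ and the minimum value, which is exactly the kind of knowledge the abstract says is typically unavailable.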

Paper Structure (43 sections, 18 theorems, 153 equations, 11 figures, 2 tables, 8 algorithms)

Introduction
The problem
Motivations
Example: sparse recovery
Contributions
Complexity bounds
Connections with previous work
Notation and outline
Restart scheme for unknown Lg
Restart scheme for unknown $\alpha$, $\beta$ and $\eta$
Schedule criterion functions, $h$-assignments and grid searches
The algorithm
Cost analysis of the algorithm
Choices of schedule criterion functions and assignments, and the proof of \ref{thm:MAIN}
Comparison with the cost in \ref{thm:restart-known-consts}
...and 28 more sections

Key Result

Theorem 1.1

Suppose that $f$ satisfies \ref{eqn:sharpness} for some unknown constants $\alpha$, $\beta$ and $\eta$. Consider \ref{alg:restart-unknown-constsC} for fixed $a, b > 1$, $0 < r < 1$, $\alpha_0 > 0$, $\beta_0 \geq 1$ and schedule criterion function as in \ref{rad_ord_cor1} (unknown $\alpha$ and $\beta$), \ref{known_alpha} (known $\alpha$) [...] (total inner) iterations, where $K(\varepsilon)$ is given in \ref{L-eps-def}, implies that [...] Let $\beta_* =$ [...]
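The condition behind the label eqn:sharpness is not reproduced in this listing. Going by the abstract, which describes sharpness as a bound on the distance to the minimizers in terms of objective suboptimality and approximate sharpness as the same bound with a constant perturbation $\eta$, it presumably has roughly the following shape. This is a sketch under assumptions: $d(x, \widehat{X})$ denotes the distance from $x$ to the set of minimizers $\widehat{X}$, $\hat{f}$ denotes the optimal value, and the paper's version may carry an additional term accounting for infeasible iterates.

```latex
% Classical sharpness (the case \eta = 0):
\[
  d(x, \widehat{X}) \le \left( \frac{f(x) - \hat{f}}{\alpha} \right)^{1/\beta},
  \qquad \alpha > 0,\ \beta \ge 1 .
\]
% Approximate sharpness: the same bound with a constant perturbation \eta \ge 0
% added to the objective error.
\[
  d(x, \widehat{X}) \le \left( \frac{f(x) - \hat{f} + \eta}{\alpha} \right)^{1/\beta} .
\]
```

With $\eta > 0$ the bound no longer forces the distance to zero as the objective error vanishes, which matches the abstract's emphasis on finding approximate minimizers.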

Figures (11)

Figure 1: Level curves $h=50$ of the schedule criterion functions $h$ in \ref{rad_ord_cor1} (left panel), \ref{known_alpha} (middle panel) and \ref{known_beta} (right panel) with $c_1=c_2=2$. The level curves describe the search order. The red dots show the corresponding indices $(i,j,k)$ in the set defined in \ref{eqn:restart-unknown-consts-t-lb}. The index $i$ indicates the parameter search value $a^i\alpha_0$ for $\alpha$. The index $j$ indicates the parameter search value $b^j\beta_0$ for $\beta$. The height (i.e., $k$) indicates the total number of inner iterations for a fixed $(i,j)$.
Figure 2: Reconstruction error of restarted primal-dual iteration for QCBP with $\varsigma = 10^{-6}$. Left: The restart scheme with fixed sharpness constant $\beta = 1$ and various $\alpha$. Right: Various schemes (both restarted and non-restarted).
Figure 3: Reconstruction error of restarted primal-dual iteration for QCBP with $\varsigma = 10^{-6}$. Left: The restart scheme with grid search over $\alpha$ and various fixed $\beta$. Right: The restart scheme with grid search over $\beta$ and various fixed $\alpha$.
Figure 4: Reconstruction error of restarted primal-dual iteration for QCBP with $\varsigma = 10^{-2k}$ for $k = 1,2,\dots,6$. Each plot includes the various (restarted and non-restarted) schemes.
Figure 5: Sampling patterns for the Fourier measurements used in the image reconstruction experiments.
...and 6 more figures

Theorems & Definitions (36)

Theorem 1.1
Theorem 2.1
proof: Proof of \ref{thm:restart-known-consts}
Definition 3.1
Remark 3.2: \ref{alg:restart-unknown-constsC} reduces to \ref{alg:restart-known-consts} when $\alpha$ and $\beta$ are known
Theorem 3.3
proof
Corollary 3.4: Unknown $\alpha$ and $\beta$
proof
Corollary 3.5: Known $\alpha$
...and 26 more
