A template for gradient norm minimization

Mihai I. Florea

Abstract

The gradient mapping norm is a strong and easily verifiable stopping criterion for first-order methods on composite problems. When the objective exhibits the quadratic growth property, the gradient mapping norm minimization problem can be solved by online parameter-free and adaptive first-order schemes with near-optimal worst-case rates. In this work we address problems where quadratic growth is absent, a class for which no methods with all the aforementioned properties are known to exist. We formulate a template whose instantiation recovers the existing Performance Estimation derived approaches. Our framework provides a simple human-readable interpretation along with runtime convergence rates for these algorithms. Moreover, our template can be used to construct a quasi-online parameter-free method applicable to the entire class of composite problems while retaining the optimal worst-case rates with the best known proportionality constant. The analysis also allows for adaptivity. Preliminary simulation results suggest that our scheme is highly competitive in practice with the existing approaches, either obtained via Performance Estimation or employing Accumulative Regularization.
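To make the stopping criterion concrete, the sketch below evaluates a gradient mapping norm on a tiny LASSO instance. It assumes the standard textbook definition $g_L(y) = L\,(y - T_L(y))$, where $T_L(y)$ is one proximal gradient step from $y$; the paper's exact notation is not shown in this excerpt, so all function names here are hypothetical. The loop is plain proximal gradient, used only to illustrate the criterion, and is not the method proposed in the paper.

```python
import math

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1, applied coordinate-wise.
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def grad_least_squares(A, b, x):
    # Gradient of f(x) = 0.5 * ||A x - b||^2, i.e. A^T (A x - b).
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def gradient_mapping(A, b, lam, y, L):
    # Assumed definition: g_L(y) = L * (y - T_L(y)), with
    # T_L(y) = prox_{(lam/L)||.||_1}(y - grad f(y) / L).
    g = grad_least_squares(A, b, y)
    t = soft_threshold([yi - gi / L for yi, gi in zip(y, g)], lam / L)
    return [L * (yi - ti) for yi, ti in zip(y, t)], t

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def prox_gradient(A, b, lam, x0, L, tol=1e-10, max_iter=1000):
    # Plain proximal gradient; stops once the gradient mapping norm <= tol.
    x = list(x0)
    for _ in range(max_iter):
        g_map, t = gradient_mapping(A, b, lam, x, L)
        if norm(g_map) <= tol:
            break
        x = t
    return x

# Toy instance (illustrative values only): A = I, so L = 1 is a valid
# Lipschitz constant for the gradient of f.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_final = prox_gradient(A, b, lam=1.0, x0=[0.0, 0.0], L=1.0)
```

Because the gradient mapping needs only one gradient and one proximal step at the current point, its norm can be checked at run time without knowing the optimal value or the distance to the solution set, which is what makes it easily verifiable.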

Paper Structure

This paper contains 31 sections, 14 theorems, 173 equations, 3 figures, 2 tables, 15 algorithms.

Key Result

Proposition 1

For any $y \in \mathbb{E}$ and $\hat{L} > 0$ satisfying the descent condition, we can construct a global lower bound on the objective of the form

Figures (3)

  • Figure 1: Side-by-side comparison between the reduced variants of OCGM-G and FGM
  • Figure 1: Simulation results on LASSO
  • Figure 2: Simulation results on NNLS

Theorems & Definitions (26)

  • Proposition 1
  • Proposition 1
  • Proof 1
  • Proposition 2
  • Proof 2
  • Lemma 3
  • Proof 3
  • Proposition 4
  • Proof 4
  • Proposition 5
  • ...and 16 more