Acceleration Methods

Alexandre d'Aspremont, Damien Scieur, Adrien Taylor

TL;DR

This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization, and uses quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes.

Abstract

This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.

Paper Structure

This paper contains 163 sections, 74 theorems, 511 equations, 6 figures, 32 algorithms.

Key Result

proposition 1

Let $x_0\in\mathbb{R}^d$ and $f$ be a quadratic function defined as in eq:quad-prob with $\mu \mathbf{I} \preceq \mathbf{H} \preceq L \mathbf{I}$ for some $L>\mu>0$. The sequence $\{x_k\}_{k=0,1,\ldots}$ satisfies

$$x_{k+1} \in x_0 + \mathrm{span}\{\nabla f(x_0),\ldots,\nabla f(x_k)\}$$

for all $k=0,1,\ldots$, if and only if the errors $\{x_k-x_\star\}_{k=0,1,\ldots}$ can be written as

$$x_k - x_\star = P_k(\mathbf{H})\,(x_0-x_\star)$$

for all $k=0,1,\ldots$, for some sequence of polynomials $\{P_{k}\}_{k=0,1,\ldots}$ with each $P_k$ of degree at most $k$ and $P_k(0)=1$.
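The polynomial viewpoint behind Proposition 1 can be checked numerically in the simplest case. The sketch below is only an illustration under stated assumptions: it takes fixed-step gradient descent as the first-order method, a diagonal Hessian (so that a polynomial in $\mathbf{H}$ acts coordinate-wise), and the classical step size $\gamma = 2/(L+\mu)$. For this method the error after $k$ steps should equal $P_k(\mathbf{H})(x_0-x_\star)$ with $P_k(\lambda) = (1-\gamma\lambda)^k$. The function names and the test data are hypothetical and not taken from the monograph.

```python
# Minimal sketch (assumptions: diagonal Hessian, fixed-step gradient descent).
# Checks that the error after k steps equals P_k(H) (x_0 - x_star)
# with P_k(lam) = (1 - gamma * lam) ** k.


def gradient_descent_error(h_diag, x_star, x0, gamma, num_steps):
    """Iterate x_{k+1} = x_k - gamma * grad f(x_k) and return x_k - x_star."""
    x = list(x0)
    for _ in range(num_steps):
        # For a quadratic with Hessian H and minimizer x_star,
        # grad f(x) = H (x - x_star); H is diagonal here.
        grad = [h * (xi - xs) for h, xi, xs in zip(h_diag, x, x_star)]
        x = [xi - gamma * g for xi, g in zip(x, grad)]
    return [xi - xs for xi, xs in zip(x, x_star)]


def polynomial_error(h_diag, x_star, x0, gamma, num_steps):
    """Apply P_k(H) to the initial error, coordinate by coordinate."""
    return [
        (1.0 - gamma * h) ** num_steps * (x0i - xs)
        for h, x0i, xs in zip(h_diag, x0, x_star)
    ]


# Hypothetical problem data with eigenvalues inside [mu, L].
mu, L = 1.0, 10.0
h_diag = [1.0, 4.0, 10.0]
x_star = [1.0, -2.0, 0.5]
x0 = [0.0, 0.0, 0.0]
gamma = 2.0 / (L + mu)  # classical step size for gradient descent
k = 5

iter_err = gradient_descent_error(h_diag, x_star, x0, gamma, k)
poly_err = polynomial_error(h_diag, x_star, x0, gamma, k)
max_gap = max(abs(a - b) for a, b in zip(iter_err, poly_err))
print("largest discrepancy between the two error vectors:", max_gap)
```

The two error vectors should agree up to floating-point rounding, and $P_k(0)=1$ holds by construction.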

Figures (6)

  • Figure 1: We plot $|P_k^{\text{Grad}}(\lambda)|$ (for the optimal $\gamma$ in eq:step) for $k\in\{1,3,5\}$, $\mu = 1$, $L=10$. Note that the polynomials satisfy $P_k^{\text{Grad}}(0)=1$. The rate is equal to the largest value of $|P_k^{\text{Grad}}(\lambda)|$ on the interval $[\mu,L]$, which is achieved at the boundaries (where $\lambda$ is either equal to $\mu$ or to $L$).
  • Figure 2: We plot the absolute value of $C_1^{[\mu,L]}(\lambda)$, $C_3^{[\mu,L]}(\lambda)$ and $C_5^{[\mu,L]}(\lambda)$ for $\lambda\in [\mu, L]$, where $\mu = 1$ and $L=10$. Note that the polynomials satisfy $C_k^{[\mu,L]}(0)=1$. The maximum of $|C^{[\mu,L]}_k(\lambda)|$ over $[\mu,L]$ decreases rapidly as $k$ grows, implying an accelerated rate of convergence.
  • Figure 3: Illustration of the sensitivity of nonlinear acceleration when it is applied to gradient descent and to Nesterov's method (see Section c-Nest), and when its online variant (Algorithm alg:online_nonlinear_acceleration) is used, to minimize some random quadratic function. Figure fig:sensitivity_nonlinear_acceleration_cong_gg: the condition number of the matrix $\textbf{G}^T\textbf{G}$, which grows exponentially with its size (the plateau on the right is caused by numerical errors). Figure fig:sensitivity_nonlinear_acceleration_norm_c: the norm of the vector of coefficients $c$.
  • Figure 4: Let $f(\cdot)$ (blue) be a differentiable function. (Left) Smoothness: $f(\cdot)$ is $L$-smooth if and only if it is upper bounded by $f(y)+\langle \nabla f(y);\cdot-y\rangle+\tfrac{L}{2}\|\cdot-y\|^2_2$ (dashed, brown) for all $y$. (Right) Strong convexity: $f(\cdot)$ is $\mu$-strongly convex if and only if it is lower bounded by $f(y)+\langle \nabla f(y);\cdot-y\rangle+\tfrac{\mu}{2}\|\cdot-y\|^2_2$ (dashed, brown) for all $y$.
  • Figure 5: Left: Sublinear convergence plot without restart. Right: Sublinear convergence plot with restart.
  • ...and 1 more figure
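The comparison made in the captions of Figures 1 and 2 can be reproduced with a short computation. The sketch below is an illustration under stated assumptions rather than the monograph's own code: it takes the gradient polynomial to be $(1-\gamma\lambda)^k$ with $\gamma = 2/(L+\mu)$, and the rescaled Chebyshev polynomial to be $C_k^{[\mu,L]}(\lambda) = T_k(t(\lambda))/T_k(t(0))$ with $t(\lambda) = (2\lambda-(L+\mu))/(L-\mu)$, where $T_k$ is the Chebyshev polynomial of the first kind. Worst-case values are estimated on a uniform grid over $[\mu,L]$; the helper names are hypothetical.

```python
# Minimal sketch (assumed definitions, see the lead-in): compare the
# worst-case value on [mu, L] of the gradient-descent polynomial with
# that of the rescaled Chebyshev polynomial, for k in {1, 3, 5}.


def chebyshev_t(k, t):
    """Chebyshev polynomial of the first kind via T_{j+1} = 2 t T_j - T_{j-1}."""
    prev, curr = 1.0, t
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, curr = curr, 2.0 * t * curr - prev
    return curr


def rescaled_chebyshev(k, lam, mu, L):
    """Chebyshev polynomial mapped from [-1, 1] to [mu, L], normalized to 1 at 0."""
    def to_unit(z):
        return (2.0 * z - (L + mu)) / (L - mu)
    return chebyshev_t(k, to_unit(lam)) / chebyshev_t(k, to_unit(0.0))


def gradient_polynomial(k, lam, mu, L):
    """Polynomial of k gradient steps with step size 2 / (L + mu)."""
    gamma = 2.0 / (L + mu)
    return (1.0 - gamma * lam) ** k


def worst_case(poly, k, mu, L, num_points=2001):
    """Largest absolute value of poly on a uniform grid over [mu, L]."""
    grid = [mu + (L - mu) * i / (num_points - 1) for i in range(num_points)]
    return max(abs(poly(k, lam, mu, L)) for lam in grid)


mu, L = 1.0, 10.0  # same values as in the figure captions
rates = {
    k: (
        worst_case(gradient_polynomial, k, mu, L),
        worst_case(rescaled_chebyshev, k, mu, L),
    )
    for k in (1, 3, 5)
}
for k, (grad_rate, cheb_rate) in rates.items():
    print(f"k={k}: gradient {grad_rate:.4f}, Chebyshev {cheb_rate:.4f}")
```

For $k=1$ the two polynomials coincide, so the worst-case values match; for larger $k$ the Chebyshev value should be noticeably smaller, which is the accelerated behavior the Figure 2 caption describes.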

Theorems & Definitions (160)

  • proposition 1
  • proof
  • theorem 1
  • proof
  • corollary 1
  • proof
  • theorem 2
  • definition 1: Nondegenerate first-order method
  • proposition 2
  • proof
  • ...and 150 more