Accelerated optimization algorithms and ordinary differential equations: the convex non Euclidean case

Paul Dobson, Jesus María Sanz-Serna, Konstantinos C. Zygalakis

Abstract

We study the connections between ordinary differential equations and optimization algorithms in a non-Euclidean setting. We propose a novel accelerated algorithm for minimising convex functions over a convex constrained set. This algorithm is a natural generalization of Nesterov's accelerated gradient descent method to the non-Euclidean setting and can be interpreted as an additive Runge-Kutta algorithm. The algorithm can also be derived as a numerical discretization of the ODE appearing in Krichene et al. (2015a). We use Lyapunov functions to establish convergence rates for the ODE and show that the discretizations considered achieve acceleration beyond the setting studied in Krichene et al. (2015a). Finally, we discuss how the proposed algorithm connects to various equations and algorithms in the literature.

Paper Structure

This paper contains 25 sections, 6 theorems, 96 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

If Assumptions as:cond_mir1 and ass:two hold, then:

Figures (6)

  • Figure 1: An illustration of one step of the Nesterov algorithm in a Euclidean setting
  • Figure 2: Lemma 1 for the case of the simplex. The left and right panels correspond to the primal and dual spaces. Each point $\zeta\in \mathbb{E}^\star$ is mapped by $\chi$ into a point $z=\chi(\zeta)\in{\rm ri}({\cal X})=\Delta_{+}$; the image $\nabla \phi(z)$ does not in general coincide with $\zeta$, but $\zeta$ and $\nabla \phi(z)$ differ by an element of $\cal N$. The straight lines of slope 1 partition the dual space; each line is mapped into a single point by $\chi$.
  • Figure 3: Non strongly convex objective function, $f(x_k)-f(x^\star)$ vs. $k$. The dotted lines have slopes corresponding to decays $1/k$ and $1/k^2$.
  • Figure 4: Quadratic objective function, $f(x_k)-f(x^\star)$ vs. $k$. The dotted line has a slope corresponding to a decay $1/k^2$.
  • Figure 5: Quadratic objective function, larger learning rates, $f(x_k)-f(x^\star)$ vs. $k$. The dotted line has the same equation as the reference line in Figure 4, so that the two figures are easy to compare.
  • ...and 1 more figures

Theorems & Definitions (15)

  • Lemma 1
  • proof
  • Example 1
  • Remark 1
  • Theorem 1
  • Theorem 2
  • proof
  • Remark 2
  • Theorem 3
  • proof
  • ...and 5 more