A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method

Matthias J. Ehrhardt, Erlend S. Riis, Torbjørn Ringholm, Carola-Bibiane Schönlieb

Abstract

Discrete gradient methods are geometric integration techniques that can preserve the dissipative structure of gradient flows. Due to the monotonic decay of the function values, they are well suited for general convex and nonconvex optimisation problems. Both zero- and first-order algorithms can be derived from the discrete gradient method by selecting different discrete gradients. In this paper, we present a thorough analysis of the discrete gradient method for optimisation which provides a solid theoretical foundation. We show that the discrete gradient method is well-posed by proving the existence of iterates for any positive time step, as well as uniqueness in some cases, and propose an efficient method for solving the associated discrete gradient equation. Moreover, we establish an $O(1/k)$ convergence rate for convex objectives and prove linear convergence if instead the Polyak-Łojasiewicz inequality is satisfied. The analysis is carried out for three discrete gradients (the Gonzalez discrete gradient, the mean value discrete gradient, and the Itoh-Abe discrete gradient) as well as for a randomised Itoh-Abe method. Our theoretical results are illustrated with a variety of numerical experiments, and we furthermore demonstrate that the methods are robust with respect to stiffness.
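
As a rough illustration of the monotonic decay mentioned in the abstract, the following Python sketch applies a cyclic Itoh-Abe sweep to a small quadratic $V(x) = \tfrac{1}{2} x^T A x - b^T x$. It is not the authors' implementation. The matrix `A`, vector `b`, starting point, and time step `tau` are illustrative assumptions, and the closed-form coordinate update holds only for quadratics (it is derived in the comments). The sketch records how far each step deviates from the dissipation identity $V(x^{k+1}) - V(x^k) = -\|x^{k+1} - x^k\|^2 / \tau$.

```python
def quadratic(A, b, x):
    """V(x) = 0.5 * x^T A x - b^T x for a symmetric matrix A (list of lists)."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(x[i] * Ax[i] for i in range(n)) - sum(b[i] * x[i] for i in range(n))


def itoh_abe_step(A, b, x, tau):
    """One cyclic Itoh-Abe sweep for the quadratic V, updating coordinates in order."""
    y = list(x)
    n = len(x)
    for i in range(n):
        # Partial derivative of V along coordinate i at the current point y.
        g_i = sum(A[i][j] * y[j] for j in range(n)) - b[i]
        # The scalar Itoh-Abe equation is
        #   alpha = -tau * (V(y + alpha * e_i) - V(y)) / alpha,
        # and for a quadratic
        #   V(y + alpha * e_i) - V(y) = alpha * g_i + 0.5 * A[i][i] * alpha**2,
        # which gives the closed form below (alpha = 0 when g_i = 0).
        alpha = -tau * g_i / (1.0 + 0.5 * tau * A[i][i])
        y[i] += alpha
    return y


# Illustrative problem data (assumptions, not from the paper).
A = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite
b = [1.0, -1.0]
x = [3.0, -2.0]
tau = 0.5

values = [quadratic(A, b, x)]
gaps = []  # deviation from V(x_new) - V(x) = -||x_new - x||^2 / tau
for _ in range(50):
    x_new = itoh_abe_step(A, b, x, tau)
    step_sq = sum((p - q) ** 2 for p, q in zip(x_new, x))
    v_new = quadratic(A, b, x_new)
    gaps.append(abs((v_new - values[-1]) + step_sq / tau))
    values.append(v_new)
    x = x_new
```

Under these assumptions, the entries of `gaps` should sit at rounding-error level and `values` should be non-increasing, which is the discrete counterpart of the energy dissipation of the gradient flow.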

Paper Structure

This paper contains 35 sections, 15 theorems, 80 equations, 10 figures, 2 tables.

Key Result

Proposition 1.1

If $V:\mathbb{R}^n \to \mathbb{R}$ is $L$-smooth, then for all $x, y \in \mathbb{R}^n$, the following holds.

Figures (10)

  • Figure 1: DG methods for linear systems with condition number $\kappa = 10$ (left) and $\kappa = 1,000$ (right). Convergence rate plotted as relative objective $[V(x^{k}) - V^*]/[V(x^0) - V^*]$. Linear rate is observed for all methods and is sensitive to condition number.
  • Figure 2: Comparison of observed convergence rate with theoretical convergence rate \ref{eq:linear_rate}, for randomised Itoh--Abe method applied to linear system with condition numbers $\kappa = 1.2$ (left) and $\kappa = 10$ (right). Average convergence rate and confidence intervals as estimated from 100 runs on the same system. The sharpness of the proven convergence rate is observed in both cases.
  • Figure 3: DG methods for linear systems with nontrivial kernel, and convergence rate plotted as relative objective. The function is not strongly convex but satisfies the PŁ inequality, yielding linear convergence rates.
  • Figure 4: CIA and RIA methods applied to linear system, with matrix entries created from uniform distribution. CIA with the time step $\tau = 1/[\sqrt{n} L]$ (orange, circle) performs better than the same method with heuristic time step $\tau = 2/L$ (blue, triangle), but worse than RIA. This is the reverse of what was observed in previous examples.
  • Figure 5: DG methods for $l_2$-regularised logistic regression \ref{eq:logreg_problem}. Convergence rate plotted as relative objective. The rates of randomised and cyclic Itoh--Abe methods almost coincide, and so do the mean value and Gonzalez discrete gradient methods.
  • ...and 5 more figures

Theorems & Definitions (37)

  • Definition 1.0.1: Discrete gradient
  • Definition 1.0.2: $L$-smooth
  • Proposition 1.1
  • proof
  • Definition 1.1.1: $\mu$-convex
  • Remark 3.2
  • Lemma 3.3
  • Remark 3.4
  • Proposition 3.5: Brouwer fixed point theorem
  • Theorem 3.6: Discrete gradient existence theorem
  • ...and 27 more