Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

David Martínez-Rubio, Christophe Roux, Sebastian Pokutta

TL;DR

This work analyzes two of the most fundamental algorithms in geodesically convex optimization: Riemannian gradient descent and (possibly inexact) Riemannian proximal point. It also provides an implementable inexact proximal point algorithm and proves several new useful properties of Riemannian proximal methods.

Abstract

In this work, we analyze two of the most fundamental algorithms in geodesically convex optimization: Riemannian gradient descent and (possibly inexact) Riemannian proximal point. We quantify their rates of convergence and produce different variants with several trade-offs. Crucially, we show the iterates naturally stay in a ball around an optimizer, of radius depending on the initial distance and, in some cases, on the curvature. In contrast, except for limited cases, previous works bounded the maximum distance between iterates and an optimizer only by assumption, leading to incomplete analyses and unquantified rates. We also provide an implementable inexact proximal point algorithm yielding new results on minmax problems, and we prove several new useful properties of Riemannian proximal methods: they work when positive curvature is present, the proximal operator does not move points away from any optimizer, and we quantify the smoothness of its induced Moreau envelope. Further, we explore beyond our theory with empirical tests.

Paper Structure

This paper contains 29 sections, 37 theorems, 95 equations, 7 figures, 1 table, and 1 algorithm.

Key Result

theorem 1

[Iterate boundedness of RGD] Consider a manifold $\mathcal{M}_{\operatorname{LB}}$ in the class of manifolds with lower-bounded curvature, and $f$ in the class \smc for $\mathcal{X} \stackrel{\mathrm{def}}{=}$ the closed ball centered at the optimizer, whose radius involves $\varphi$, the initial distance, and the geometric constant $\zeta$ …

Figures (7)

  • Figure 1: Comparison of RIPPA and of RGD with $\eta = L^{-1}$ for solving the Karcher mean problem in the hyperbolic space $\mathbb{H}^d$ with $n=1000$ centers and dimension $d=1000$, in terms of squared distance to the optimizer $x^*$. The smoothness constant $L$ is taken for a set of diameter $O(R)$, where $R$ is the initial distance to the optimizer. We observe a monotonic decrease in distance in all of our experiments.
  • Figure 2: Comparison of RIPPA and of RGD with $\eta = L^{-1}$ and $\eta = (L\zeta_{O(R)})^{-1}$ for solving the Karcher mean problem in $\mathcal{S}_+^{100}$ with $n=1000$ centers and dimension $d(d+1)/2 = 5050$, in terms of squared distance to the optimizer $x^*$. The smoothness constant $L$ is taken for a set of diameter $O(R\zeta_R)$, where $R$ is the initial distance to the optimizer and $\zeta$ is the geometric constant. We observe a monotonic decrease in distance in all of our experiments.
  • Figure 3: Corresponding plots in primal gap for Figures 1 and 2.
  • Figure 4: Karcher Mean on $\mathbb{H}^d$: $d=500$, $n=1000$, error in squared distance to the optimizer (left) and primal gap (right).
  • Figure 5: Karcher Mean on $\mathbb{H}^d$: $d=1000$, $n=500$, error in squared distance to the optimizer (left) and primal gap (right).
  • ...and 2 more figures
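
The experiments in these captions run Riemannian gradient descent with step size $\eta = L^{-1}$ on a Karcher mean problem. The following is a minimal sketch of that setup in the hyperbolic plane (hyperboloid model), not the authors' code. The helper names, the four hand-picked centers (chosen symmetric so that the optimizer is the base point of the hyperboloid), and the smoothness bound $L = D\coth(D)$ for a region of diameter $D = 3$ are assumptions made for illustration.

```python
import math

def lorentz(u, v):
    """Lorentzian inner product: -u0*v0 + sum_i ui*vi."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def dist(x, y):
    """Geodesic distance on the hyperboloid (curvature -1)."""
    return math.acosh(max(1.0, -lorentz(x, y)))

def exp_map(x, v):
    """Exponential map at x applied to the tangent vector v."""
    n = math.sqrt(max(0.0, lorentz(v, v)))
    if n < 1e-15:
        return list(x)
    return [math.cosh(n) * a + math.sinh(n) * b / n for a, b in zip(x, v)]

def log_map(x, y):
    """Tangent vector at x pointing to y, with length dist(x, y)."""
    c = max(1.0, -lorentz(x, y))
    d = math.acosh(c)
    u = [b - c * a for a, b in zip(x, y)]  # projection of y onto T_x
    if d < 1e-12:
        return u
    return [d / math.sinh(d) * ui for ui in u]

def karcher_grad(x, centers):
    """Gradient of F(x) = (1/2n) * sum_i dist(x, y_i)^2."""
    logs = [log_map(x, y) for y in centers]
    return [-sum(l[i] for l in logs) / len(centers) for i in range(len(x))]

def rgd(x0, centers, eta, steps):
    """Riemannian gradient descent: x <- exp_x(-eta * grad F(x))."""
    x = list(x0)
    traj = [x]
    for _ in range(steps):
        g = karcher_grad(x, centers)
        x = exp_map(x, [-eta * gi for gi in g])
        traj.append(x)
    return traj

# Illustrative data: tangent vectors at the base point that sum to zero,
# so the Karcher mean of the resulting centers is the base point itself.
origin = [1.0, 0.0, 0.0]
tangents = [(0.8, 0.0), (-0.8, 0.0), (0.0, 0.5), (0.0, -0.5)]
centers = [exp_map(origin, [0.0, a, b]) for a, b in tangents]
x0 = exp_map(origin, [0.0, 1.0, 1.0])  # initial distance sqrt(2)

D = 3.0                  # assumed diameter bound for the region visited
L = D / math.tanh(D)     # D * coth(D) bounds the Hessian of dist^2 / 2
traj = rgd(x0, centers, 1.0 / L, 200)
distances = [dist(x, origin) for x in traj]
print(distances[0], distances[-1])
```

Choosing centers whose logarithms at the base point cancel makes the optimizer known in closed form, so the distance to the optimizer can be tracked directly instead of being estimated from a long reference run.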

Theorems & Definitions (63)

  • theorem 1
  • proposition 1
  • theorem 2
  • theorem 3: Non-smooth
  • proposition 3: Composite
  • proposition 3: RPPA
  • theorem 4
  • proposition 4
  • proposition 4: Implementing Min-Max RIPPA
  • lemma 4: Gradient of Moreau envelope
  • ...and 53 more
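
Several of the listed results concern the Riemannian proximal point algorithm and its Moreau envelope. As a small illustration under positive curvature, the sketch below runs exact proximal point steps on the unit sphere for the single function $f(y) = \tfrac{1}{2} d(y, p)^2$. For this $f$ the proximal operator has a closed form: the minimizer of $f(y) + \tfrac{1}{2\eta} d(x, y)^2$ lies on the geodesic from $x$ to $p$ at a fraction $\eta/(1+\eta)$ of the way, which follows from the triangle inequality. This is a toy instance chosen for illustration, not the paper's inexact algorithm, and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sphere_dist(x, y):
    """Geodesic distance on the unit sphere."""
    return math.acos(max(-1.0, min(1.0, dot(x, y))))

def sphere_exp(x, v):
    """Exponential map at x applied to the tangent vector v."""
    n = math.sqrt(dot(v, v))
    if n < 1e-15:
        return list(x)
    return [math.cos(n) * a + math.sin(n) * b / n for a, b in zip(x, v)]

def sphere_log(x, y):
    """Tangent vector at x pointing to y, with length sphere_dist(x, y)."""
    c = max(-1.0, min(1.0, dot(x, y)))
    d = math.acos(c)
    u = [b - c * a for a, b in zip(x, y)]  # projection of y onto T_x
    nu = math.sqrt(dot(u, u))
    if nu < 1e-15:
        return [0.0] * len(x)
    return [d * ui / nu for ui in u]

def prox_sq_dist(x, p, eta):
    """Closed-form prox of f(y) = 0.5 * d(y, p)^2 with parameter eta."""
    t = eta / (1.0 + eta)
    return sphere_exp(x, [t * vi for vi in sphere_log(x, p)])

def moreau_envelope(x, p, eta):
    """Value of min_y f(y) + d(x, y)^2 / (2 * eta), evaluated at the prox."""
    y = prox_sq_dist(x, p, eta)
    return 0.5 * sphere_dist(y, p) ** 2 + sphere_dist(x, y) ** 2 / (2.0 * eta)

# Illustrative data: optimizer p at the north pole, start at distance 1.2.
p = [0.0, 0.0, 1.0]
x_start = [math.sin(1.2), 0.0, math.cos(1.2)]
eta = 1.0

x = x_start
prox_distances = [sphere_dist(x, p)]
for _ in range(10):
    x = prox_sq_dist(x, p, eta)          # exact proximal point step
    prox_distances.append(sphere_dist(x, p))

print(prox_distances)
print(moreau_envelope(x_start, p, eta))
```

In this toy case each step divides the distance to the optimizer by $1 + \eta$, so the iterates never move away from $p$, and the envelope value at $x$ equals $d(x, p)^2 / (2(1+\eta))$.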