Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

David Martínez-Rubio; Christophe Roux; Sebastian Pokutta

TL;DR

This work analyzes two of the most fundamental algorithms in geodesically convex optimization, Riemannian gradient descent and (possibly inexact) Riemannian proximal point. It also provides an implementable inexact proximal point algorithm and proves several new useful properties of Riemannian proximal methods.

Abstract

In this work, we analyze two of the most fundamental algorithms in geodesically convex optimization: Riemannian gradient descent and (possibly inexact) Riemannian proximal point. We quantify their rates of convergence and derive several variants with different trade-offs. Crucially, we show that the iterates naturally stay in a ball around an optimizer, whose radius depends on the initial distance and, in some cases, on the curvature. In contrast, except in limited cases, previous works bounded the maximum distance between the iterates and an optimizer only by assumption, which led to incomplete analyses and unquantified rates. We also provide an implementable inexact proximal point algorithm that yields new results on min-max problems, and we prove several new useful properties of Riemannian proximal methods: they work when positive curvature is present, the proximal operator does not move points away from any optimizer, and the smoothness of the induced Moreau envelope can be quantified. Finally, we empirically explore settings beyond our theory.
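For reference, the two base methods and the Moreau envelope mentioned above can be written in their standard textbook form. This is a summary in common notation, assumed rather than copied from the paper: $\eta$ is a step size, $\lambda > 0$ a proximal parameter, $d$ the geodesic distance, and $\exp$/$\log$ the Riemannian exponential and logarithm maps. The inexact variant replaces the exact $\arg\min$ by an approximate minimizer. The gradient identity for the envelope is the classical one for geodesically convex $f$ on Hadamard manifolds (non-positive curvature); the paper's own statements, including those with positive curvature, may carry different constants or conditions.

```latex
\begin{align*}
  \text{RGD:}\quad & x_{k+1} = \exp_{x_k}\bigl(-\eta\,\nabla f(x_k)\bigr),\\
  \text{RPPA:}\quad & x_{k+1} = \operatorname{prox}_{\lambda f}(x_k)
     \stackrel{\mathrm{def}}{=} \operatorname*{arg\,min}_{y\in\mathcal{M}}
     \Bigl\{ f(y) + \tfrac{1}{2\lambda}\, d(x_k,y)^2 \Bigr\},\\
  \text{Moreau envelope:}\quad & f_\lambda(x) \stackrel{\mathrm{def}}{=} \min_{y\in\mathcal{M}}
     \Bigl\{ f(y) + \tfrac{1}{2\lambda}\, d(x,y)^2 \Bigr\},
  \qquad \nabla f_\lambda(x) = -\tfrac{1}{\lambda}\,\log_x\bigl(\operatorname{prox}_{\lambda f}(x)\bigr).
\end{align*}
```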

Paper Structure (29 sections, 37 theorems, 95 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Outline
Preliminaries and notation
Related work
Riemannian Gradient Descent
Riemannian Proximal Methods
Riemannian Min-Max Methods
Convergence Results and Bounded Iterates
Riemannian Gradient Descent
Riemannian Proximal Methods
Experiments
Conclusion and Discussion
RGD proofs
RPPA proofs
RIPPA implementation via composite RGD or metric-projected RGD
...and 14 more sections

Key Result

theorem 1

Lemma (iterate boundedness of RGD; $x^\ast$ denotes an optimizer, $R$ the initial distance, and $\zeta$ the geometric constant). Consider a manifold $\mathcal{M}$ whose sectional curvature is bounded below, and $f\in\smc$ for $\mathcal{X}\stackrel{\mathrm{def}}{=}\bar{B}(x^\ast, \varphi R\,\zeta_{…

Figures (7)

Figure 1: Comparison of RIPPA and RGD with $\eta=L^{-1}$ for solving the Karcher mean problem in the hyperbolic space $\mathbb{H}^d$ with $n=1000$ centers and dimension $d=1000$, in terms of squared distance to the optimizer $x^\ast$. The smoothness constant $L$ is computed over a set of diameter $O(R)$, where $R$ is the initial distance to $x^\ast$. We observe a monotonic decrease in distance in all of our experiments.
Figure 2: Comparison of RIPPA and RGD with $\eta=L^{-1}$ and $\eta=(L\,\zeta_{O(R)})^{-1}$ for solving the Karcher mean problem in $\mathcal{S}_+^{100}$ (symmetric positive definite $100\times 100$ matrices) with $n=1000$ centers and dimension $d(d+1)/2=5050$, in terms of squared distance to the optimizer $x^\ast$. The smoothness constant $L$ is computed over a set of diameter $O(R\,\zeta_R)$, where $R$ is the initial distance to $x^\ast$. We observe a monotonic decrease in distance in all of our experiments.
Figure 3: Corresponding primal-gap plots for Figures 1 and 2.
Figure 4: Karcher Mean on $\mathbb{H}^d$: $d=500$, $n=1000$, error in squared distance to the optimizer (left) and primal gap (right).
Figure 5: Karcher Mean on $\mathbb{H}^d$: $d=1000$, $n=500$, error in squared distance to the optimizer (left) and primal gap (right).
...and 2 more figures
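The figures compare RIPPA and RGD on the Karcher mean problem. A minimal NumPy sketch of both methods on the hyperboloid model of $\mathbb{H}^d$ follows, under stated assumptions: curvature $-1$, the objective $f(x)=\frac{1}{2n}\sum_i d(x,y_i)^2$ with Riemannian gradient $-\frac{1}{n}\sum_i \log_x(y_i)$, step size $\eta = 1/L$ with $L=\zeta_D = D\coth D$ for a heuristic diameter bound $D$, and an inexact proximal step solved by a fixed number of plain RGD iterations. That inner solver is an illustrative choice, not necessarily the composite scheme or inexactness criterion used in the paper, and every function name here (`log_map`, `exp_map`, `rgd`, `rippa`, and so on) is hypothetical.

```python
import numpy as np


def minkowski(u, v):
    """Lorentzian inner product -u_0 v_0 + sum_i u_i v_i along the last axis."""
    return -u[..., 0] * v[..., 0] + np.sum(u[..., 1:] * v[..., 1:], axis=-1)


def dist(x, y):
    """Geodesic distance on the hyperboloid model of H^d (curvature -1)."""
    return np.arccosh(np.maximum(-minkowski(x, y), 1.0))


def log_map(x, ys):
    """Riemannian logarithm log_x(y) for every row y of the 2-D array ys."""
    alpha = np.maximum(-minkowski(x, ys), 1.0)  # cosh d(x, y)
    d = np.arccosh(alpha)
    # d / sinh(d) -> 1 as d -> 0; the guard avoids a 0/0 division.
    scale = np.where(d > 1e-12, d / np.sinh(np.maximum(d, 1e-12)), 1.0)
    return scale[:, None] * (ys - alpha[:, None] * x)


def exp_map(x, v):
    """Riemannian exponential exp_x(v), re-normalized onto the hyperboloid."""
    nv = np.sqrt(max(minkowski(v, v), 0.0))
    if nv < 1e-15:
        return x.copy()
    y = np.cosh(nv) * x + np.sinh(nv) * v / nv
    return y / np.sqrt(-minkowski(y, y))


def zeta(D):
    """Geometric constant zeta_D = D coth(D) for curvature -1 (zeta_0 = 1)."""
    return D / np.tanh(D) if D > 0 else 1.0


def karcher_cost(x, ys):
    """Assumed objective f(x) = (1 / 2n) * sum_i d(x, y_i)^2."""
    return 0.5 * np.mean(dist(x, ys) ** 2)


def rgd(x0, ys, eta, iters):
    """RGD: x <- exp_x(-eta * grad f(x)), where grad f(x) = -mean_i log_x(y_i)."""
    x = x0.copy()
    for _ in range(iters):
        x = exp_map(x, eta * log_map(x, ys).mean(axis=0))
    return x


def rippa(x0, ys, lam, outer, inner, L):
    """Inexact proximal point: each subproblem min_y f(y) + d(anchor, y)^2 / (2 lam)
    is solved approximately by `inner` RGD steps (a fixed, illustrative budget)."""
    x = x0.copy()
    step = lam / (L * (1.0 + lam))  # = 1 / (L + L / lam), a heuristic smoothness bound
    for _ in range(outer):
        anchor, y = x, x.copy()  # warm-start the subproblem at the anchor
        for _ in range(inner):
            # Negative Riemannian gradient of the subproblem objective at y.
            direction = log_map(y, ys).mean(axis=0) + log_map(y, anchor[None, :])[0] / lam
            y = exp_map(y, step * direction)
        x = y
    return x


# Usage: n random centers in H^d, sampled around the hyperboloid "origin" o.
rng = np.random.default_rng(0)
d, n = 5, 50
o = np.zeros(d + 1)
o[0] = 1.0
ys = np.stack([exp_map(o, np.concatenate([[0.0], w])) for w in rng.normal(scale=0.5, size=(n, d))])
x0 = ys[0]
D = 2.0 * dist(x0, ys).max()  # heuristic diameter bound for the region the iterates visit
L = zeta(D)
x_rgd = rgd(x0, ys, eta=1.0 / L, iters=500)
x_ppa = rippa(x0, ys, lam=1.0, outer=50, inner=20, L=L)
print(dist(x_rgd, x_ppa), karcher_cost(x0, ys), karcher_cost(x_rgd, ys))
```

Both methods should reach the same Karcher mean on this small instance; the fixed inner budget keeps the sketch simple, whereas a practical implementation would stop the inner loop once a chosen accuracy criterion for the proximal subproblem is met.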

Theorems & Definitions (63)

theorem 1
proposition 1
theorem 2
theorem 3: Non-smooth
proposition 3: Composite
proposition 3: RPPA
theorem 4
proposition 4
proposition 4: Implementing Min-Max RIPPA
lemma 4: Gradient of Moreau envelope
...and 53 more
