A general framework of Riemannian adaptive optimization methods with a convergence analysis

Hiroyuki Sakai, Hideaki Iiduka

TL;DR

The paper addresses stochastic optimization on Riemannian manifolds by introducing a general framework for adaptive methods on embedded submanifolds of $\mathbb{R}^d$, unifying algorithms such as SGD, AdaGrad, RMSProp, Adam, and AMSGrad via tangent-space projections. It presents RAMSGrad as a direct extension of AMSGrad to embedded submanifolds and provides convergence analyses for both constant and diminishing step sizes, including the case of increasing mini-batch sizes. With $K$ iterations and mini-batch size $b$, the rates scale as $\mathcal{O}\left(\frac{1}{K}+\frac{1}{b}\right)$ for constant step sizes and $\mathcal{O}\left(\left(1+\frac{1}{b}\right)\frac{\log K}{\sqrt{K}}\right)$ for diminishing step sizes, and they improve when the mini-batch size $b_k$ grows with the iteration count. The theoretical framework hinges on projecting adaptive updates onto the tangent spaces via $P_x$ and on retraction-Lipschitz smoothness to establish descent. Numerical experiments on PCA (on the Stiefel manifold) and low-rank matrix completion, LRMC (on the Grassmann manifold), show that RAMSGrad and RAdam perform competitively, supporting both the convergence theory and the practical effectiveness of the methods on Riemannian optimization problems.
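To make the projection-then-retraction idea concrete, here is a minimal sketch of an AMSGrad-style step on the unit sphere in $\mathbb{R}^3$. It is not the authors' implementation: the function names, the hyperparameter defaults, the choice of the sphere, and the toy objective $f(x) = -x^\top A x$ are all assumptions made for illustration. The sketch only shows the general pattern of scaling a moment estimate coordinate-wise, projecting the result onto the tangent space with $P_x$, and mapping back to the manifold with a retraction.

```python
import numpy as np

def project(x, v):
    """Orthogonal projection P_x of v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, eta):
    """Retraction on the unit sphere: step in the embedding space, then renormalize."""
    y = x + eta
    return y / np.linalg.norm(y)

def ramsgrad_like_step(x, g, m, v, v_hat, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad-style update (illustrative; hyperparameter defaults are assumptions)."""
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
    v_hat = np.maximum(v_hat, v)               # AMSGrad running maximum
    # Scale coordinate-wise, then project back onto the tangent space at x.
    direction = project(x, m / (np.sqrt(v_hat) + eps))
    x_new = retract(x, -alpha * direction)
    return x_new, m, v, v_hat

def demo(num_iters=1000, alpha=0.01):
    """Minimize the toy objective f(x) = -x^T A x over the unit sphere."""
    A = np.diag([3.0, 2.0, 1.0])
    f = lambda x: float(-x @ A @ x)
    x = np.ones(3) / np.sqrt(3.0)
    f_init = f(x)
    m, v, v_hat = np.zeros(3), np.zeros(3), np.zeros(3)
    for _ in range(num_iters):
        g = project(x, -2.0 * A @ x)           # Riemannian gradient at x
        x, m, v, v_hat = ramsgrad_like_step(x, g, m, v, v_hat, alpha)
    return f_init, f(x), x
```

Calling `demo()` returns the initial objective value, the final objective value, and the final iterate, which stays on the sphere because every update ends with the retraction.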

Abstract

This paper proposes a general framework of Riemannian adaptive optimization methods. The framework encapsulates several stochastic optimization algorithms on Riemannian manifolds and incorporates the mini-batch strategy that is often used in deep learning. Within this framework, we also propose AMSGrad on embedded submanifolds of Euclidean space. Moreover, we give convergence analyses valid for both a constant and a diminishing step size. Our analyses also reveal the relationship between the convergence rate and mini-batch size. In numerical experiments, we applied the proposed algorithm to principal component analysis and the low-rank matrix completion problem, which can be considered to be Riemannian optimization problems. Python implementations of the methods used in the numerical experiments are available at https://github.com/iiduka-researches/202408-adaptive.
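As an illustration of how the mini-batch strategy enters a Riemannian optimization problem such as PCA, the following sketch computes a mini-batch Riemannian gradient on the Stiefel manifold and takes plain stochastic gradient steps with a QR-based retraction. The objective $f(X) = -\frac{1}{b}\sum_i \lVert X^\top z_i\rVert^2$, the function names, and the step size are assumptions for this example and are not taken from the paper or its repository.

```python
import numpy as np

def stiefel_project(X, Z):
    """Project Z onto the tangent space of the Stiefel manifold at X."""
    XtZ = X.T @ Z
    return Z - X @ ((XtZ + XtZ.T) / 2.0)

def qr_retract(X, eta):
    """QR-based retraction: Q factor of X + eta, with signs fixed so diag(R) > 0."""
    Q, R = np.linalg.qr(X + eta)
    signs = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * signs

def minibatch_rgrad(X, batch):
    """Riemannian gradient of f(X) = -(1/b) * sum_i ||X^T z_i||^2 on a mini-batch.

    `batch` has shape (b, n), one sample per row (assumed objective form).
    """
    b = batch.shape[0]
    egrad = -(2.0 / b) * batch.T @ (batch @ X)   # Euclidean gradient
    return stiefel_project(X, egrad)

def run_sgd(Z, p, b, num_iters, alpha, seed=0):
    """Plain Riemannian SGD with mini-batches of size b (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = Z.shape[1]
    X = qr_retract(rng.standard_normal((n, p)), np.zeros((n, p)))
    for _ in range(num_iters):
        idx = rng.choice(Z.shape[0], size=b, replace=False)
        X = qr_retract(X, -alpha * minibatch_rgrad(X, Z[idx]))
    return X
```

A larger `b` reduces the variance of `minibatch_rgrad`, which is the quantity the mini-batch terms in the convergence rates account for.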

Paper Structure

This paper contains 21 sections, 12 theorems, 88 equations, 32 figures, 16 tables, and 2 algorithms.

Key Result

Proposition 3.2

Suppose that the assumption labeled asm:main/asm:Lipschitz in the source holds. Then, for all $x\in M$ and $\eta\in T_xM$, ...

Figures (32)

  • Figure 1: Objective function value of the PCA problem (eq:pca) versus number of iterations on the training set of the MNIST dataset.
  • Figure 2: Objective function value of the PCA problem (eq:pca) versus number of iterations on the test set of the MNIST dataset.
  • Figure 3: Norm of the gradient of the PCA objective function (eq:pca) versus number of iterations on the training set of the MNIST dataset.
  • Figure 4: Norm of the gradient of the PCA objective function (eq:pca) versus number of iterations on the test set of the MNIST dataset.
  • Figure 5: Objective function value of the PCA problem (eq:pca) versus number of iterations on the training set of the COIL100 dataset.
  • ...and 27 more figures

Theorems & Definitions (25)

  • Definition 2.1: Retraction
  • Proposition 3.2
  • Lemma 3.3
  • proof
  • Theorem 3.4
  • proof
  • Theorem 3.5
  • proof
  • Theorem 3.6
  • proof
  • ...and 15 more