Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space

Mingyang Yi; Bohan Wang

TL;DR

The paper addresses optimization over probability measures in the second-order Wasserstein space by developing continuous stochastic Riemannian flows. It builds three flows (Riemannian GD, SGD, and SVRG) for minimizing $D_{KL}(\pi\|\mu)$ by approximating the discrete dynamics of each Riemannian method with a Euclidean SDE and describing the evolution of its law via the Fokker-Planck equation. The authors prove convergence rates that align with Euclidean theory: $O(1/\sqrt{T})$ for Riemannian SGD and $O(N^{2/3}/\epsilon)$ for Riemannian SVRG; under a log-Sobolev (Riemannian PL) inequality, these become global rates of $O(1/T)$ and exponential decay, respectively. They also connect these flows to Langevin-type sampling, establish discrete-to-continuous correspondences (including SGLD and SVRG-Langevin), and validate the theory through Gaussian and mixture-Gaussian experiments. Overall, the work provides a principled framework for continuous stochastic Riemannian optimization on Wasserstein space and offers insights for analyzing discrete stochastic Riemannian algorithms.
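As a worked illustration of the SDE-to-flow step in the deterministic (GD) case, the standard Langevin dynamics and its Fokker-Planck equation can be written out as below. This is the textbook form, not a quotation from the paper; the assumptions are that $\mu$ has a smooth positive density and that $\pi_{t}$ denotes the law of $\boldsymbol{x}_{t}$.

$$
d\boldsymbol{x}_{t} = \nabla \log \mu(\boldsymbol{x}_{t})\,dt + \sqrt{2}\,dW_{t}
\quad\Longrightarrow\quad
\partial_{t}\pi_{t} = \nabla\cdot\Big(\pi_{t}\,\nabla\log\frac{\pi_{t}}{\mu}\Big),
$$

$$
\frac{d}{dt}D_{KL}(\pi_{t}\|\mu) = -\int \Big\|\nabla\log\frac{\pi_{t}}{\mu}\Big\|^{2}\,d\pi_{t}.
$$

The right-hand side of the second line is the negative relative Fisher information, which plays the role of the squared Riemannian gradient norm. If a log-Sobolev inequality bounds $D_{KL}(\pi_{t}\|\mu)$ by a constant multiple of that Fisher term, Grönwall's inequality gives exponential decay of the KL divergence, which is the sense in which the log-Sobolev inequality acts as a Riemannian PL condition.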

Abstract

Recently, optimization on Riemannian manifolds has provided valuable insights to the optimization community. In this regard, extending these methods to the Wasserstein space is of particular interest, since optimization on Wasserstein space is closely connected to practical sampling processes. Generally, the standard (continuous) optimization method on Wasserstein space is the Riemannian gradient flow (i.e., Langevin dynamics when minimizing KL divergence). In this paper, we aim to enrich the family of continuous optimization methods in the Wasserstein space by extending the gradient flow on it to the stochastic gradient descent (SGD) flow and the stochastic variance reduction gradient (SVRG) flow. By leveraging the properties of Wasserstein space, we construct stochastic differential equations (SDEs) to approximate the corresponding discrete Euclidean dynamics of the desired Riemannian stochastic methods. Then, we obtain the flows in Wasserstein space via the Fokker-Planck equation. Finally, we establish convergence rates of the proposed stochastic flows, which align with those known in the Euclidean setting.
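The discrete Euclidean dynamics that the abstract refers to can be sketched with a small particle simulation. The code below is a minimal sketch under stated assumptions, not the paper's algorithm or experimental setup: it uses a hypothetical one-dimensional toy potential $F(x) = \frac{1}{N}\sum_i \frac{1}{2} c_i (x - a_i)^2$ (so the target $\mu \propto e^{-F}$ is Gaussian), and the function name `langevin_particles` and all parameters are illustrative. It compares Euler-Maruyama Langevin updates using the full gradient, a single random component (SGLD-style), and a variance-reduced gradient with a periodic snapshot (SVRG-Langevin-style).

```python
import random
import statistics


def langevin_particles(a, c, mode="gd", step=0.05, n_steps=200,
                       n_particles=1000, epoch=10, seed=0):
    """Simulate independent particles under discretized Langevin-type updates.

    Toy potential (an assumption for illustration):
        F(x) = (1/N) * sum_i 0.5 * c[i] * (x - a[i])**2
    so mu ∝ exp(-F) is Gaussian with mean sum(c*a)/sum(c), variance 1/mean(c).

    mode: "gd"   -> full gradient of F
          "sgd"  -> gradient of one random component (SGLD-style)
          "svrg" -> variance-reduced gradient with a per-particle snapshot
                    refreshed every `epoch` steps
    """
    if mode not in ("gd", "sgd", "svrg"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    n = len(a)

    def full_grad(x):
        return sum(ci * (x - ai) for ai, ci in zip(a, c)) / n

    noise_scale = (2.0 * step) ** 0.5      # sqrt(2 * step) from Euler-Maruyama
    xs = [0.0] * n_particles               # all particles start at the origin
    snaps = list(xs)                       # SVRG snapshot points
    snap_grads = [full_grad(x) for x in snaps]

    for k in range(n_steps):
        refresh = mode == "svrg" and k % epoch == 0
        for p in range(n_particles):
            x = xs[p]
            if mode == "gd":
                g = full_grad(x)
            else:
                i = rng.randrange(n)
                g = c[i] * (x - a[i])      # unbiased estimate of full_grad(x)
                if mode == "svrg":
                    if refresh:
                        snaps[p] = x
                        snap_grads[p] = full_grad(x)
                    # control variate: subtract the same component at the snapshot
                    g = g - c[i] * (snaps[p] - a[i]) + snap_grads[p]
            xs[p] = x - step * g + noise_scale * rng.gauss(0.0, 1.0)
    return xs


if __name__ == "__main__":
    a = [-1.0, 0.0, 1.0, 2.0]
    c = [0.5, 1.0, 1.5, 1.0]               # target: mean 0.75, variance 1.0
    for mode in ("gd", "sgd", "svrg"):
        xs = langevin_particles(a, c, mode=mode, seed=1)
        print(mode, statistics.fmean(xs), statistics.pvariance(xs))
```

With a small step size, all three variants should produce an empirical mean and variance close to the target's, up to discretization bias and, for the stochastic variants, extra variance from gradient noise. The snapshot here is kept per particle, as in Euclidean SVRG-Langevin; how the paper defines the snapshot at the level of measures is not shown in this sketch.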

Paper Structure (25 sections, 22 theorems, 149 equations, 3 figures, 2 algorithms)

This paper contains 25 sections, 22 theorems, 149 equations, 3 figures, 2 algorithms.

Introduction
Related Work
Riemannian Optimization.
Stochastic Sampling.
Preliminaries
Riemannian Gradient Flow
Constructing Riemannian Gradient Flow
Convergence of Riemannian GD Flow
Riemannian Stochastic Gradient Flow
Constructing Riemannian Stochastic Flow
Convergence of Riemannian SGD Flow
Riemannian Stochastic Variance Reduction Gradient Flow
Constructing Riemannian SVRG Flow
Convergence of Riemannian SVRG Flow
Experiments
...and 10 more sections

Key Result

Lemma 1

(Maoutsa et al., 2020; Song et al., 2020) The SDE $d\boldsymbol{x}_{t} = \boldsymbol{b}(\boldsymbol{x}_{t}, t)dt + \boldsymbol{G}(\boldsymbol{x}_{t}, t)dW_{t}$ has the same density as the ODE, where $\pi_{t}$ (we simplify $\log{(d\pi_{t} / d\boldsymbol{x})(\boldsymbol{x}_{t})}$ as $\log{\pi_{t}(\boldsymbol{x}_{t})}$ when there is no ambiguity in the sequel) is the corresponding probability measure of $\
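For reference, the deterministic counterpart of such an SDE in the cited works (often called the probability flow ODE) is usually written as follows. This is the standard form from that literature, given here as an assumption about the shape of the lemma's ODE rather than a quotation of it.

$$
d\boldsymbol{x}_{t} = \Big[\boldsymbol{b}(\boldsymbol{x}_{t}, t) - \tfrac{1}{2}\nabla\cdot\big(\boldsymbol{G}\boldsymbol{G}^{\top}\big)(\boldsymbol{x}_{t}, t) - \tfrac{1}{2}\boldsymbol{G}\boldsymbol{G}^{\top}(\boldsymbol{x}_{t}, t)\,\nabla\log\pi_{t}(\boldsymbol{x}_{t})\Big]dt.
$$

As a check, taking $\boldsymbol{b} = \nabla\log\mu$ and $\boldsymbol{G} = \sqrt{2}\,\boldsymbol{I}$ (Langevin dynamics) makes the divergence term vanish and leaves $d\boldsymbol{x}_{t} = -\nabla\log\frac{\pi_{t}}{\mu}(\boldsymbol{x}_{t})\,dt$, a velocity field that moves each particle along the negative Wasserstein gradient of $D_{KL}(\pi_{t}\|\mu)$.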

Figures (3)

Figure 1: Riemannian GD
Figure 2: Our idea to bridge the discrete dynamics to its continuous counterparts.
Figure 3: The convergence rates measured by KL divergence and Fisher divergence (Riemannian gradient norm) under different optimization methods.

Theorems & Definitions (38)

Lemma 1
Proposition 1
Proposition 2
Remark 1
Theorem 1
Proposition 3
Proposition 4
Theorem 2
Remark 2
Example 1
...and 28 more
