Optimal transport natural gradient for statistical manifolds with continuous sample space

Yifan Chen, Wuchen Li

TL;DR

The paper addresses parametric density estimation with continuous samples by introducing a Wasserstein geometry on the parameter space through a pull-back of the $L^2$-Wasserstein metric, forming the Wasserstein statistical manifold. It derives the Wasserstein natural gradient, provides an explicit 1D metric, and proves that gradient descent with this metric behaves like Newton’s method for Wasserstein-distance minimization, supported by a Hessian formula. Through Gaussian, Gaussian mixture, and Gamma/Laplace examples, it demonstrates the computational and geometric benefits of Wasserstein natural gradient, including total geodesicity in the Gaussian case and non-geodesic behavior in mixtures, and contrasts with Fisher–Rao natural gradient. The work bridges optimal transport and information geometry, offering a robust preconditioning framework for Wasserstein-based inference and optimization in continuous-sample settings.

Abstract

We study the Wasserstein natural gradient in parametric statistical models with continuous sample spaces. Our approach is to pull back the $L^2$-Wasserstein metric tensor in the probability density space to a parameter space, equipping the latter with a positive definite metric tensor, under which it becomes a Riemannian manifold, named the Wasserstein statistical manifold. In general, it is not a totally geodesic sub-manifold of the density space, and therefore its geodesics will differ from the Wasserstein geodesics, except for the well-known Gaussian distribution case, a fact which can also be validated under our framework. We use the sub-manifold geometry to derive a gradient flow and natural gradient descent method in the parameter space. When parametrized densities lie in $\mathbb{R}$, the induced metric tensor admits an explicit formula. In optimization problems, we observe that the natural gradient descent outperforms the standard gradient descent when the Wasserstein distance is the objective function. In such a case, we prove that the resulting algorithm behaves similarly to the Newton method in the asymptotic regime. The proof calculates the exact Hessian formula for the Wasserstein distance, which further motivates another preconditioner for the optimization process. Finally, we present examples to illustrate the effectiveness of the natural gradient in several parametric statistical models, including the Gaussian measure, Gaussian mixture, Gamma distribution, and Laplace distribution.
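
As a rough illustration of the explicit one-dimensional metric mentioned above, the following sketch evaluates a pulled-back Wasserstein metric numerically for a Gaussian family. It assumes the 1D metric takes the form $G(\theta)_{ij} = \int \partial_{\theta_i} F(x,\theta)\, \partial_{\theta_j} F(x,\theta) / \rho(x,\theta)\, dx$, where $F$ is the cumulative distribution function of the density $\rho$. This summary does not spell out that formula, so treat it as an assumption. The function names, the finite-difference step, and the integration grid are hypothetical choices made for this example and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def wasserstein_metric_1d(theta, cdf, pdf, lo, hi, n=4000, eps=1e-5):
    """Approximate G_ij = integral of dF/dtheta_i * dF/dtheta_j / rho over [lo, hi].

    Assumed form of the 1D metric (see the lead-in). Parameter derivatives of
    the CDF use central finite differences; the integral uses the trapezoid rule.
    """
    d = len(theta)
    h = (hi - lo) / n
    G = [[0.0] * d for _ in range(d)]
    for k in range(n + 1):
        x = lo + k * h
        weight = 0.5 * h if k in (0, n) else h  # trapezoid end-point weights
        rho = pdf(x, *theta)
        dF = []
        for i in range(d):
            up = list(theta)
            down = list(theta)
            up[i] += eps
            down[i] -= eps
            dF.append((cdf(x, *up) - cdf(x, *down)) / (2.0 * eps))
        for i in range(d):
            for j in range(d):
                G[i][j] += weight * dF[i] * dF[j] / rho
    return G


mu, sigma = 0.5, 2.0
G = wasserstein_metric_1d(
    [mu, sigma], gaussian_cdf, gaussian_pdf, mu - 8.0 * sigma, mu + 8.0 * sigma
)
for row in G:
    print([round(v, 4) for v in row])
```

Under this assumed formula, the matrix for the Gaussian family in $(\mu, \sigma)$ coordinates should come out close to the identity. That matches the known closed form $W_2^2 = (\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$ between one-dimensional Gaussians. A natural gradient step would then precondition the Euclidean gradient by $G(\theta)^{-1}$. For families such as Gaussian mixtures, the metric is generally not the identity and the preconditioning changes the descent direction.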

Paper Structure

This paper contains 14 sections, 8 theorems, 87 equations, 10 figures, 4 tables.

Key Result

Proposition 1

Under the ellipticity assumption, the solution $\Phi$ is unique up to the addition of a spatially constant function.

Figures (10)

  • Figure 1: Densities of Gaussian mixture distribution
  • Figure 2: Geodesic of Gaussian mixtures; left: in the Wasserstein statistical manifold; right: in the whole density space
  • Figure 3: Objective value
  • Figure 4: Gamma density functions
  • Figure 5: Geodesic of Gamma distribution; left: in the Wasserstein statistical manifold; right: in the whole density space
  • ...and 5 more figures

Theorems & Definitions (20)

  • Definition 1: $L^2$-Wasserstein metric tensor
  • Proposition 1
  • proof
  • Definition 2: $L^2$-Wasserstein metric tensor in parameter space
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Remark 1
  • Remark 2
  • ...and 10 more