
Random Coordinate Descent on the Wasserstein Space of Probability Measures

Yewei Xu, Qin Li

Abstract

Optimization over the space of probability measures endowed with the Wasserstein-2 geometry is central to modern machine learning and mean-field modeling. However, traditional methods relying on full Wasserstein gradients often suffer from high computational overhead in high-dimensional or ill-conditioned settings. We propose a randomized coordinate descent framework specifically designed for the Wasserstein manifold, introducing both Random Wasserstein Coordinate Descent (RWCD) and Random Wasserstein Coordinate Proximal-Gradient (RWCP) for composite objectives. By exploiting coordinate-wise structures, our methods adapt to anisotropic objective landscapes where full-gradient approaches typically struggle. We provide a rigorous convergence analysis across various landscape geometries, establishing guarantees under non-convex, Polyak-Łojasiewicz, and geodesically convex conditions. Our theoretical results mirror the classic convergence properties found in Euclidean space, revealing a compelling symmetry between coordinate descent on vectors and on probability measures. The developed techniques are inherently adaptive to the Wasserstein geometry and offer a robust analytical template that can be extended to other optimization solvers within the space of measures. Numerical experiments on ill-conditioned energies demonstrate that our framework offers significant speedups over conventional full-gradient methods.
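The coordinate-descent idea described in the abstract can be illustrated with a minimal particle-based sketch: approximate $\mu$ by $N$ particles, and at each iteration update a single randomly chosen coordinate of every particle along that coordinate of the Wasserstein gradient. The sketch below is an illustration only, not the paper's implementation; the diagonal quadratic potential, the curvatures `a`, and the choice of step size $1/H_i$ (with $H_i$ taken as the $i$-th curvature) are all assumptions made for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy anisotropic quadratic potential V(x) = 0.5 * sum_i a_i x_i^2,
# so E[mu] = \int V dmu and its Wasserstein gradient is grad V(x) = a * x.
# (Illustrative assumption; not the paper's experimental setup.)
d, N = 50, 2000
a = np.logspace(0, 3, d)   # ill-conditioned diagonal curvatures (assumed)
H = a                      # coordinate-wise smoothness constants (assumed = a_i)

X = rng.standard_normal((N, d))  # particle approximation of mu_0

def energy(X):
    # Empirical estimate of E[mu] = \int V dmu over the particle cloud.
    return 0.5 * np.mean((X**2) @ a)

# Random coordinate descent: pick one coordinate uniformly at random and
# push all particles along that coordinate of the gradient, step 1/H_i.
for k in range(5000):
    i = rng.integers(d)
    grad_i = a[i] * X[:, i]          # i-th gradient coordinate at each particle
    X[:, i] -= (1.0 / H[i]) * grad_i

print(energy(X))
```

Because each iteration costs one coordinate-gradient evaluation rather than $d$, this is the "work" unit used on the horizontal axes of the figures below.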

Paper Structure

This paper contains 26 sections, 13 theorems, 90 equations, 8 figures, 1 table, 2 algorithms.

Key Result

Proposition 7

Let $E : \mathcal{P}_2(\mathbb R^d) \to \mathbb R$ be differentiable, fix $\mu \in \mathcal{P}_2(\mathbb R^d)$ and $i \in [d]$, and suppose $E$ has the directional regularity of Definition 6 (coordinate-wise smoothness). Then a coordinate-wise descent inequality holds for any vector field $T \in L^2(\mu; \mathbb R^d)$.
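The display of the inequality is not recoverable from this extract. As a sketch only, a coordinate-wise descent bound of this type typically takes the following form, where $T_i = \langle T, e_i\rangle$ is the $i$-th component of $T$, $e_i$ the $i$-th standard basis vector, $\nabla_{W}E[\mu]$ the Wasserstein gradient of Definition 5, and $H_i$ the coordinate-wise smoothness constant of Definition 6 (all of these identifications are assumptions, not the paper's stated result):

```latex
E\big[(\mathrm{Id} + T_i e_i)_{\#}\mu\big]
\;\le\; E[\mu]
\;+\; \int \big(\nabla_{W}E[\mu](x)\big)_i \, T_i(x)\, \mathrm{d}\mu(x)
\;+\; \frac{H_i}{2}\int |T_i(x)|^2 \, \mathrm{d}\mu(x).
```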

Figures (8)

  • Figure 1: Example 1: RWCD vs. WGD on the 2D quadratic objective \ref{eqn:2D_exp} using $N=2000$ particles. Horizontal axis: work measured in coordinate-gradient evaluations (one WGD step equals $d=2$ work units). Blue: RWCD median with $10$--$90\%$ band across $50$ runs; orange: WGD. The three panels respectively show $E[\mu_k]$, the empirical mean of the first coordinate of the $N$ samples, and the empirical mean of the second coordinate of the $N$ samples.
  • Figure 2: Example 1: Particle cloud snapshots for the 2D quadratic experiment \ref{eqn:2D_exp}. Top row: RWCD (a median trial); bottom row: WGD. Snapshots are taken at work $=0, 8, 1200, 2400$, where one WGD iteration is counted as $d=2$ work units.
  • Figure 3: Example 2: RWCD vs. WGD for the quadratic potential and interaction objective with $N=2000$ and $d=50$. The horizontal axis represents work measured in coordinate-gradient evaluations. The blue curve shows the RWCD median with a $10$--$90\%$ confidence band across $50$ independent runs, while the orange curve represents WGD.
  • Figure 4: Example 3: RWCD vs. WGD for the MMD-type functional with an anisotropic Gaussian kernel in $d=50$. The horizontal axis represents work measured in coordinate-gradient evaluations. The blue curve denotes the RWCD median with a $10$--$90\%$ band across $50$ runs; the orange curve denotes WGD. Note that the RWCD band is visually indistinguishable from the median curve at this resolution.
  • Figure 5: Example 4: Coordinate-wise smoothness constants $H_i$ of the regularizer $\Psi_\epsilon$. The $y$-axis shows $H_i$ on a logarithmic scale, and the $x$-axis indexes the coordinates after sorting by $H_i$.
  • ...and 3 more figures

Theorems & Definitions (33)

  • Definition 1: Coordinate-wise smoothness
  • Definition 2: Convexity, strong convexity, and PL
  • Definition 3: Pushforward Measure
  • Definition 4: Wasserstein-2 Distance and Optimal Coupling
  • Definition 5: Wasserstein Gradient
  • Definition 6: Smoothness and coordinate-wise smoothness in Wasserstein space
  • Proposition 7: Coordinate-wise descent
  • Proposition 8: Examples of Coordinate-wise Smoothness
  • Definition 9: Geodesic Convexity
  • Definition 10: Polyak-Łojasiewicz Condition
  • ...and 23 more