Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization

Yewei Xu, Qin Li

TL;DR

The work shows that forward Euler discretization can qualitatively fail to approximate Wasserstein gradient flows, even for KL functionals against smooth targets, due to a discretization-induced loss of regularity. By introducing a Gaussian-regularized KL functional $F^\varepsilon$, the authors restore sufficient smoothness so that the Wasserstein gradient aligns with the L-derivative, enabling reliable discrete optimization via projected gradient descent and convergence to minimizers on bounded convex domains. The paper provides explicit counterexamples illustrating the breakdown of forward Euler, along with theoretical guarantees and numerical experiments demonstrating the effectiveness of the regularized approach. This provides a practical pathway to robust explicit solvers for gradient flows in the space of probability measures and informs the development of blob/ensemble methods that depend on discretized Wasserstein dynamics.

Abstract

Wasserstein gradient flows have become a central tool for optimization problems over probability measures. A natural numerical approach is forward-Euler time discretization. We show, however, that even in the simple case where the energy functional is the Kullback-Leibler (KL) divergence against a smooth target density, forward-Euler can fail dramatically: the scheme does not converge to the gradient flow, despite the fact that the first variation $\nabla\frac{\delta F}{\delta\rho}$ remains formally well defined at every step. We identify the root cause as a loss of regularity induced by the discretization, and prove that a suitable regularization of the functional restores the necessary smoothness, making forward-Euler a viable solver that converges in discrete time to the global minimizer.
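For orientation, a forward-Euler step with step size $h$ pushes the current measure along the negative Wasserstein gradient, $\rho_{n+1} = \big(\mathrm{Id} - h\,\nabla\frac{\delta F}{\delta\rho}[\rho_n]\big)_\#\rho_n$. When $F$ is the KL divergence and $\rho_n$, $\rho^*$ have densities $e^{-V}$ and $e^{-U}$, this gradient is $\nabla U - \nabla V$. The sketch below is a hypothetical one-dimensional illustration, not an example taken from the paper: it assumes a Gaussian iterate and a standard normal target, in which case the step map is affine and the iterate stays Gaussian. The function names (`fe_step_gaussian`, `kl_to_standard_normal`, `run`) are made up for this sketch.

```python
import math


def fe_step_gaussian(mean, std, h):
    """One forward-Euler step for KL(rho | N(0, 1)) with rho = N(mean, std^2).

    Assumed setting (illustrative only): V(x) = (x - mean)^2 / (2 std^2) and
    U(x) = x^2 / 2, additive constants dropped. The step map
        T(x) = x - h * (U'(x) - V'(x))
             = (1 - h + h / std^2) * x - h * mean / std^2
    is affine, so the pushforward of a Gaussian is again a Gaussian.
    """
    slope = 1.0 - h + h / std**2
    new_mean = slope * mean - h * mean / std**2  # simplifies to (1 - h) * mean
    new_std = abs(slope) * std
    return new_mean, new_std


def kl_to_standard_normal(mean, std):
    """Closed-form KL( N(mean, std^2) | N(0, 1) )."""
    return 0.5 * (std**2 + mean**2 - 1.0) - math.log(std)


def run(mean, std, h, n_steps):
    """Iterate the step and record the KL divergence along the way."""
    history = [kl_to_standard_normal(mean, std)]
    for _ in range(n_steps):
        mean, std = fe_step_gaussian(mean, std, h)
        history.append(kl_to_standard_normal(mean, std))
    return mean, std, history


if __name__ == "__main__":
    m, s, hist = run(mean=2.0, std=0.5, h=0.1, n_steps=200)
    print(m, s, hist[0], hist[-1])
```

In this affine case the iterates settle at the target for a moderate step size, but the slope of the map contains the term $h/\mathrm{std}^2$, so a single step from a very concentrated Gaussian overshoots badly. This only shows the mechanics of the step; the paper's counterexamples involve a non-injective pushforward map $T$ (Figure 1), which the affine case does not capture.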

Paper Structure

This paper contains 27 sections, 31 theorems, 142 equations, 4 figures, 1 table.

Key Result

Theorem 2.5

Let $F$ be defined as in eqn:relative_entropy, and $\rho\in D(F)$. Write $\sigma=\frac{d\rho}{d\rho^*}$. Then: If $\rho$ and $\rho^*$ admit densities $e^{-V}$ and $e^{-U}$, respectively, then and

Figures (4)

  • Figure 1: Left: the pushforward map $T$. For $y\in(0,r)$, three preimages exist, marked by three bold lines. Injectivity is restored when $T$ is restricted to three non-intersecting domains. Right: a decomposition of $T$ into its components $T_1,T_2,T_3$. The blue arrows illustrate how points on the horizontal $x$-axis are mapped onto the vertical $y$-axis, and the labels at the top identify the domain of each $T_i$.
  • Figure 2: $\mathrm{KL}[\rho_n \vert \rho^*]$ as a function of time, plotted in semi-log scale. The blue (circle-marked), orange (square-marked), and green (triangle-marked) curves correspond to step sizes $h=0.1$, $h=0.01$, and $h=0.001$, respectively. The dashed black line represents the theoretical lower bound $0.019$ established in Theorem \ref{thm:ex2_lowerbound}.
  • Figure 3: The initial and final distributions (after $100$ iterations), computed using kernel density estimates of $2000$ samples evolved by the PGD iterates \ref{eqn:PGD_def_measure} for the regularized KL. The colored shading shows the empirical particle density (kernel density estimate), overlaid with the target distribution's contours (50%, 80%, 95%) in black. The outer white circle indicates the boundary of $\mathbb{B}_3$. The final distribution aligns well with the target distribution.
  • Figure 4: Evolution of the regularized energy $F^\varepsilon[\rho_n]-\min_{k \leq 100}F^\varepsilon[\rho_k]$ with step size $h=0.05$ and kernel width $\varepsilon=0.1$ for the first $90$ iterations. The exponential convergence rate confirms the dissipative behavior predicted by Theorem \ref{prop:W2_PGD_rate}.

Theorems & Definitions (57)

  • Definition 2.1: First Variation (FV)
  • Definition 2.2: Wasserstein-2 Distance
  • Definition 2.3: Geodesic Convexity
  • Definition 2.4: Wasserstein Differentiability
  • Theorem 2.5
  • Theorem 2.6
  • Remark 2.7
  • Definition 2.8: L-Differentiability
  • Proposition 2.9
  • Proposition 2.10
  • ...and 47 more