Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels

Fabian Altekrüger; Johannes Hertrich; Gabriele Steidl

TL;DR

This paper proposes neural network (NN) approximations of two schemes for Wasserstein flows of maximum mean discrepancy (MMD) functionals with Riesz kernels: the backward scheme of Jordan, Kinderlehrer and Otto for Wasserstein gradient flows, and a forward scheme for so-called Wasserstein steepest descent flows.

Abstract

Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows as well as a forward scheme for so-called Wasserstein steepest descent flows by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. In order to evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. Finally, we illustrate our neural MMD flows by numerical examples.
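
To make the central functional concrete, the following is a minimal NumPy sketch of the squared MMD between two empirical measures. It assumes the Riesz kernel has the form $K(x,y) = -\|x-y\|^r$ with $r \in (0,2)$ and uses the normalization $\frac{1}{2}\iint K \,\mathrm{d}(\mu-\nu)\,\mathrm{d}(\mu-\nu)$. The function names `riesz_kernel` and `mmd_squared`, the factor $\frac{1}{2}$, and the use of equally weighted samples are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np

def riesz_kernel(x, y, r=1.0):
    """Pairwise Riesz kernel K(x_i, y_j) = -||x_i - y_j||^r (assumed form)."""
    diff = x[:, None, :] - y[None, :, :]           # shape (n, m, d)
    return -np.linalg.norm(diff, axis=-1) ** r     # shape (n, m)

def mmd_squared(x, y, r=1.0):
    """Squared MMD between the empirical measures of samples x and y.

    Uses the (assumed) normalization 0.5 * integral of K against
    (mu - nu) x (mu - nu), with equal weights on all samples.
    """
    kxx = riesz_kernel(x, x, r).mean()
    kyy = riesz_kernel(y, y, r).mean()
    kxy = riesz_kernel(x, y, r).mean()
    return 0.5 * (kxx + kyy - 2.0 * kxy)

# Two Dirac measures at 0 and 1 on the real line, distance kernel (r = 1).
x = np.array([[0.0]])
y = np.array([[1.0]])
value = mmd_squared(x, y, r=1.0)
```

For $r = 1$ this is the distance kernel used in Figure 1, and the resulting quantity coincides, up to the constant factor, with the energy distance, so it is nonnegative and vanishes when both sample sets are identical.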

Paper Structure (27 sections, 14 theorems, 110 equations, 13 figures, 2 algorithms)

This paper contains 27 sections, 14 theorems, 110 equations, 13 figures, 2 algorithms.

Introduction
Contributions.
Related Work.
Outline.
Wasserstein Flows
Neural Backward Scheme
Neural Forward Scheme
Flows for the Interaction Energy
Numerical Examples
Comparison with Particle Flows.
Interaction Energy Flows with Benchmark
MMD Flows
Conclusions
Wasserstein Spaces as Geodesic Spaces
Disintegration of measures
...and 12 more sections

Key Result

Proposition 2.1

Let $\mathcal{F} \colon \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}$ be locally Lipschitz continuous and $\lambda$-convex along generalized geodesics. Then, there exist unique Wasserstein steepest descent and gradient flows starting at $\mu \in\mathcal{P}_2(\mathbb{R}^d)$ and these flows coincide.

Figures (13)

Figure 1: Neural backward (top) and forward (bottom) schemes for the Wasserstein flow of the MMD with distance kernel starting in exactly two points 'sampled' from $\delta_{(-0.5,0)} + \delta_{(0.5,0)}$ toward the 2D density 'Max und Moritz' (Drawing by Wilhelm Busch top right and a sampled version bottom right).
Figure 2: Visualization of the different convergence behavior $\gamma_\tau \to \gamma$ as $\tau \to 0$ in Theorem \ref{thm:jko-inter-flow} via $f_\tau(n\tau) \to f(n\tau)$, $n= 0,1,\ldots$ in Remark \ref{rem:vis} for $\mathcal{F} = \mathcal{E}_K$ and Riesz kernels with $r \in \{0.5,1,1.5\}$.
Figure 3: Comparison of different approaches for approximating the Wasserstein gradient flow of $\mathcal{E}_K$ with step size $\tau = 0.05$. From top to bottom: limit curve, neural backward scheme, neural forward scheme and particle flow. The black circle is the border of the limit $\mathop{\mathrm{supp}}\nolimits \, \gamma(t)$. Here the neural forward scheme shows the best fit. While our neural flows start in a single point, the particle flow starts with uniform samples in a square of radius $10^{-9}$, a structure that remains visible over time.
Figure 4: Discrepancy between the analytic Wasserstein flow of $\mathcal{E}_K$ and its approximations for $\tau = 0.05$. Left: dimension $d=2$ and different exponents of the Riesz kernel. Note that the neural forward flow only exists for $r=1$, where it gives the best approximation. For $r=0.5$ the neural backward scheme and the particle flow approximate the limit curve about equally well, while for $r=1.5$ the neural backward scheme performs better, which is due to the relatively large time step size. Right: Different dimensions $d \in \{ 3, 10, 1000 \}$ and $r=1$. While the particle flow suffers from the inexact initial samples in lower dimensions, it performs very well in higher dimensions. The neural forward scheme gives a more accurate approximation than the neural backward scheme.
Figure 5: Samples and their trajectories from MNIST starting in $\delta_{x}$ for $x=0.5 \cdot \mathbf{1}_{784}$. The inexact initialization of the particle flow leads to noisy images at the beginning.
...and 8 more figures
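
The particle flow that serves as a baseline in the captions above can be illustrated with a short sketch. It assumes the interaction energy $\mathcal{E}_K(\mu) = \frac{1}{2}\iint K \,\mathrm{d}\mu\,\mathrm{d}\mu$ with $K(x,y) = -\|x-y\|^r$, an empirical measure with $N$ equally weighted particles, and an explicit Euler discretization. The function name `particle_flow_step`, the time discretization, the velocity scaling, and the masking threshold `eps` are assumptions made for illustration; the authors' implementation may differ.

```python
import numpy as np

def particle_flow_step(x, tau=0.05, r=1.0, eps=1e-12):
    """One explicit Euler step of a particle flow for the interaction energy.

    x has shape (N, d). Under the assumptions stated above, particle i moves
    with velocity (1/N) * sum_j r * ||x_i - x_j||^(r-2) * (x_i - x_j),
    so particles repel each other.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]        # (N, N, d), entries x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)        # (N, N)
    weight = np.zeros_like(dist)
    mask = dist > eps                           # skip i == j and coinciding points
    weight[mask] = r * dist[mask] ** (r - 2.0)
    velocity = (weight[:, :, None] * diff).sum(axis=1) / n
    return x + tau * velocity

# Particles cannot all start in one point, so draw them from a tiny square
# around the origin, mirroring the initialization described in Figure 3.
rng = np.random.default_rng(0)
particles = rng.uniform(-1e-9, 1e-9, size=(50, 2))
for _ in range(10):
    particles = particle_flow_step(particles, tau=0.05, r=1.0)
```

Because the velocity vanishes when all particles coincide, a particle method needs slightly perturbed initial samples, which is consistent with the remark in Figure 3 that the initial square structure remains visible.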

Theorems & Definitions (25)

Proposition 2.1
Remark 2.2
Lemma 3.1
proof
Remark 4.1
Theorem 5.1
Theorem 5.2
Remark 5.3: Illustration of Theorem \ref{thm:jko-inter-flow}
Theorem 5.4
Theorem 2.1
...and 15 more
