Combining Wasserstein-1 and Wasserstein-2 proximals: robust manifold learning via well-posed generative flows

Hyemin Gu; Markos A. Katsoulakis; Luc Rey-Bellet; Benjamin J. Zhang

TL;DR

This work develops a robust framework for learning high-dimensional distributions supported on low-dimensional manifolds by combining Wasserstein-1 and Wasserstein-2 proximal regularizations of $f$-divergences within continuous-time generative flows. A mean-field game (MFG) formulation yields a well-posed backward-forward PDE system, consisting of a backward Hamilton-Jacobi equation and a forward continuity equation, which guarantees unique optimal trajectories that are linear under the combined proximals. The authors present an adversarial training algorithm based on the dual formulation of the Wasserstein-1 proximal and a Benamou-Brenier interpretation of the Wasserstein-2 proximal; it trains with forward simulation only, without trajectory inversion, and learns manifold-supported distributions without autoencoders. Numerical results on MNIST show that samples generated directly in the original space are nearly invariant to the time discretization, which supports the robustness claim. The framework provides a principled path to robust, manifold-aware generative modeling with potential extensions to stochastic flows and distributionally robust optimization.
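To make the kinetic energy penalty concrete, the following is a minimal sketch of how a Wasserstein-2 (Benamou-Brenier) term can be accumulated during a forward-only Euler simulation of particles. It is an illustration under stated assumptions, not the authors' implementation: the function names and the constant vector field are hypothetical stand-ins for a learned field, and the defaults $T=5.0$, $h=1.0$, $\lambda=0.05$ are borrowed from the Figure 2 caption.

```python
import numpy as np

def simulate_with_kinetic_energy(x0, velocity, T=5.0, h=1.0, lam=0.05):
    """Forward-Euler particle simulation that accumulates the kinetic penalty.

    Returns the final positions and a Monte Carlo estimate of
    lam * int_0^T E[0.5 * |v(x(t), t)|^2] dt (left Riemann sum in time).
    """
    x = np.array(x0, dtype=float)
    n_steps = int(round(T / h))
    penalty = 0.0
    for k in range(n_steps):
        t = k * h
        v = velocity(x, t)  # shape (n_particles, dim)
        # Average the kinetic energy over particles, then weight by lam * h.
        penalty += lam * h * 0.5 * np.mean(np.sum(v**2, axis=1))
        x = x + h * v  # forward step only; no trajectory inversion needed
    return x, penalty

def constant_field(shift, T):
    """Hypothetical stand-in for a learned field: moves every particle by
    `shift` over [0, T] at constant velocity."""
    shift = np.asarray(shift, dtype=float)
    return lambda x, t: np.broadcast_to(shift / T, x.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((256, 2))
x_T, penalty = simulate_with_kinetic_energy(x0, constant_field([3.0, 4.0], 5.0))
```

In this toy case the speed is $|v| = 1$, so the penalty is $\lambda \cdot T \cdot \tfrac{1}{2} = 0.125$. In an adversarial training loop, a term of this kind would be added to the discriminator-based terminal cost.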

Abstract

We formulate well-posed continuous-time generative flows for learning distributions that are supported on low-dimensional manifolds through Wasserstein proximal regularizations of $f$-divergences. Wasserstein-1 proximal operators regularize $f$-divergences so that singular distributions can be compared. Meanwhile, Wasserstein-2 proximal operators regularize the paths of the generative flows by adding an optimal transport cost, i.e., a kinetic energy penalization. Via mean-field game theory, we show that the combination of the two proximals is critical for formulating well-posed generative flows. Generative flows can be analyzed through the optimality conditions of a mean-field game (MFG), a system of partial differential equations (PDEs) consisting of a backward Hamilton-Jacobi (HJ) equation and a forward continuity equation, whose solution characterizes the optimal generative flow. For learning distributions that are supported on low-dimensional manifolds, the MFG theory shows that the Wasserstein-1 proximal, which addresses the HJ terminal condition, and the Wasserstein-2 proximal, which addresses the HJ dynamics, are both necessary for the corresponding backward-forward PDE system to be well-defined and have a unique solution with provably linear flow trajectories. This implies that the corresponding generative flow is also unique and can therefore be learned in a robust manner, even when the target is a high-dimensional distribution supported on a low-dimensional manifold. The generative flows are learned through adversarial training of continuous-time flows, which bypasses the need for reverse simulation. We demonstrate the efficacy of our approach for generating high-dimensional images without the need to resort to autoencoders or specialized architectures.
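A small worked example may help show why the Wasserstein-1 proximal lets singular distributions be compared. It assumes the infimal-convolution form of the Lipschitz-regularized divergence that is standard in the literature; this form is not quoted from the text above. The point masses $\delta_a$, $\delta_b$ are illustrative choices.

```latex
\begin{align*}
D_f^{\Gamma_L}(P \,\|\, Q)
  &= \inf_{\gamma} \Big\{ D_f(\gamma \,\|\, Q) + L\, \mathcal{W}_1(P, \gamma) \Big\}
  && \text{(assumed definition)} \\
P = \delta_a,\; Q = \delta_b,\; a \neq b: \quad
D_{\mathrm{KL}}(\delta_a \,\|\, \delta_b) &= +\infty
  && \text{(mutually singular)} \\
D_{\mathrm{KL}}^{\Gamma_L}(\delta_a \,\|\, \delta_b)
  &= L\, \mathcal{W}_1(\delta_a, \delta_b) = L\, |a - b|
  && \text{(only } \gamma = \delta_b \text{ has finite KL)}
\end{align*}
```

The unregularized divergence gives no usable signal when the supports are disjoint, whereas the regularized one is finite and shrinks continuously as $a \to b$.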

Paper Structure (28 sections, 3 theorems, 37 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Contributions
Related work
Wasserstein proximals
Wasserstein-2 proximal stabilizes training of generative flows
Wasserstein-1 proximal enables manifold learning
Manifold detecting properties.
Lipschitz-regularized $f$-divergences are smooth.
Combining Wasserstein-1 and Wasserstein-2 proximals
$\mathcal{W}_1\oplus \mathcal{W}_2$ generative flows as mean-field games
Background on continuous-time generative flows as mean-field games
MFG for $\mathcal{W}_1\oplus \mathcal{W}_2$ generative flows have well-defined optimality conditions and imply linear optimal trajectories
Wasserstein-1 proximal provides robust learning of distributions on manifolds
$f$-divergences fail to learn manifolds.
$\mathcal{W}_1$ proximal of $D_f$ provides robust manifold learning
...and 13 more sections

Key Result

Theorem 4.1

Let $\pi$ be an unknown arbitrary target measure, $\rho(x, 0) = \rho_0$ be a given reference measure, and $v: \mathbb{R}^d \times \mathbb{R} \rightarrow \mathbb{R}^d$ be a vector field. Fix a terminal time $T >0$ and $\lambda >0$. The optimization problem where $\rho(x,t)$ satisfies the continuity equation $\partial_t \rho + \nabla \cdot (\rho v) = 0$, $\rho(x,0) = \rho_0(x)$ has the following opt

Figures (3)

Figure 1: Optimality indicator \ref{eq:indicator2} blows up without the $\mathcal{W}_1$ proximal (left). The composition of the $\mathcal{W}_1$ and $\mathcal{W}_2$ proximals minimizes kinetic energy and keeps the objective value less oscillatory during training (right). See \ref{sec:numerical:examples} for details.
Figure 2: Evaluation of learning objectives (a-b) and optimality indicators (c-d) from the MFG optimality conditions over the course of training. We learn the MNIST dataset using the four test cases with $T=5.0$, $h=1.0$. We choose $\lambda=0.05$ and $L=1$ for the weight parameters of the Wasserstein-2 and Wasserstein-1 proximals, respectively. (a) The terminal cost $\mathcal{F}(\rho(\cdot,T)) = D_f^{\Gamma_L}(\rho(\cdot,T) \| \pi)$ diverges (green, red) without the Wasserstein-1 proximal regularization. (b) The Wasserstein-2 proximal additionally regularizes the flow so that it has lower kinetic energy and the training objective oscillates less. Less oscillation is also related to the uniqueness of the MFG solution in \ref{thm:uniqueness}, which in turn is expected to make the algorithms more robust, i.e., in our context, not susceptible to implementation-dependent choices. (c) As inferred from the mean-field game, the learning problem without Wasserstein-1 proximal regularization lacks a well-defined terminal condition. The exploding optimality indicator \ref{eq:indicator2} exemplifies this behavior. (d) The $\mathcal{W}_1\oplus\mathcal{W}_2$ proximal generative flow results in lower values of the optimality indicator \ref{eq:indicator1} than the Wasserstein-1 proximal regularized flow. (c-d) show that the optimality indicators can inform when an optimal generative flow has been discovered.
Figure 3: Wasserstein-2 proximal regularization implies discretization invariance in generative flows. After learning the MNIST dataset with the $\mathcal{W}_1\oplus\mathcal{W}_2$ and $\mathcal{W}_1$ proximal generative flows using $T=5.0$, $h=1.0$, we generated samples by integrating the learned vector field $\dot{x}(t) = -\frac{1}{\lambda}\nabla U(x(t), t)$ over time with different time step sizes $h$. In (a), we see that Wasserstein-2 proximal regularization provides almost straight flow trajectories, which leads to generated samples that are almost invariant to time discretization. This robustness of the continuous-time flow generator ensures high fidelity of generated samples regardless of time discretization. This empirical observation is justified by the theoretical result in \ref{eq:trajectories} of Theorem \ref{thm:main}. On the other hand, we see in (b) that without Wasserstein-2 regularization, the resulting vector field is more sensitive to varying step sizes, as certain digits may flip to other ones.
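The link between straight trajectories and discretization invariance described in the Figure 3 caption can be checked on a toy problem. The sketch below is not the learned field $-\frac{1}{\lambda}\nabla U$: both vector fields and all names are hypothetical, chosen so that one has straight, constant-velocity trajectories $x(t) = x_0(1 + ct)$ and the other has curved (rotating) ones. Forward Euler reproduces the straight-line flow for any step size, but not the curved one.

```python
import numpy as np

def euler(x0, velocity, T, h):
    """Integrate dx/dt = velocity(x, t) from 0 to T with forward Euler steps of size h."""
    x = np.array(x0, dtype=float)
    for k in range(int(round(T / h))):
        x = x + h * velocity(x, k * h)
    return x

c = 0.5  # illustrative expansion rate (assumption)

def straight_field(x, t):
    # Trajectories are x(t) = x0 * (1 + c t): straight lines at constant velocity c * x0.
    return c * x / (1.0 + c * t)

def rotation_field(x, t):
    # Trajectories are circles around the origin (curved paths).
    return np.stack([-x[:, 1], x[:, 0]], axis=1)

x0 = np.array([[1.0, 0.0], [0.5, -2.0]])
T = 5.0

straight_coarse = euler(x0, straight_field, T, h=1.0)
straight_fine = euler(x0, straight_field, T, h=0.01)
curved_coarse = euler(x0, rotation_field, T, h=1.0)
curved_fine = euler(x0, rotation_field, T, h=0.01)

# Largest coordinate-wise disagreement between coarse and fine discretizations.
straight_gap = np.max(np.abs(straight_coarse - straight_fine))
curved_gap = np.max(np.abs(curved_coarse - curved_fine))
```

For the straight-line field, each Euler step multiplies $x$ by $(1 + c\,t_{k+1})/(1 + c\,t_k)$, so the product telescopes to the exact endpoint $x_0(1 + cT)$ for any $h$. `straight_gap` is therefore at the level of floating-point rounding, while `curved_gap` is large.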

Theorems & Definitions (8)

Remark 3.1: Composition of Wasserstein proximals and its interpretation
Theorem 4.1: MFG for $\mathcal{W}_1\oplus\mathcal{W}_2$ proximal generative flow
proof
Theorem 4.2: Uniqueness of Wasserstein-1/Wasserstein-2 proximal generative flows
Lemma 4.1: Monotonicity of \ref{eq:optimality:conditions2}
proof
proof
Remark 4.1
