A Differential Equation Approach for Wasserstein GANs and Beyond

Zachariah Malik; Yu-Jui Huang

TL;DR

The paper develops a gradient-flow perspective on the Wasserstein-1 distance: it formulates a distribution-dependent ODE on probability measures and derives a forward Euler discretization (W1-FE). It connects the gradient of the $W_1$ objective to the Kantorovich potential via a linear functional derivative, enabling a new class of generative methods that interpolate between WGAN and a discretized Wasserstein-1 flow. Empirically, increasing persistent training ($K>1$) accelerates convergence and improves results in several settings, though excessive persistence or poorly estimated Kantorovich potentials can degrade performance and stability. The work advances the theoretical understanding of Wasserstein-1 dynamics in GAN training and suggests practical guidelines for using persistence in ODE-based generative modeling.
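
In symbols, the construction summarized above can be sketched as follows. This is a hedged reading, not a quotation of the paper's equations: the process $Y_t$, its law $\mu^{Y_t}$, the initial measure $\mu_0$, the step size $\varepsilon$, and the sign convention are assumptions of this sketch, while $\varphi_{\mu}^{\mu_{\operatorname{d}}}$ denotes a Kantorovich potential from $\mu$ to the data distribution $\mu_{\operatorname{d}}$, as in Proposition 3.1.

```latex
% Distribution-dependent ODE (assumed form): particles descend the Kantorovich potential
% associated with their own current law.
\[
  dY_t = -\nabla \varphi_{\mu^{Y_t}}^{\mu_{\operatorname{d}}}(Y_t)\,dt,
  \qquad \mu^{Y_0} = \mu_0 .
\]
% Forward Euler discretization with step size \varepsilon > 0 (assumed notation).
\[
  Y_{n+1} = Y_n - \varepsilon\,\nabla \varphi_{\mu^{Y_n}}^{\mu_{\operatorname{d}}}(Y_n),
  \qquad n = 0, 1, 2, \dots
\]
```

The dependence of the potential on $\mu^{Y_n}$ is what makes the scheme distribution-dependent: the potential has to be re-estimated after every step because the law of the particles has moved.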

Abstract

This paper proposes a new theoretical lens to view Wasserstein generative adversarial networks (WGANs). To minimize the Wasserstein-1 distance between the true data distribution and our estimate of it, we derive a distribution-dependent ordinary differential equation (ODE) which represents the gradient flow of the Wasserstein-1 loss, and show that a forward Euler discretization of the ODE converges. This inspires a new class of generative models that naturally integrates persistent training (which we call W1-FE). When persistent training is turned off, we prove that W1-FE reduces to WGAN. When we intensify persistent training, W1-FE is shown to outperform WGAN in training experiments from low to high dimensions, in terms of both convergence speed and training results. Intriguingly, one can reap the benefits only when persistent training is carefully integrated through our ODE perspective. As demonstrated numerically, a naive inclusion of persistent training in WGAN (without relying on our ODE framework) can significantly worsen training results.
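
To make the role of persistent training concrete, here is a minimal one-dimensional sketch. It is not the authors' implementation: the function names (`descent_direction_1d`, `w1_1d`, `train`), the linear generator $G(z) = az + b$, the Gaussian toy target, and all step sizes are hypothetical. Two further assumptions are worth flagging: the generator is fitted to the Euler targets by mean-squared-error gradient steps, and the Kantorovich potential is obtained exactly from sorted matching (valid only in one dimension) instead of being learned by a critic network.

```python
import numpy as np

def descent_direction_1d(x, y):
    """Return -phi'(x_i) for a Kantorovich potential phi between equal-size 1-D samples.

    In one dimension, sorted matching is an optimal coupling, and phi decreases at
    unit slope along each transport ray, so -phi'(x_i) = sign(y_matched - x_i).
    """
    order = np.argsort(x)
    direction = np.empty_like(x)
    direction[order] = np.sign(np.sort(y) - x[order])
    return direction

def w1_1d(x, y):
    """Empirical W1 distance between two equal-size 1-D samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def train(K, n_outer=200, eps=0.05, lr=0.1, n=512, seed=0):
    """Toy W1-FE-style loop for a linear generator G(z) = a*z + b (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0
    for _ in range(n_outer):
        z = rng.standard_normal(n)
        data = 3.0 + 0.5 * rng.standard_normal(n)  # toy target N(3, 0.5^2)
        gen = a * z + b
        # One forward Euler step of the particle flow; the targets stay fixed below.
        target = gen + eps * descent_direction_1d(gen, data)
        # Persistent training: K gradient steps on the same batch and targets.
        for _ in range(K):
            resid = (a * z + b) - target
            a -= lr * np.mean(2.0 * resid * z)
            b -= lr * np.mean(2.0 * resid)
    # Evaluate the final generator on fresh samples.
    z = rng.standard_normal(4096)
    data = 3.0 + 0.5 * rng.standard_normal(4096)
    return w1_1d(a * z + b, data)
```

Calling `train(K=1)` and `train(K=5)` with the same number of outer iterations gives a rough feel for the trade-off: with $K=1$ each batch contributes a single generator step, while larger $K$ moves the generator closer to the Euler targets before the potential is recomputed. Because the potential here is exact, this toy example cannot reproduce the instabilities that the paper attributes to excessive persistence.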

Paper Structure (18 sections, 5 theorems, 42 equations, 7 figures, 1 algorithm)

Introduction
Mathematical Preliminaries
Problem Formulation
A Discretization of ODE \ref{Eq: ODE}
A Comparison: W1-FE and WGAN
Numerical Experiments
Limitations
Conclusion
Impact Statement
Theoretical Results
Convexity of $W_{1}(\cdot,\mu_{d})$
Proof of Proposition \ref{Prop: LFD of W1 distance}
A Refined Arzelà-Ascoli Result
Proof of Theorem \ref{Th: Convergence of interpolating measure curve}
More Experimental Results
...and 3 more sections

Key Result

Proposition 3.1

For any $\mu\in\mathcal{P}_1(\mathbb{R}^d)$, a Kantorovich potential $\varphi_{\mu}^{\mu_{\operatorname{d}}}$ (Definition def:KP) is a linear functional derivative of $J : \mathcal{P}_{1}(\mathbb{R}^d) \rightarrow \mathbb{R}$ in J at $\mu \in \mathcal{P}_{1}(\mathbb{R}^d)$ (Definition def:LFD with $

Figures (7)

Figure 1: Qualitative evolution of learning process. A sample from the target distribution is given in green, a sample from the initial distribution is in magenta, and the transport rays by the generator are given in the grey arrows. The generated samples lie at the head of each grey arrow.
Figure 2: $W_1$ loss of W1-FE-LP with persistency levels $K=1,3,5,10$ against training epoch (top) and wallclock time (bottom), respectively.
Figure 3: $1$-NN classifier accuracy against training epoch for domain adaptation from USPS to MNIST datasets.
Figure 4: FID against training epoch for various W1-FE-LP (top) and W1-LP (bottom) models on generating CIFAR-10 images. This demonstrates how persistent training may destabilize the training procedure in other WGANs.
Figure 5: Uncurated samples from various W1-FE-LP models across training.
...and 2 more figures

Theorems & Definitions (15)

Definition 2.1
Remark 2.1
Definition 3.1
Proposition 3.1
Theorem 4.1
Remark 4.1
Proposition 4.1
proof
Remark 4.2
Proposition 1.1
...and 5 more
