On the Existence of Optimal Transport Gradient for Learning Generative Models

Antoine Houdard, Arthur Leclaire, Nicolas Papadakis, Julien Rabin

TL;DR

The paper probes the existence of gradients for optimal transport costs in learning generative models, revealing that the standard envelope-based gradient can fail in unregularized OT frameworks. It shows that entropic regularization restores differentiability, provides an explicit gradient expression via the $c,\lambda$-transform and Kantorovich potentials, and proves $W_c^\lambda(\theta)$ is $C^1$ under mild conditions. To make the approach practical, it specializes to a semi-discrete setting with discrete data, deriving a tractable algorithm that updates the generator parameters using a stochastic gradient informed by $\psi^{c,\lambda}$ and data samples. Numerical experiments on synthetic examples and MNIST illustrate the method’s stability and its capacity to learn complex generative mappings, albeit with a trade-off between smoothing and fidelity controlled by the regularization parameter $\lambda$.
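For orientation, the objects named in this summary can be written out explicitly. The following is a sketch in the standard semi-dual notation for entropic optimal transport, not a transcription of the paper's statements. It assumes a target measure $\nu$, a generator $g_\theta$ applied to a latent variable $Z$, and a dual potential $\psi$ with optimal value $\psi_*$; the paper's exact conventions and normalizations may differ.

$$
\psi^{c,\lambda}(x) = -\lambda \log \int \exp\!\left(\frac{\psi(y) - c(x,y)}{\lambda}\right) d\nu(y),
\qquad
W_c^\lambda(\theta) = \max_{\psi} \; \mathbb{E}_Z\!\left[\psi^{c,\lambda}\big(g_\theta(Z)\big)\right] + \int \psi \, d\nu ,
$$

$$
\nabla W_c^\lambda(\theta) = \mathbb{E}_Z\!\left[\nabla_\theta\, \psi_*^{c,\lambda}\big(g_\theta(Z)\big)\right].
$$

In the semi-discrete setting, $\nu = \sum_j \nu_j \delta_{y_j}$ is supported on the data points, so the integral in $\psi^{c,\lambda}$ becomes a finite sum (a soft minimum over the $y_j$) and the expectation over $Z$ can be estimated from latent samples.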

Abstract

The use of optimal transport costs for learning generative models has become popular with Wasserstein Generative Adversarial Networks (WGAN). The training of WGAN relies on a theoretical foundation: the calculation of the gradient of the optimal transport cost with respect to the generative model parameters. We first demonstrate that such a gradient may not be defined, which can result in numerical instabilities during gradient-based optimization. We address this issue by stating a valid differentiation theorem in the case of entropy-regularized transport and by specifying conditions under which existence is ensured. By exploiting the discrete nature of empirical data, we formulate the gradient in a semi-discrete setting and propose an algorithm for the optimization of the generative model parameters. Finally, we illustrate numerically the advantage of the proposed framework.

Paper Structure

This paper contains 19 sections, 12 theorems, 54 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Let $\theta_0$ and $\psi^*_0$ satisfy $W_c(\theta_0) = F(\psi^*_0,\theta_0)$. If $W_c$ and $\theta \mapsto F(\psi^*_0,\theta)$ are both differentiable at $\theta_0$, then $\nabla W_c(\theta_0) = \nabla_\theta F(\psi^*_0,\theta_0)$.

Figures (2)

  • Figure 1: Trajectory of the parameter $\theta^k$ during optimization of the generative model $g_\theta(z) = z - \theta$ for two training points $\{y_1,y_2\}$. Left: as predicted by Proposition \ref{prop:counterexample}, the process does not converge for the optimal transport cost $\text{OT}_c$ with quadratic cost $c = \|\cdot\|^2$. Right: as supported by Theorem \ref{thm:gc1}, the training converges to the solution $\theta^*=(0,0.5)$ when considering regularized optimal transport $\text{OT}_c^\lambda$. See Section \ref{sec:synth_solved} for more details.
  • Figure 2: Random samples from generative models learned on the MNIST dataset with Alg. \ref{alg:algo}, for three values of the regularization parameter $\lambda$.
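The toy model of Figure 1, $g_\theta(z) = z - \theta$ with quadratic cost and a handful of data points, is small enough to write the regularized gradient by hand. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it assumes uniform weights on the data points, takes the dual potential `psi` as given (the paper's algorithm also optimizes it), and uses the hypothetical helper names `c_lambda_transform` and `theta_gradient`. It evaluates the soft-minimum form $\psi^{c,\lambda}(x) = -\lambda \log \sum_j \tfrac{1}{n} \exp\big((\psi_j - \|x - y_j\|^2)/\lambda\big)$ and its gradient, then averages over latent samples.

```python
import math


def c_lambda_transform(x, ys, psi, lam):
    """Value, x-gradient and soft-assignment weights of the regularized
    c,lambda-transform for c(x, y) = |x - y|^2 and uniform data weights
    (both are assumptions of this sketch)."""
    n = len(ys)
    # Exponents (psi_j - c(x, y_j)) / lam + log(1/n), one per data point.
    scores = [
        (p - sum((xi - yi) ** 2 for xi, yi in zip(x, y))) / lam - math.log(n)
        for y, p in zip(ys, psi)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    value = -lam * (m + math.log(total))
    weights = [e / total for e in exps]
    # For the quadratic cost the gradient is 2 * (x - weighted barycenter).
    bary = [sum(w * y[d] for w, y in zip(weights, ys)) for d in range(len(x))]
    grad_x = [2.0 * (xi - bi) for xi, bi in zip(x, bary)]
    return value, grad_x, weights


def theta_gradient(theta, zs, ys, psi, lam):
    """Monte Carlo estimate of grad_theta E[psi^{c,lam}(g_theta(Z))]
    for the toy generator g_theta(z) = z - theta."""
    grad = [0.0] * len(theta)
    for z in zs:
        x = [zi - ti for zi, ti in zip(z, theta)]
        _, gx, _ = c_lambda_transform(x, ys, psi, lam)
        for k in range(len(theta)):
            grad[k] -= gx[k] / len(zs)  # chain rule: dg/dtheta = -identity
    return grad


if __name__ == "__main__":
    # Hypothetical data: two training points and three latent samples.
    ys = [(0.0, 0.0), (0.0, 1.0)]
    zs = [(0.2, 0.1), (-0.1, 0.8), (0.0, 0.4)]
    theta, psi, lam, step = [0.5, -0.5], [0.0, 0.0], 0.5, 0.1
    for _ in range(50):
        g = theta_gradient(theta, zs, ys, psi, lam)
        theta = [t - step * gk for t, gk in zip(theta, g)]
    print(theta)
```

Because the weights come from a softmax with temperature $\lambda$, they vary smoothly with $x$. As $\lambda \to 0$ they approach a hard nearest-point assignment, which is where the unregularized gradient can stop being defined.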

Theorems & Definitions (22)

  • Theorem 1: Envelope theorem
  • proof
  • Proposition 1
  • proof
  • Definition 1: OT cost with entropic regularization [genevay2019thesis]
  • Definition 2: regularized $c,\lambda$-transforms
  • Proposition 2: Semi-dual formulation of regularized transport
  • Theorem 2: Existence and uniqueness of the dual solution [genevay2019thesis]
  • Lemma 1
  • Proposition 3
  • ...and 12 more