Score-based Generative Neural Networks for Large-Scale Optimal Transport

Mara Daniels, Tyler Maunu, Paul Hand

TL;DR

This work introduces a novel framework for learning the Sinkhorn coupling between two distributions in the form of a score-based generative model, and proves convergence of gradient descent with respect to network parameters in this formulation.

Abstract

We consider the fundamental problem of sampling the optimal transport coupling between given source and target distributions. In certain cases, the optimal transport plan takes the form of a one-to-one mapping from the source support to the target support, but learning or even approximating such a map is computationally challenging for large and high-dimensional datasets due to the high cost of linear programming routines and an intrinsic curse of dimensionality. We study instead the Sinkhorn problem, a regularized form of optimal transport whose solutions are couplings between the source and the target distribution. We introduce a novel framework for learning the Sinkhorn coupling between two distributions in the form of a score-based generative model. Conditioned on source data, our procedure iterates Langevin dynamics to sample target data according to the regularized optimal coupling. Key to this approach is a neural network parametrization of the Sinkhorn problem, and we prove convergence of gradient descent with respect to network parameters in this formulation. We demonstrate its empirical success on a variety of large-scale optimal transport tasks.
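
The conditional sampling step described in the abstract can be illustrated with a minimal sketch. It assumes the standard form of an entropy-regularized coupling, in which the conditional density of a target point $y$ given a source point $x$ is proportional to $\nu(y)\exp((\psi(y) - c(x,y))/\lambda)$, so that its score is the target score plus $\nabla_y(\psi(y) - c(x,y))/\lambda$. The names `langevin_sample`, `conditional_score`, `target_score`, and `grad_psi` are hypothetical and not taken from the paper. In particular, `grad_psi` stands in for the gradient of a trained dual potential; here it is set to zero purely so that the conditional is a Gaussian with a known mean and variance, which does not correspond to the true Sinkhorn coupling.

```python
import numpy as np

def langevin_sample(score, y0, step, n_steps, rng):
    """Unadjusted Langevin dynamics:
    y <- y + (step / 2) * score(y) + sqrt(step) * standard normal noise."""
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + 0.5 * step * score(y) + np.sqrt(step) * noise
    return y

def conditional_score(y, x, lam, target_score, grad_psi):
    """Score of p(y | x) proportional to nu(y) * exp((psi(y) - c(x, y)) / lam),
    assuming the quadratic cost c(x, y) = 0.5 * (x - y)**2."""
    grad_cost = y - x  # gradient of c(x, y) with respect to y
    return target_score(y) + (grad_psi(y) - grad_cost) / lam

rng = np.random.default_rng(0)
x, lam = 2.0, 1.0

# Toy assumptions: the target nu is a standard Gaussian (score -y), and the
# dual potential is a zero placeholder rather than a trained network.
target_score = lambda y: -y
grad_psi = lambda y: np.zeros_like(y)

score = lambda y: conditional_score(y, x, lam, target_score, grad_psi)

# Run 5000 independent chains in parallel, all started at zero.
samples = langevin_sample(score, np.zeros(5000), step=0.01, n_steps=2000, rng=rng)

# Under these toy assumptions the conditional is Gaussian with
# mean x / (1 + lam) and variance lam / (1 + lam).
expected_mean = x / (1.0 + lam)
expected_var = lam / (1.0 + lam)
print(samples.mean(), samples.var(), expected_mean, expected_var)
```

In an actual application, `target_score` would be replaced by a learned score estimate of the target distribution and `grad_psi` by the gradient of a learned dual potential; the sampler itself would be unchanged.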

Paper Structure

This paper contains 17 sections, 13 theorems, 45 equations, 5 figures, 7 tables, 2 algorithms.

Key Result

Proposition 2.2

In the empirical setting of Definition 2.1, the entropy regularized primal problem $K_\lambda(\pi)$ is $\lambda$-strongly convex in $\ell_1$ norm. The dual problem $J_\lambda(\varphi, \psi)$ is concave, unconstrained, and $\frac{1}{\lambda}$-strongly smooth in $\ell_\infty$ norm. Additionally, thes
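
For orientation, the entropy-regularized primal and dual problems are commonly written as follows. This is the standard textbook form, given here as an assumption; the paper's Definition 2.1 may use a different normalization, which would change the expressions by additive constants or a constant shift of the potentials. Here $\mu$ and $\nu$ denote the source and target distributions, $c$ the cost, and $\Pi(\mu, \nu)$ the set of couplings with those marginals.

$$
K_\lambda(\pi) = \mathbb{E}_{(x,y)\sim\pi}\big[c(x,y)\big] + \lambda\,\mathrm{KL}(\pi \,\|\, \mu \otimes \nu), \qquad \pi \in \Pi(\mu, \nu),
$$

$$
J_\lambda(\varphi, \psi) = \mathbb{E}_{x\sim\mu}\big[\varphi(x)\big] + \mathbb{E}_{y\sim\nu}\big[\psi(y)\big] - \lambda\,\mathbb{E}_{(x,y)\sim\mu\otimes\nu}\left[\exp\left(\frac{\varphi(x) + \psi(y) - c(x,y)}{\lambda}\right)\right].
$$

Under this convention, a maximizing pair $(\varphi, \psi)$ yields the regularized coupling through its density with respect to the product measure:

$$
\frac{d\pi}{d(\mu \otimes \nu)}(x, y) = \exp\left(\frac{\varphi(x) + \psi(y) - c(x,y)}{\lambda}\right).
$$

The dual has no marginal constraints, which is consistent with the "unconstrained" property stated in the proposition and is what makes it amenable to gradient-based optimization over parametrized potentials.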

Figures (5)

  • Figure 1: We use SCONES to sample the mean-squared-$L^2$ cost, entropy regularized optimal transport mapping between 2x downsampled CelebA images (Source) and unmodified CelebA images (Target) at $\lambda = 0.005$ regularization.
  • Figure 2: Samples generated by SCONES for entropy regularized optimal transport including the samples shown in Figure 1. At regularization $\lambda=0.005$, optimal transportation with $L^2$ cost has a visible effect on generated images. This effect diminishes at increased regularization $\lambda=0.1$.
  • Figure 3: Comparison of Barycentric Projection (Seguy et al., 2018) to SCONES for optimal transport between USPS and MNIST datasets of handwritten digits. (Left) Transporting MNIST to USPS. (Right) Transporting USPS to MNIST. Here, we show transportation of the $\chi^2$ regularized problem at $\lambda = 0.001$.
  • Figure 4: Entropy regularized, $\lambda=2$, $L^2$ cost SCONES and BP samples on transportation from a unit Gaussian source distribution to the Swiss Roll target distribution. For many samples, the Barycentric average lies off the manifold of high target density, whereas SCONES can separate multiple modes of the conditional coupling and correctly recover the target distribution.
  • Figure : Density Estimation.
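
The contrast drawn in the Figure 4 caption, between a barycentric average and samples from the conditional coupling, can be reproduced on a small discrete problem. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses classical Sinkhorn matrix scaling on a hypothetical one-dimensional dataset with a bimodal target, and the helper name `sinkhorn_coupling` is invented for this example.

```python
import numpy as np

def sinkhorn_coupling(mu, nu, cost, lam, n_iters=500):
    """Entropy-regularized OT coupling between discrete distributions
    mu and nu, computed with Sinkhorn's alternating matrix scaling."""
    K = np.exp(-cost / lam)  # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (K @ v)     # rescale to match the source marginal
        v = nu / (K.T @ u)   # rescale to match the target marginal
    return u[:, None] * K * v[None, :]

# Hypothetical toy data: source points near 0, target split into two modes.
xs = np.array([-0.5, 0.0, 0.5])
ys = np.array([-2.1, -1.9, 1.9, 2.1])
mu = np.full(xs.size, 1.0 / xs.size)
nu = np.full(ys.size, 1.0 / ys.size)
cost = (xs[:, None] - ys[None, :]) ** 2

P = sinkhorn_coupling(mu, nu, cost, lam=2.0)

# Barycentric projection: conditional mean of the target given each source point.
barycentric = (P @ ys) / mu

# Sampling instead draws from the conditional distribution of the middle
# source point (x = 0), so every draw lands on an actual target point.
rng = np.random.default_rng(0)
cond = P[1] / P[1].sum()
draws = rng.choice(ys, size=1000, p=cond)

print("barycentric image of x = 0:", barycentric[1])
print("fraction of draws in each mode:", (draws < 0).mean(), (draws > 0).mean())
```

By symmetry of this toy problem, the barycentric image of the source point at 0 sits at approximately 0, between the two target modes and away from every target point, whereas draws from the conditional fall on the modes themselves.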

Theorems & Definitions (22)

  • Definition 2.1: Regularized OT
  • Proposition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Theorem 4.2: Optimizing Neural Nets
  • Theorem 4.3: Stability of the OT Problem
  • Definition A.1: $f$-Divergences
  • Proposition A.2: Strong Convexity of $D_f$
  • proof
  • Proposition (prop:f-div-dual): Regularization with $f$-Divergences
  • ...and 12 more