Minibatch optimal transport distances; analysis and applications

Kilian Fatras, Younes Zine, Szymon Majewski, Rémi Flamary, Rémi Gribonval, Nicolas Courty

TL;DR

The paper tackles the scalability challenge of optimal transport by rigorously analyzing minibatch OT methods. It extends prior work to unbounded and nonuniform distributions, formalizes a general MBOT framework with reweighting and sampling laws, and introduces a debiased MBOT loss to restore distance-like properties. The authors prove concentration bounds, derive unbiased gradients for several OT kernels, and validate the approach through gradient flows, map learning, GANs, and large-scale color transfer, plus invariance properties for minibatch Gromov–Wasserstein. Collectively, the work provides both theoretical guarantees and practical utilities for large-scale distribution comparison and learning tasks. The results suggest MBOT as a scalable, theoretically sound toolkit for modern ML applications requiring OT-based objectives.

Abstract

Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning. Yet, despite recent algorithmic developments, their complexity prevents their direct use on large-scale datasets. To overcome this challenge, a common workaround is to compute these distances on minibatches, i.e., to average the outcome of several smaller optimal transport problems. We propose in this paper an extended analysis of this practice, whose effects were previously studied in restricted cases. We first consider a large variety of optimal transport kernels. We notably argue that the minibatch strategy comes with appealing properties such as unbiased estimators, gradients and a concentration bound around the expectation, but also with limits: the minibatch OT is not a distance. To recover some of the lost distance axioms, we introduce a debiased minibatch OT function and study its statistical and optimisation properties. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, generative adversarial networks (GANs), and color transfer that highlight the practical interest of this strategy.
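
As a rough illustration of the minibatch strategy described in the abstract, the sketch below averages exact OT costs over randomly drawn pairs of minibatches. It is not the authors' implementation: the function names, the squared Euclidean ground cost, the batch size `m`, and the number of batch pairs `k` are all assumptions chosen for the example. It relies on the fact that, for two uniform point clouds of equal size, an optimal plan can be taken to be a permutation, so each small problem reduces to a linear assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def uniform_ot_cost(x, y):
    """Exact OT cost between two uniform point clouds of equal size.

    With equal sizes and uniform weights, an optimal plan is a scaled
    permutation matrix, so the problem reduces to a linear assignment.
    """
    # Squared Euclidean ground cost (an assumption made for this sketch).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def minibatch_ot(x, y, m, k, rng):
    """Average the OT cost over k random pairs of minibatches of size m."""
    total = 0.0
    for _ in range(k):
        # Indices are drawn without replacement inside each minibatch.
        i = rng.choice(len(x), size=m, replace=False)
        j = rng.choice(len(y), size=m, replace=False)
        total += uniform_ot_cost(x[i], y[j])
    return total / k


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = rng.normal(loc=3.0, size=(200, 2))

loss_xy = minibatch_ot(x, y, m=16, k=50, rng=rng)  # two different clouds
loss_xx = minibatch_ot(x, x, m=16, k=50, rng=rng)  # a cloud against itself
full_xx = uniform_ot_cost(x, x)                    # full OT of x against itself
```

In this toy setting the full OT cost of `x` against itself is zero, while the minibatch estimate `loss_xx` stays strictly positive because the two minibatches are drawn independently. This is one concrete way to see the abstract's remark that minibatch OT is not a distance.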

Paper Structure

This paper contains 47 sections, 18 theorems, 179 equations, 16 figures, 4 tables.

Key Result

Proposition 5

Denote by $\mathcal{P}^{o,m}$ the set of all ordered $m$-tuples without repeated indices. Given a discrete uniform probability distribution $\mathbf{u}$, the reweighting function $w^\mathtt{U}$ and the probability law on $m$-tuples $P_{\mathbf{u}}^\mathtt{W}$, we have that our minibatch OT loss is equal to [...], where $C_{(I^o,J^o)}$ is the ground cost matrix between elements in $I^o$ and $J^o$.
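
To make the objects in this proposition concrete, here is a small sketch that enumerates $\mathcal{P}^{o,m}$ for a tiny $n$ and averages the OT cost over every pair $(I^o, J^o)$. The displayed expression of the proposition is not reproduced in this summary, so the uniform average over all pairs of tuples is an assumption of the sketch, as are the function names and the toy 1D data. The only facts it depends on are that $\mathcal{P}^{o,m}$ has $n!/(n-m)!$ elements and that OT between two uniform $m$-point measures reduces to a linear assignment.

```python
from itertools import permutations, product
from math import factorial

import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_cost_from_matrix(c):
    """OT cost between two uniform m-point measures, given their m x m ground cost."""
    rows, cols = linear_sum_assignment(c)
    return c[rows, cols].mean()


def exact_minibatch_ot(cost, m):
    """Uniform average of the OT cost over all pairs of ordered m-tuples.

    Assumption of this sketch: every pair (I^o, J^o) carries the same weight.
    """
    n = cost.shape[0]
    # P^{o,m}: ordered m-tuples of indices without repetition.
    tuples = list(permutations(range(n), m))
    assert len(tuples) == factorial(n) // factorial(n - m)
    total = sum(
        ot_cost_from_matrix(cost[np.ix_(i, j)])  # sub-matrix C_{(I^o, J^o)}
        for i, j in product(tuples, repeat=2)
    )
    return total / len(tuples) ** 2


# Toy 1D example with n = 4 points per distribution (hypothetical data).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 1.5, 2.5, 3.5])
cost = np.abs(x[:, None] - y[None, :])

loss_m2 = exact_minibatch_ot(cost, m=2)    # average over 12 * 12 pairs of tuples
loss_full = exact_minibatch_ot(cost, m=4)  # m = n: every tuple covers all points
```

With `m = n`, every tuple is a reordering of the full index set, so the average coincides with the full OT cost. For `m < n` the averaged plan is still admissible for the full problem, so the value cannot fall below the full OT cost. Exhaustive enumeration is only feasible for very small `n`, which is why random sampling of minibatches is used in practice.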

Figures (16)

  • Figure 1: Illustration of optimal transport computation for GANs.
  • Figure 2: Several OT matrices between distributions with $n=20$ samples in 1D. The first row shows the minibatch OT matrices $\overline{\Pi}^W_\mathtt{W}(\mathbf{u}, \mathbf{u})$ for different values of $m$; the second row shows the shape of the distributions on the rows of $\overline{\Pi}^W_\mathtt{W}(\mathbf{u}, \mathbf{u})$. The last two columns correspond to classical entropic and quadratic regularized OT.
  • Figure 3: Several OT matrices between 2D distributions with $n=10$ samples. The first row shows the minibatch OT matrices $\overline{\Pi}^W_\mathtt{W}(\mathbf{u}, \mathbf{u})$ for different values of $m$. The second row provides a 2D visualization of where the mass is transported between the 2D positions of the samples.
  • Figure 4: Difference between transport plan estimators with 2D distributions and $n=5$ samples. Each column gives the OT plan $\overline{\Pi}^W_\mathtt{W}(\mathbf{u}, \mathbf{u})$ or $\overline{\Pi}^W_\mathtt{U}(\mathbf{u}, \mathbf{u})$ (top) and the shape of the distributions on the rows of the OT matrices (bottom).
  • Figure 5: Positivity counterexample. (Left) Source and target distributions for a given perturbation. (Middle and right) Comparison of different estimator values for $\Lambda_{W_1}$ and $\Lambda_{W_2}$ with a Euclidean ground cost between the distributions. The red line marks $y = 0$.
  • ...and 11 more figures

Theorems & Definitions (44)

  • Definition 1: OT Kernels
  • Definition 2: Reweighting and probability functions
  • Definition 3: Minibatch Wasserstein
  • Remark 4
  • Example 1: Uniform reweighting function
  • Example 2: Normalized reweighting function
  • Example 3: Drawing indices with replacement
  • Example 4: Drawing indices "without replacement"
  • Proposition 5: Minibatch OT loss (Fatras et al., 2019)
  • Definition 6: minibatch transport plan
  • ...and 34 more