Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows

Gabriele Visentin, Patrick Cheridito

TL;DR

The paper introduces a primal, generative approach to compute optimal transport maps and Wasserstein-2 barycenters in high dimensions by training conditional normalizing flows that map input distributions to a common latent space. By enforcing pushforward constraints through likelihood-based objectives, the method avoids dual/adversarial training and yields both OT maps and a generative model of the barycenter with h(Z)=∑_s w_s f(Z,s). Theoretical results connect OT distances to L^p(λ) differences and show that the barycenter arises as a conditional expectation minimizing variance, enabling scalable computation for hundreds of input distributions. Empirically, the approach achieves high accuracy across Gaussian, uniform, Swiss-roll, MNIST, and large-n datasets, often outperforming state-of-the-art baselines in both quality and scalability. The framework thus enables practical, sample-efficient, and scalable barycenter construction and transport in high-dimensional settings, with broad applicability in statistics, imaging, and fairness.

Abstract

We present a novel method for efficiently computing optimal transport maps and Wasserstein barycenters in high-dimensional spaces. Our approach uses conditional normalizing flows to approximate the input distributions as invertible pushforward transformations from a common latent space. This makes it possible to directly solve the primal problem using gradient-based minimization of the transport cost, unlike previous methods that rely on dual formulations and complex adversarial optimization. We show how this approach can be extended to compute Wasserstein barycenters by solving a conditional variance minimization problem. A key advantage of our conditional architecture is that it enables the computation of barycenters for hundreds of input distributions, which was computationally infeasible with previous methods. Our numerical experiments illustrate that our approach yields accurate results across various high-dimensional tasks and compares favorably with previous state-of-the-art methods.
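The pushforward construction described above can be illustrated on a toy case. The following is a minimal sketch, not the authors' implementation: it assumes one-dimensional Gaussian inputs and replaces the trained conditional normalizing flow with hypothetical affine maps `f(z, s)` that push a standard normal latent variable to each input distribution. In one dimension, increasing maps from a common latent space give the optimal coupling, so the closed-form values used for comparison are exact in this setting only. All names (`f`, `f_inv`, `transport`, `barycenter_map`) and all parameter values are illustrative assumptions.

```python
import random
import statistics

# Assumed toy inputs: three 1-D Gaussians N(means[s], stds[s]^2) with barycenter weights.
means = [-2.0, 0.0, 4.0]
stds = [0.5, 1.0, 2.0]
weights = [0.2, 0.3, 0.5]


def f(z, s):
    """Hypothetical stand-in for a trained conditional flow: latent z -> sample of mu_s."""
    return means[s] + stds[s] * z


def f_inv(x, s):
    """Inverse of f(., s): sample of mu_s -> latent."""
    return (x - means[s]) / stds[s]


def transport(x, s, t):
    """Map a sample of mu_s to mu_t by passing through the common latent space."""
    return f(f_inv(x, s), t)


def barycenter_map(z):
    """h(z) = sum_s w_s f(z, s), the generative model of the barycenter."""
    return sum(w * f(z, s) for s, w in enumerate(weights))


rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Samples from the barycenter, generated directly from latent samples.
bary_samples = [barycenter_map(z) for z in latent]

# Monte Carlo estimate of W_2^2(mu_0, mu_2) as a squared L^2 distance in latent space.
w2_sq_estimate = statistics.fmean((f(z, 0) - f(z, 2)) ** 2 for z in latent)

# Closed forms for 1-D Gaussians, used only as a sanity check.
bary_mean_exact = sum(w * m for w, m in zip(weights, means))
bary_std_exact = sum(w * sd for w, sd in zip(weights, stds))
w2_sq_exact = (means[0] - means[2]) ** 2 + (stds[0] - stds[2]) ** 2

print(statistics.fmean(bary_samples), bary_mean_exact)
print(statistics.pstdev(bary_samples), bary_std_exact)
print(w2_sq_estimate, w2_sq_exact)
```

In higher dimensions the maps `f(., s)` would have to be learned so that the induced couplings minimize the transport cost; this sketch only shows how the transport maps and the barycenter are read off once such maps are available.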

Paper Structure

This paper contains 26 sections, 8 theorems, 27 equations, 10 figures, 9 tables, 2 algorithms.

Key Result

Lemma 3.1

Let $p > 1$ and $\mu, \nu \in \mathcal{P}_{p, ac}(\mathbb{R}^d)$. Then Kantorovich's problem has a unique solution $\gamma \in \Gamma(\mu, \nu)$. Moreover, $\gamma$ is of the form ${(\text{id}, T)}_{\#} \mu$, where $T \in B(\mu, \nu)$ is a $\mu$-almost surely unique solution to Monge's problem.
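
For reference, the two problems named in the lemma are usually stated as below. This is a sketch of the standard formulations, not a quotation from the paper: it assumes the cost $\|x - y\|^p$, that $\Gamma(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$, and that $B(\mu, \nu)$ denotes the set of Borel maps $T$ with $T_{\#} \mu = \nu$.

```latex
\begin{align*}
\text{(Kantorovich)} \quad & \inf_{\gamma \in \Gamma(\mu, \nu)}
  \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^p \, d\gamma(x, y), \\
\text{(Monge)} \quad & \inf_{T \in B(\mu, \nu)}
  \int_{\mathbb{R}^d} \|x - T(x)\|^p \, d\mu(x).
\end{align*}
```

Under these assumptions, every $T \in B(\mu, \nu)$ induces the coupling ${(\text{id}, T)}_{\#} \mu \in \Gamma(\mu, \nu)$ with the same cost, so the Kantorovich infimum is never larger than the Monge infimum; the lemma says that for absolutely continuous measures and $p > 1$ the two coincide and the optimum is attained by a map.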

Figures (10)

  • Figure 1: $\mathbb{W}_2^2$ relative error on high-dimensional Gaussian data.
  • Figure 2: Samples from the true input distributions (first row) and from the input distributions as learned by our model (second row).
  • Figure 3: True barycenter (left) and learned barycenter (right).
  • Figure 4: Samples from the input distributions $\mu_s$ (first row) and their image after being transported to the barycenter (second row).
  • Figure 5: Sample from learned barycenter transported to the two input distributions.
  • ...and 5 more figures

Theorems & Definitions (13)

  • Lemma 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Lemma 1.1: Change of variables
  • proof
  • Lemma 1.2
  • proof
  • Lemma 1.3
  • proof
  • Lemma 1.4
  • ...and 3 more