Large-Scale Optimal Transport and Mapping Estimation

Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel

TL;DR

The paper addresses learning an optimal map between distributions by first computing a regularized OT plan via a scalable dual stochastic gradient method and then extracting a Monge map by barycentric projection, approximated with a neural network. It proves convergence results showing that regularized OT plans and their barycentric projections converge to the true OT plan and Monge map for underlying continuous measures as sample size grows and regularization vanishes. The approach enables out-of-sample mapping and scalable domain adaptation, demonstrated on large-scale DA tasks and Generative Optimal Transport, where a learned Monge map acts as a generator. This yields a principled, scalable framework for mapping between distributions with practical impact on domain adaptation and generative modeling. The work provides theoretical guarantees and empirical evidence that hybrid OT-based mapping can handle continuous measures and large datasets efficiently.
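For reference, the regularized problem summarized above is commonly written as follows for the entropic case. The notation is assumed here for illustration rather than quoted from the paper: $c$ is the ground cost, $\varepsilon$ the regularization strength, $u$ and $v$ the dual potentials, and the barycentric projection is shown only for the squared Euclidean cost, where it reduces to a conditional mean.

$$\min_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi + \varepsilon \int \Big( \ln \frac{d\pi}{d\mu \, d\nu} - 1 \Big) d\pi$$

$$\max_{u,v} \; \mathbb{E}_{X \sim \mu}[u(X)] + \mathbb{E}_{Y \sim \nu}[v(Y)] - \varepsilon \, \mathbb{E}_{(X,Y) \sim \mu \times \nu}\Big[ e^{(u(X) + v(Y) - c(X,Y))/\varepsilon} \Big]$$

$$d\pi^\varepsilon = e^{(u + v - c)/\varepsilon} \, d\mu \, d\nu, \qquad \bar{\pi}^\varepsilon(x) = \mathbb{E}_{Y \sim \pi^\varepsilon(\cdot \mid x)}[Y]$$

The dual is an unconstrained maximization of an expectation over independent samples from $\mu$ and $\nu$, which is what makes stochastic gradient methods applicable.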

Abstract

This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the number of samples is very large. Second, we estimate a Monge map as a deep neural network learned by approximating the barycentric projection of the previously obtained OT plan. This parameterization allows generalization of the mapping outside the support of the input measure. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling.
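The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes uniform discrete measures, a squared Euclidean cost, entropic regularization, and dual variables stored as plain vectors, and it computes the barycentric projection in closed form instead of fitting a neural network. The function names, toy data, and hyperparameters (`eps`, `lr`, `batch`, `iters`) are all hypothetical choices.

```python
import numpy as np

def sq_cost(a, b):
    """Pairwise squared Euclidean cost between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def stochastic_dual_ot(x, y, eps=1.0, lr=0.1, batch=32, iters=3000, seed=0):
    """Stochastic gradient ascent on the entropic dual (uniform weights assumed)."""
    rng = np.random.default_rng(seed)
    u, v = np.zeros(len(x)), np.zeros(len(y))
    for _ in range(iters):
        # Sample a batch of source indices and a batch of target indices.
        i = rng.integers(0, len(x), size=batch)
        j = rng.integers(0, len(y), size=batch)
        # h[k, l] = exp((u_i + v_j - c_ij) / eps) on the sampled pairs.
        h = np.exp((u[i][:, None] + v[j][None, :] - sq_cost(x[i], y[j])) / eps)
        # Batch estimates of the dual gradient: 1 - mean of h over the other batch.
        np.add.at(u, i, lr * (1.0 - h.mean(axis=1)))
        np.add.at(v, j, lr * (1.0 - h.mean(axis=0)))
    return u, v

def barycentric_projection(x, y, u, v, eps=1.0):
    """Conditional mean of the targets under the plan implied by (u, v)."""
    h = np.exp((u[:, None] + v[None, :] - sq_cost(x, y)) / eps)
    return (h @ y) / h.sum(axis=1, keepdims=True)

# Toy data (hypothetical): a Gaussian cloud and a shifted copy of its distribution.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(64, 2))
y = rng.normal(0.0, 1.0, size=(64, 2)) + np.array([3.0, 0.0])

u, v = stochastic_dual_ot(x, y)
mapped = barycentric_projection(x, y, u, v)
```

Each row of `mapped` is a convex combination of target points, so the projection is only defined on the source samples. Extending it to unseen points is the role of the parametric map described in the abstract.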

Paper Structure

This paper contains 14 sections, 4 theorems, 23 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Let $\mu\in P(\mathcal{X})$ and $\nu \in P(\mathcal{Y})$ where $\mathcal{X}$ and $\mathcal{Y}$ are complete metric spaces. Let $\mu_n = \sum_{i=1}^n a_i\delta_{x_i}$ and $\nu_n =\sum_{j=1}^n b_j\delta_{y_j}$ be discrete probability measures which converge weakly to $\mu$ and $\nu$ respectively, and

Figures (4)

  • Figure 1: Example of estimated optimal map between a continuous Gaussian distribution (colored level sets) and a multi-modal discrete measure (red +). (left) Continuous source and discrete target distributions. (center left) Displacement field of the estimated optimal map: each arrow is proportional to $f(x_i)-x_i$ where $(x_i)$ is a uniform discrete grid. (center right) Generated samples obtained by sampling from the source distribution and applying our estimated Monge map $f$. (right) Level sets of the resulting density (approximated as a 2D histogram over $10^6$ samples).
  • Figure 2: Convergence plots of the Stochastic Dual Algorithm \ref{alg:OT} against a stochastic semi-dual implementation (adapted from genevay2016stochastic; we use SGD instead of SAG) for several entropy-regularization values. Learning rates are $\{ 5.,20.,20. \}$ and batch sizes are $\{ 1024,500,100 \}$, respectively; the same values are used for the dual and semi-dual methods.
  • Figure 3: Illustration of the OT Domain Adaptation method adapted from courty2017domain. Source samples are mapped to the target set through the barycentric projection $\bar{\pi}^\varepsilon$. A classifier is then learned on the mapped source samples.
  • Figure 4: Samples generated by our optimal generator learned through Algorithms \ref{alg:OT} and \ref{alg:barycentric_projection}.
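The pipeline in the Figure 3 caption (map source samples to the target set by barycentric projection, then learn a classifier on the mapped samples) can be sketched on toy data. Several assumptions are made purely for brevity: the plan is computed with a few Sinkhorn iterations rather than the paper's stochastic dual algorithm, the classifier is a nearest-centroid rule, and the two-class data, the shift, and all function names are hypothetical.

```python
import numpy as np

def sinkhorn_plan(xs, xt, eps=1.0, iters=200):
    """Entropic OT plan between uniform discrete measures (Sinkhorn stand-in)."""
    cost = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-cost / eps)
    a = np.full(len(xs), 1.0 / len(xs))
    b = np.full(len(xt), 1.0 / len(xt))
    v = np.ones(len(xt))
    for _ in range(iters):
        u = a / (kernel @ v)      # rescale rows to match the source weights
        v = b / (kernel.T @ u)    # rescale columns to match the target weights
    return u[:, None] * kernel * v[None, :]

def barycentric_map(plan, xt):
    """Map each source sample to the plan-weighted mean of the target samples."""
    return (plan @ xt) / plan.sum(axis=1, keepdims=True)

def nearest_centroid(points, centroids):
    """Predict the index of the closest centroid for each point."""
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy two-class source domain and a shifted target domain (hypothetical data).
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
shift = np.array([5.0, 0.0])
ys = np.repeat([0, 1], 40)        # source labels, used for training
yt = np.repeat([0, 1], 40)        # target labels, used for evaluation only
xs = centers[ys] + 0.3 * rng.normal(size=(80, 2))
xt = centers[yt] + shift + 0.3 * rng.normal(size=(80, 2))

# Baseline: class centroids estimated on the unadapted source samples.
base_centroids = np.stack([xs[ys == k].mean(axis=0) for k in (0, 1)])
base_acc = (nearest_centroid(xt, base_centroids) == yt).mean()

# Adapted: class centroids estimated on the mapped source samples.
mapped = barycentric_map(sinkhorn_plan(xs, xt), xt)
centroids = np.stack([mapped[ys == k].mean(axis=0) for k in (0, 1)])
acc = (nearest_centroid(xt, centroids) == yt).mean()
```

On this toy shift the unadapted centroids assign every target sample to the same class, while the centroids of the mapped samples sit on the target clusters. Target labels are never used during training, only to compute the two accuracies.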

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 1
  • Definition 1
  • Theorem 2
  • Corollary 1
  • proof
  • proof
  • proof