Large-Scale Optimal Transport and Mapping Estimation
Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel
TL;DR
The paper addresses learning an optimal map between two distributions. It first computes a regularized OT plan with a scalable stochastic dual gradient method, then extracts a Monge map by barycentric projection, approximated with a neural network. It proves convergence results: as the sample size grows and the regularization vanishes, the regularized OT plans and their barycentric projections converge to the true OT plan and Monge map between the underlying continuous measures. The approach enables out-of-sample mapping and scalable domain adaptation (DA). It is demonstrated on large-scale DA tasks and on Generative Optimal Transport, where a learned Monge map acts as a generator. Together, these theoretical guarantees and experiments support the two-step OT-based approach as a principled framework that handles continuous measures and large datasets efficiently, with practical impact on domain adaptation and generative modeling.
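The two steps can be written compactly as below. This sketch assumes entropic regularization relative to μ⊗ν with strength ε > 0 and a generic cost c. The paper's exact constants and its choice of regularizer may differ.

```latex
% Step 1: stochastic dual of entropic OT, and the primal plan recovered from the potentials
\max_{u,v}\; \mathbb{E}_{x\sim\mu}[u(x)] + \mathbb{E}_{y\sim\nu}[v(y)]
  - \varepsilon\,\mathbb{E}_{(x,y)\sim\mu\otimes\nu}\!\left[e^{(u(x)+v(y)-c(x,y))/\varepsilon}\right],
\qquad
d\pi_\varepsilon(x,y) = e^{(u(x)+v(y)-c(x,y))/\varepsilon}\,d\mu(x)\,d\nu(y)

% Step 2: barycentric projection, and the regression objective used to learn the map f
\bar{\pi}_\varepsilon(x) = \mathbb{E}_{(X,Y)\sim\pi_\varepsilon}\left[\,Y \mid X = x\,\right],
\qquad
\hat f \in \arg\min_{f}\; \mathbb{E}_{(X,Y)\sim\pi_\varepsilon}\left[\lVert Y - f(X)\rVert^2\right]
```

When f ranges over all measurable maps, the minimizer of the squared-loss objective is the barycentric projection itself. Restricting f to a neural-network class yields a map that can also be evaluated at points outside the support of μ.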
Abstract
This paper presents a novel two-step approach to the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach to regularized OT and show empirically that it scales better than a recent related approach when the number of samples is very large. Second, we estimate a Monge map as a deep neural network, learned by approximating the barycentric projection of the previously obtained OT plan. This parameterization allows the mapping to generalize outside the support of the input measure. We prove two theoretical stability results for regularized OT, which show that our estimates converge to the OT plan and Monge map between the underlying continuous measures. We showcase our approach on two applications: domain adaptation and generative modeling.
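The following minimal NumPy sketch implements both steps for uniform empirical measures. It is an illustration, not the authors' implementation. It assumes entropic regularization and the squared-Euclidean cost. All function names (`dual_sgd_entropic_ot`, `plan_from_potentials`, `barycentric_projection`, `fit_affine_map`) are hypothetical, and the default step size `lr = 0.5 * eps` is a heuristic assumption. To stay dependency-free, the sketch fits an affine map by weighted least squares in place of the paper's deep neural network.

```python
import numpy as np


def dual_sgd_entropic_ot(X, Y, eps=0.05, lr=None, n_iter=2000, batch=None, seed=0):
    """Stochastic gradient ascent on a minibatch estimate of the entropic-OT dual
    between uniform empirical measures on X (n, d) and Y (m, d).
    Returns dual potentials u (n,) and v (m,)."""
    rng = np.random.default_rng(seed)
    n, m = len(X), len(Y)
    p = n if batch is None else min(batch, n)  # batch=None means full batch
    q = m if batch is None else min(batch, m)
    lr = 0.5 * eps if lr is None else lr  # heuristic step size (assumption)
    u, v = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # sample without replacement so fancy-indexed updates hit each index once
        I = rng.choice(n, size=p, replace=False)
        J = rng.choice(m, size=q, replace=False)
        C = ((X[I][:, None, :] - Y[J][None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
        E = np.exp((u[I][:, None] + v[J][None, :] - C) / eps)
        # gradient of the minibatch dual estimate, multiplied by p (for u) and q (for v)
        u[I] += lr * (1.0 - E.mean(axis=1))
        v[J] += lr * (1.0 - E.mean(axis=0))
    return u, v


def plan_from_potentials(X, Y, u, v, eps):
    """Primal plan pi_ij = exp((u_i + v_j - c_ij) / eps) * (1/n) * (1/m)."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp((u[:, None] + v[None, :] - C) / eps) / (len(X) * len(Y))


def barycentric_projection(P, Y):
    """T(x_i) = sum_j pi_ij y_j / sum_j pi_ij; also returns the row masses r_i."""
    r = P.sum(axis=1)
    return (P @ Y) / r[:, None], r


def fit_affine_map(X, T, weights):
    """Weighted least squares for f(x) = [x, 1] @ coef (stand-in for a neural net)."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(Xh * sw, T * sw, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ coef


# Usage: targets are the source points in shuffled order.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
Y = X[[3, 0, 4, 1, 2]]
u, v = dual_sgd_entropic_ot(X, Y, eps=0.01, n_iter=200)
P = plan_from_potentials(X, Y, u, v, eps=0.01)
T, r = barycentric_projection(P, Y)
f = fit_affine_map(X, T, r)  # can be evaluated at new, out-of-sample points
```

Because the rows of X and Y are not aligned, the projection has to recover the correspondence from the plan: each x_i should map back to itself, and the fitted affine map should be close to the identity. Setting `batch` below the sample sizes makes each iteration touch only a minibatch of potentials, which is what lets the dual approach scale. Constant-step SGD then returns noisy potentials, so a decreasing step size or an adaptive optimizer may be preferable in practice.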
