Progressive Entropic Optimal Transport Solvers

Parnian Kassraie, Aram-Alexandre Pooladian, Michal Klein, James Thornton, Jonathan Niles-Weed, Marco Cuturi

TL;DR

ProgOT is a faster and more robust alternative to standard solvers when computing couplings at large scales, even outperforming neural network-based approaches. Its statistical consistency for estimating optimal transport maps is also proved.

Abstract

Optimal transport (OT) has profoundly impacted machine learning by providing theoretical and computational tools to realign datasets. In this context, given two large point clouds of sizes $n$ and $m$ in $\mathbb{R}^d$, entropic OT (EOT) solvers have emerged as the most reliable tool either to solve the Kantorovich problem and output an $n\times m$ coupling matrix, or to solve the Monge problem and learn a vector-valued push-forward map. While the robustness of EOT couplings/maps makes them a go-to choice in practical applications, EOT solvers remain difficult to tune because of a small but influential set of hyperparameters, notably the omnipresent entropic regularization strength $\varepsilon$. Setting $\varepsilon$ can be difficult, as it simultaneously impacts various performance metrics, such as compute speed, statistical performance, generalization, and bias. In this work, we propose a new class of EOT solvers (ProgOT) that can estimate both plans and transport maps. We take advantage of several opportunities to optimize the computation of EOT solutions by dividing mass displacement using a time discretization, borrowing inspiration from dynamic OT formulations, and conquering each of these steps using EOT with properly scheduled parameters. We provide experimental evidence demonstrating that ProgOT is a faster and more robust alternative to standard solvers when computing couplings at large scales, even outperforming neural network-based approaches. We also prove statistical consistency of our approach for estimating optimal transport maps.
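
To make the role of $\varepsilon$ concrete, the following is a minimal sketch of a plain Sinkhorn solver, not the ProgOT solver itself. It assumes uniform weights, 1-D points, and the cost $h(z) = z^2/2$; the function names, point clouds, and $\varepsilon$ values are illustrative choices that do not come from the paper. It shows that a larger $\varepsilon$ yields a higher-entropy (blurrier) coupling.

```python
import math

def sinkhorn(x, y, eps, n_iters=500):
    """Entropic OT coupling between uniform 1-D point clouds x and y.

    Illustrative assumptions: cost h(z) = z**2 / 2, uniform marginals,
    and a fixed iteration budget instead of a convergence test.
    """
    n, m = len(x), len(y)
    a, b = [1.0 / n] * n, [1.0 / m] * m
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-0.5 * (xi - yj) ** 2 / eps) for yj in y] for xi in x]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternating scalings; u is updated last, so row sums match `a`.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def entropy(P):
    """Shannon entropy of a coupling matrix given as nested lists."""
    return -sum(p * math.log(p) for row in P for p in row if p > 0.0)

x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [0.1, 0.4, 0.6, 0.9]
P_sharp = sinkhorn(x, y, eps=0.01)   # small eps: concentrated coupling
P_blurry = sinkhorn(x, y, eps=1.0)   # large eps: close to the product coupling
print(entropy(P_sharp), entropy(P_blurry))
```

Both calls return an $n \times m$ coupling, but only the small-$\varepsilon$ one concentrates mass on a few source-target pairs; this is the trade-off that makes $\varepsilon$ hard to set by hand.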

Paper Structure (19 sections, 12 theorems, 52 equations, 16 figures, 6 tables, 2 algorithms)


Key Result

Theorem 1

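Suppose $\mu \in {\mathcal{P}}_{2,\text{ac}}(\Omega)$ and $\nu \in {\mathcal{P}}_2(\Omega)$. Then there exists a unique solution to the Monge problem, of the form $T_0 = \mathop{\mathrm{Id}}\nolimits - \nabla h^* \circ \nabla f_0$, where $h^*$ is the convex conjugate of $h$, i.e. $h^*(y)\coloneqq \max_{x} \langle x, y\rangle - h(x)$, and $(f_0, g_0)$ is an optimal pair of dual potentials, $(f_0,g_0) \in \operatorname*{arg\,max}_{(f,g)\in{\mathcal{F}}} \int f\,\mathrm{d}\mu + \int g\,\mathrm{d}\nu$, where ${\mathcal{F}} \coloneqq \{(f,g) \in L^1(\mu)\times L^1(\nu) : \ f(x)+g(y)\leq h(x-y),\, \forall x,y\in\Omega\}$.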
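
The map formula in Theorem 1 depends on the cost only through $\nabla h^*$. As a small numerical sketch, restricted to one dimension, the code below checks the standard closed form $h^*(y) = |y|^q/q$ (with $1/p + 1/q = 1$) for $h(z) = |z|^p/p$ against a brute-force maximum over a grid, then evaluates $T_0(x) = x - \nabla h^*(\nabla f_0(x))$. The helper names and the grid are hypothetical, and `grad_f0` stands in for the gradient of an optimal potential, which this snippet does not compute.

```python
import math

p = 1.5                 # exponent of the cost h(z) = |z|**p / p
q = p / (p - 1.0)       # Hölder conjugate exponent: 1/p + 1/q = 1, so q = 3

def h(z):
    return abs(z) ** p / p

def h_star_numeric(y, grid):
    """Brute-force convex conjugate: max over the grid of x*y - h(x)."""
    return max(x * y - h(x) for x in grid)

def h_star_closed(y):
    """Closed-form conjugate of h in one dimension."""
    return abs(y) ** q / q

def grad_h_star(y):
    """Derivative of the closed form: sign(y) * |y|**(q - 1)."""
    return math.copysign(abs(y) ** (q - 1.0), y)

def monge_map(x, grad_f0):
    """T_0(x) = x - grad h*(grad f_0(x)); grad_f0 is a user-supplied callable."""
    return x - grad_h_star(grad_f0(x))

grid = [i / 1000.0 for i in range(-5000, 5001)]   # [-5, 5] with step 1e-3
gaps = [abs(h_star_numeric(y, grid) - h_star_closed(y))
        for y in (-2.0, -0.5, 0.0, 1.0, 2.0)]
print(max(gaps))
```

For the quadratic cost $h = \tfrac{1}{2}\|\cdot\|_2^2$ one has $\nabla h^* = \mathrm{Id}$, so the formula reduces to $T_0 = \mathrm{Id} - \nabla f_0$.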

Figures (16)

  • Figure 1: (left) EOT solvers collapse when the value of $\varepsilon$ is not properly chosen. This typically results in biased map estimators and in blurry couplings (see Fig. 2 for the coupling matrix obtained between ${\mathbf{x}}_{\text{train}}$ and ${\mathbf{y}}_{\text{train}}$). (middle) Debiasing the output of EOT solvers can prevent the collapse to the mean seen in EOT estimators, but computes the same coupling. (right) ProgOT ameliorates these problems in various ways: by decomposing the resolution of the OT problem into multiple time steps and using various forms of progressive scheduling, we recover both a coupling whose entropy can be tuned automatically and a map estimator that is fast and reliable.
  • Figure 2: Coupling matrices between train points in Fig. 1. Comparison of EOT with a fairly large $\varepsilon$, and ProgOT, which automatically tunes the entropy of its coupling according to the target point cloud's dispersion.
  • Figure 3: Intuition of ProgOT: by iteratively fitting to the interpolation path, the final transport step is less likely to collapse, resulting in a more stable solver.
  • Figure 4: (A) Convergence of $\mathscr{T}_{\mathrm{Prog}}$ to the ground-truth map w.r.t. the empirical $L^2$ norm, for $d = 4$. (B) Effect of scheduling $\alpha_k$, for $d = 64$. (C) Effect of scheduling $\varepsilon_k$ using Algorithm \ref{alg:eps_sched}, for $d = 64$.
  • Figure 5: Performance as a coupling solver on the 4i dataset. ProgOT returns better couplings, in terms of the OT cost and the entropy, for a fraction of the Sinkhorn iterations, while still returning a coupling that has the same deviation from the original marginals. The (top) row is computed using $h=\|\cdot\|^2_2$; the (bottom) row shows results for the cost $h = \tfrac{1}{p}\|\cdot\|^p_p$ where $p=1.5$.
  • ...and 11 more figures
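
The intuition in Figures 1 and 3 can be sketched on a toy problem. The code below is one plausible reading of the progressive idea, not the authors' algorithm: at each step an entropic map toward the target is estimated, and the source points move only a fraction $\alpha_k$ of the way along it. Everything specific here is an assumption made for illustration: the 1-D setting, the cost $h(z) = z^2/2$, the use of the barycentric projection of a Sinkhorn coupling as the map estimate, and the particular $\alpha_k$ and $\varepsilon_k$ values.

```python
import math

def entropic_map(x, y, eps, n_iters=300):
    """Barycentric projection of the Sinkhorn coupling between 1-D clouds.

    Assumes uniform weights and the cost h(z) = z**2 / 2.
    """
    n, m = len(x), len(y)
    K = [[math.exp(-0.5 * (xi - yj) ** 2 / eps) for yj in y] for xi in x]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    mapped = []
    for i in range(n):
        # Row i of the coupling, renormalised, weights the target points.
        w = [K[i][j] * v[j] for j in range(m)]
        mapped.append(sum(wj * yj for wj, yj in zip(w, y)) / sum(w))
    return mapped

def progressive_transport(x, y, alphas, epsilons):
    """Move x toward y in several steps, re-solving EOT at each step."""
    for alpha, eps in zip(alphas, epsilons):
        target = entropic_map(x, y, eps)
        x = [(1.0 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]
    return x

x = [-1.0, -0.5, 0.0, 0.5, 1.0]
y = [4.0, 4.5, 5.0, 5.5, 6.0]

# Single EOT solve with a large eps: outputs shrink toward the mean of y.
x_one_shot = entropic_map(x, y, eps=50.0)

# Hypothetical schedules: growing step sizes, shrinking regularisation.
x_prog = progressive_transport(x, y, alphas=[1 / 3, 1 / 2, 1.0],
                               epsilons=[2.0, 1.0, 0.05])
print(x_one_shot)
print(x_prog)
```

In this toy setting the single large-$\varepsilon$ solve maps every point close to the target mean, while the scheduled variant retains the spread of the target. This is an illustration of the mechanism only and says nothing about the benchmark results reported in the paper.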

Theorems & Definitions (21)

  • Theorem 1: Brenier's Theorem [Bre91]
  • Definition 2: ProgOT
  • Theorem 3: Consistency of Progressive Entropic Maps
  • proof : Proof sketch
  • Proposition 4: Stability of entropic maps with variations in the source measure
  • Proposition 5
  • Lemma 6
  • proof : Proof of \ref{lem:lipsch_bound}
  • Lemma 7
  • proof : Proof of \ref{prop:stability_phi}
  • ...and 11 more