
Efficient Transferable Optimal Transport via Min-Sliced Transport Plans

Xinran Liu, Elaheh Akbari, Rocio Diaz Martin, Navid NaderiAlizadeh, Soheil Kolouri

TL;DR

This work develops a transferable, scalable framework for optimal transport via min-Sliced Transport Plans (min-STP). It introduces LapSum-based differentiable sorting to enable efficient, differentiable 1D transport on slices, and proves that optimal slicers transfer across closely related distribution pairs, enabling amortized solutions. A minibatch formulation with provable accuracy guarantees makes the method practical for large-scale data, while empirical results in point-cloud alignment and flow-based modeling validate transferability and performance gains. The combination of theoretical transferability, stable smoothing, and scalable training offers a promising path to amortized OT in dynamic, multi-domain settings.

Abstract

Optimal Transport (OT) offers a powerful framework for finding correspondences between distributions and addressing matching and alignment problems in various areas of computer vision, including shape analysis, image generation, and multimodal tasks. The computational cost of OT, however, hinders its scalability. Slice-based transport plans have recently shown promise for reducing the computational cost by leveraging the closed-form solutions of 1D OT problems. These methods optimize a one-dimensional projection (slice) to obtain a conditional transport plan that minimizes the transport cost in the ambient space. While efficient, these methods leave open the question of whether learned optimal slicers can transfer to new distribution pairs under distributional shift. Understanding this transferability is crucial in settings with evolving data or repeated OT computations across closely related distributions. In this paper, we study the min-Sliced Transport Plan (min-STP) framework and investigate the transferability of optimized slicers: can a slicer trained on one distribution pair yield effective transport plans for new, unseen pairs? Theoretically, we show that optimized slicers remain close under slight perturbations of the data distributions, enabling efficient transfer across related tasks. To further improve scalability, we introduce a minibatch formulation of min-STP and provide statistical guarantees on its accuracy. Empirically, we demonstrate that the transferable min-STP achieves strong one-shot matching performance and facilitates amortized training for point cloud alignment and flow-based generative modeling.

Paper Structure

This paper contains 26 sections, 21 theorems, 165 equations, 12 figures, 4 tables, 1 algorithm.

Key Result

Proposition 2.4

(chapel2025differentiable) Let $\mu,\nu \in \mathcal{P}_p(\mathcal{X})$. Let $f:\mathcal{X}\to\mathbb{R}$ be an injective map on the supports of $\mu$ and $\nu$. Then, for $p\geq 1$, $\mathrm{STP}_p(\cdot,\cdot~;f)$ is a distance on $\mathcal{P}_p(\mathcal{X})$.
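For uniform empirical measures with the same number of points $N$, and a slicer $f$ that is injective on the samples, the 1D optimal plan matches the $i$-th smallest projection of $\mu$ to the $i$-th smallest projection of $\nu$, so $\mathrm{STP}_p(\mu,\nu;f)^p=\frac{1}{N}\sum_{i}\|x_{\sigma(i)}-y_{\tau(i)}\|^p$, where $\sigma$ and $\tau$ sort the values $f(x_i)$ and $f(y_j)$. The Python sketch below assumes this equal-size setting, the Euclidean ground cost, and a linear slicer $f(x)=\langle\theta,x\rangle$ (the framework itself allows general slicers); `stp_p` is a hypothetical helper, not the authors' code. It checks identity, symmetry, and the triangle inequality numerically. In this setting the triangle inequality follows from the Minkowski inequality, because each cloud is sorted the same way in all three distances.

```python
import numpy as np

def stp_p(X, Y, theta, p=2.0):
    """STP_p between uniform empirical measures on the rows of X and Y (both N x d),
    using the linear slicer f(x) = <theta, x>, assumed injective on the samples."""
    if X.shape != Y.shape:
        raise ValueError("this sketch only handles equal-size point clouds")
    sigma = np.argsort(X @ theta)  # indices of X in increasing slice order
    tau = np.argsort(Y @ theta)    # indices of Y in increasing slice order
    # Lifted 1D plan: X[sigma[i]] is coupled with Y[tau[i]], each pair carrying mass 1/N.
    dists = np.linalg.norm(X[sigma] - Y[tau], axis=1)
    return float(np.mean(dists ** p) ** (1.0 / p))

# Numerical check of the metric axioms from Proposition 2.4 on random clouds.
rng = np.random.default_rng(0)
d, N = 3, 200
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
X, Y, Z = (rng.normal(loc=m, size=(N, d)) for m in (0.0, 1.0, -0.5))

for p in (1.0, 2.0, 3.0):
    d_xy, d_yx = stp_p(X, Y, theta, p), stp_p(Y, X, theta, p)
    d_xz, d_yz = stp_p(X, Z, theta, p), stp_p(Y, Z, theta, p)
    print(f"p={p}: identity {stp_p(X, X, theta, p):.1e}, "
          f"symmetric {np.isclose(d_xy, d_yx)}, triangle {d_xz <= d_xy + d_yz + 1e-12}")
```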

Figures (12)

  • Figure 1: Overview of the Sliced Transport Plan (STP) framework and our transferability results. (a–c) The STP framework computes transport plans $\gamma_f$ using a generalized slicer $f : \mathbb{R}^d \to \mathbb{R}$, which projects high-dimensional samples onto one-dimensional marginals, sorts them via (soft) permutation matrices, and generates slice-wise transport plans that are lifted back to the ambient space to obtain a transportation plan between the input measures. The min-STP framework extends STP by learning an optimal slicer $f^{*}$ that minimizes the transportation cost in the ambient space. (d) In this paper, we establish a transferability theorem showing that if two source-target distribution pairs $(\mu_1, \nu_1)$ and $(\mu_2, \nu_2)$ are close, then there exists an optimal slicer $f_2^{*}$ for $(\mu_2,\nu_2)$ in an $\varepsilon$-vicinity of $f_1^{*}$. This result enables efficient slicer reuse across related tasks, achieving amortized optimal transport.
  • Figure 2: Transport costs with respect to slicing directions $\theta\in\mathbb{S}^1$ for the distribution pair $\mu=\frac{1}{2}\delta_{[0, 0]}+\frac{1}{2}\delta_{[1, 1]}$ and $\nu=\frac{1}{2}\delta_{[0, 1]}+\frac{1}{2}\delta_{[1, 0]}$, using DGSWP chapel2025differentiable (50 perturbed samples) and LapSum struski2025lapsum. LapSum yields smooth objectives across scales $\alpha$.
  • Figure 3: Training with two-branch symmetric gradients through differentiable sorting. The slicer $f$ projects $X$ and $Y$ to one-dimensional samples, which are (soft) sorted to obtain soft permutations $\tilde{P}_X$ and $\tilde{P}_Y$ alongside hard permutations $P_X,P_Y$. Two plans are constructed, $\gamma_{1}=\tilde{P}_X^{\!\top}T_{N,M}P_Y$ and $\gamma_{2}=P_X^{\top}T_{N,M}\tilde{P}_Y$, and the transport cost of their average $\tilde{\gamma}=\tfrac12(\gamma_1+\gamma_2)$, namely $\sum_{i,j} c_{ij}\,[\tilde{\gamma}]_{ij}$, is optimized.
  • Figure 4: Transport plans and costs under different training schemes, along with the optimal plan/cost. Each panel visualizes pointwise correspondences (gray segments) between two ring distributions, source $\mu$ and target $\nu$, with $N=M=1024$ points in each distribution. left: exact optimal transport (OT) plan. middle: $\mathrm{min}\text{-}\mathrm{STP}$ trained with the full batch (all 1024 samples). right: mini-batch $\mathrm{min}\text{-}\mathrm{STP}$ with batch size $B=64$.
  • Figure 5: (Top row) The generated tasks $\{(\mu_t,\nu_t)\}_{t=1}^{7}$ along with OT plans and costs. (Middle row) $\mathrm{min}\text{-}\mathrm{STP}$ plans and costs optimized with a Set Transformer lee2019set slicer initialized from pretrained weights. Pretraining refers to using the optimal slicer $f_{t-1}^\star$ from the previous task $(\mu_{t-1}, \nu_{t-1})$ for $t\ge2$. (Bottom row) Initial costs (averaged over 5 runs) of the slicer network against the OT lower bound, for both full-batch training (left) and mini-batch training (right). The pretrained slicer begins near the OT cost, whereas the random slicer begins higher, confirming the transferability of the slicer. All values in the bar plots are measured before any optimization on task $t$. The cost for complete training is reported in the Supplementary Material (Figure fig:complete_train).
  • ...and 7 more figures
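The two-branch construction of Figure 3 can be sketched in NumPy as follows, under stated assumptions. $T_{N,M}$ is taken to be the monotone 1D plan between uniform measures on $N$ and $M$ sorted points, whose entry $(i,j)$ is the overlap of $[i/N,(i+1)/N)$ and $[j/M,(j+1)/M)$. The soft permutation uses the NeuralSort relaxation (Grover et al., 2019) as a stand-in, since the LapSum operator is not reproduced here; the temperature `tau`, the linear slicer $\langle\theta,x\rangle$, and all function names are hypothetical. NumPy computes no gradients: in an autodiff framework, each branch would pass gradients only through its soft factor, and averaging the branches treats $X$ and $Y$ symmetrically.

```python
import numpy as np

def monotone_plan(N, M):
    """T_{N,M}: optimal 1D plan between uniform measures on N and M sorted points.
    Entry (i, j) is the overlap length of [i/N, (i+1)/N) and [j/M, (j+1)/M)."""
    i = np.arange(N)[:, None]
    j = np.arange(M)[None, :]
    lo = np.maximum(i / N, j / M)
    hi = np.minimum((i + 1) / N, (j + 1) / M)
    return np.clip(hi - lo, 0.0, None)

def hard_sort_matrix(s):
    """Permutation P with P @ s sorted ascending, i.e. P[i, argsort(s)[i]] = 1."""
    P = np.zeros((len(s), len(s)))
    P[np.arange(len(s)), np.argsort(s)] = 1.0
    return P

def soft_sort_matrix(s, tau):
    """NeuralSort relaxation (stand-in for LapSum): a row-stochastic matrix
    that tends to hard_sort_matrix(s) as tau -> 0."""
    n = len(s)
    B = np.abs(s[:, None] - s[None, :]).sum(axis=1)        # A_s 1
    scale = (n + 1 - 2 * np.arange(1, n + 1))[:, None]     # n + 1 - 2i
    logits = (scale * s[None, :] - B[None, :]) / tau       # row i selects the i-th largest
    logits -= logits.max(axis=1, keepdims=True)            # numerically stable softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    return P[::-1]                                         # flip rows: ascending order

def two_branch_cost(X, Y, theta, tau):
    """Cost of the averaged plan (gamma_1 + gamma_2) / 2 with slicer <theta, x>."""
    sx, sy = X @ theta, Y @ theta
    T = monotone_plan(len(X), len(Y))
    gamma1 = soft_sort_matrix(sx, tau).T @ T @ hard_sort_matrix(sy)
    gamma2 = hard_sort_matrix(sx).T @ T @ soft_sort_matrix(sy, tau)
    gamma = 0.5 * (gamma1 + gamma2)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean c_ij
    return float((C * gamma).sum()), gamma

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(6, 2)), rng.normal(loc=2.0, size=(4, 2))
theta = np.array([np.cos(0.3), np.sin(0.3)])
cost, gamma = two_branch_cost(X, Y, theta, tau=0.1)
print(f"two-branch cost at tau=0.1: {cost:.4f}")
```

As `tau` shrinks, both soft factors approach the hard permutations, and $\tilde{\gamma}$ approaches the hard plan $P_X^\top T_{N,M} P_Y$, whose row and column sums are $1/N$ and $1/M$.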
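Figures 4 and 5 combine two ideas: scoring a slicer on random minibatches of size $B$ rather than on all $N$ points, and starting task $t$ from the slicer $f_{t-1}^\star$ of task $t-1$. The sketch below illustrates both under simplifications that are assumptions of this example, not the paper's setup: a linear slicer parameterized by an angle in 2D instead of a Set Transformer, grid search instead of gradient training, the squared Euclidean cost, equal-size clouds, and a hypothetical task family (`make_task`) whose Gaussians rotate slowly with $t$. The brute-force `exact_ot_cost` helper is only a lower-bound sanity check for tiny inputs.

```python
import numpy as np
from itertools import permutations

def stp_cost(X, Y, theta):
    """Squared-Euclidean cost of the STP plan for equal-size clouds, slicer <theta, x>."""
    sx, sy = np.argsort(X @ theta), np.argsort(Y @ theta)
    return float(np.mean(((X[sx] - Y[sy]) ** 2).sum(axis=1)))

def minibatch_cost(X, Y, theta, B, n_batches, rng):
    """Average STP cost over random minibatches of B points drawn from each cloud."""
    N = len(X)
    costs = []
    for _ in range(n_batches):
        ix = rng.choice(N, size=B, replace=False)
        iy = rng.choice(N, size=B, replace=False)
        costs.append(stp_cost(X[ix], Y[iy], theta))
    return float(np.mean(costs))

def best_angle(X, Y, angles, B, n_batches, rng):
    """Grid-search stand-in for slicer training: angle with the lowest minibatch cost."""
    scores = [minibatch_cost(X, Y, np.array([np.cos(a), np.sin(a)]), B, n_batches, rng)
              for a in angles]
    return float(angles[int(np.argmin(scores))])

def exact_ot_cost(X, Y):
    """Exact OT cost by enumerating permutations (tiny equal-size inputs only)."""
    return min(float(np.mean(((X - Y[list(p)]) ** 2).sum(axis=1)))
               for p in permutations(range(len(Y))))

def make_task(t, N, rng):
    """Hypothetical task family: anisotropic source/target Gaussians rotated by 0.1 * t."""
    phi = 0.1 * t
    R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    X = (rng.normal(size=(N, 2)) * [3.0, 0.5]) @ R.T
    Y = (rng.normal(size=(N, 2)) * [0.5, 3.0]) @ R.T + [4.0, 0.0]
    return X, Y

rng = np.random.default_rng(0)
angles = np.linspace(0.0, np.pi, 90, endpoint=False)  # theta and -theta give the same plan
prev = None
for t in range(1, 5):
    X, Y = make_task(t, 256, rng)
    random_angle = rng.uniform(0.0, np.pi)
    if prev is not None:
        warm = stp_cost(X, Y, np.array([np.cos(prev), np.sin(prev)]))
        cold = stp_cost(X, Y, np.array([np.cos(random_angle), np.sin(random_angle)]))
        print(f"task {t}: transferred slicer {warm:.3f}, random slicer {cold:.3f}")
    prev = best_angle(X, Y, angles, B=32, n_batches=8, rng=rng)
```

With $B=N$, every minibatch is a reordering of the full clouds, so the estimate equals the full-batch cost; with $B<N$ it becomes a noisy proxy that is much cheaper to evaluate.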

Theorems & Definitions (47)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Proposition 2.4
  • Definition 2.5
  • Definition 3.1: Perturbed slicers
  • Definition 3.3: Expected lifted plan
  • Theorem 3.4
  • Proposition 3.5
  • Proof sketch
  • ...and 37 more