Structured Matching via Cost-Regularized Unbalanced Optimal Transport

Emanuele Pardini, Katerina Papagiannouli

TL;DR

The paper tackles the alignment of measures across heterogeneous spaces when total masses differ and the ground cost is unknown. It introduces cost-regularized unbalanced OT (CR-UOT), a framework that jointly optimizes a transport plan and the ground cost, the latter penalized by a convex cost regularizer, and establishes existence and convergence results. It then specializes to inner-product costs (GW-IP) and proves that, under mild conditions, optimal couplings can be induced by Monge maps. The authors develop entropic-regularized algorithms, including a block-coordinate descent method, and derive entropic maps that converge to deterministic Monge maps in appropriate regimes. Empirically, CR-UOT improves cross-modality alignment in single-cell multiomics (scGEM and SNAREseq), particularly when data are unbalanced or lack direct one-to-one correspondences, which makes it relevant for integrating heterogeneous biological data.
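The entropic maps mentioned above send each source point into the target space. A common way to do this on the sample points, assumed here and possibly different from the paper's exact construction, is the barycentric projection of a coupling $P$: the source point $x_i$ is mapped to $\sum_j P_{ij} y_j / \sum_j P_{ij}$. The function name `barycentric_map` and the `min_mass` threshold are hypothetical. When $P$ is supported on the graph of a map (the Monge case), the projection returns that map's values exactly.

```python
import numpy as np


def barycentric_map(P, Y, min_mass=1e-12):
    """Map each source point to the P-weighted average of the target points.

    Hypothetical helper (not the paper's code).
    P: (n, m) nonnegative, possibly unbalanced coupling.
    Y: (m, d_y) target points.
    Source rows whose mass is <= min_mass have no match and are returned as NaN.
    """
    P = np.asarray(P, dtype=float)
    Y = np.asarray(Y, dtype=float)
    row_mass = P.sum(axis=1)
    T = np.full((P.shape[0], Y.shape[1]), np.nan)
    matched = row_mass > min_mass
    # Weighted average of targets, normalized by each row's transported mass.
    T[matched] = (P[matched] @ Y) / row_mass[matched][:, None]
    return T
```

Because an unbalanced coupling may destroy mass, a source row can carry essentially no mass; the sketch marks such unmatched points with NaN rather than forcing a match.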

Abstract

Unbalanced optimal transport (UOT) provides a flexible way to match or compare nonnegative finite Radon measures. However, UOT requires a predefined ground transport cost, which may misrepresent the data's underlying geometry. Choosing such a cost is particularly challenging when datasets live in heterogeneous spaces, often motivating practitioners to adopt Gromov-Wasserstein formulations. To address this challenge, we introduce cost-regularized unbalanced optimal transport (CR-UOT), a framework that lets the ground cost vary while permitting mass creation and removal. We show that CR-UOT incorporates unbalanced Gromov-Wasserstein type problems through families of inner-product costs parameterized by linear transformations, enabling the matching of measures or point clouds across Euclidean spaces. We develop algorithms for such CR-UOT problems using entropic regularization and demonstrate that this approach improves the alignment of heterogeneous single-cell omics profiles, especially when many cells lack direct matches.
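To make the entropic block-coordinate descent concrete, here is a minimal NumPy sketch for an inner-product cost $c_M(x,y) = -\langle Mx, y\rangle$, where $M$ is a linear map from the source space to the target space. It is not the authors' algorithm: the KL marginal penalties, the standard log-domain scaling updates for KL-penalized entropic UOT, all function and parameter names, and in particular the regularizer (here the indicator of the Frobenius ball $\|M\|_F \le r$) are our assumptions. With $P$ fixed, the plan enters only through $A = \sum_{ij} P_{ij}\, y_j x_i^\top$, and $-\langle M, A\rangle$ is minimized over the ball by $M = rA/\|A\|_F$; with $M$ fixed, the plan update is an entropic UOT problem.

```python
import numpy as np


def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis.
    zmax = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(zmax, axis=axis) + np.log(np.sum(np.exp(z - zmax), axis=axis))


def unbalanced_sinkhorn(C, a, b, eps, lam, n_iter=500):
    """Log-domain entropic UOT with KL marginal penalties of weight lam.

    Approximately solves
        min_P <C, P> + eps*KL(P | a x b) + lam*KL(P 1 | a) + lam*KL(P^T 1 | b).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros_like(a), np.zeros_like(b)
    tau = lam / (lam + eps)  # tau -> 1 (lam -> inf) recovers balanced Sinkhorn
    for _ in range(n_iter):
        f = -tau * eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -tau * eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    return np.exp(log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps)


def bcd_inner_product(X, Y, a, b, eps=0.1, lam=1.0, radius=1.0, n_outer=20, seed=0):
    """Alternate between the plan P (n x m) and a linear map M (d_y x d_x).

    Cost: C_ij = -<M x_i, y_j>. Hypothetical regularizer: M is restricted to the
    Frobenius ball ||M||_F <= radius (the indicator of a convex set).
    """
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((Y.shape[1], X.shape[1]))
    M *= radius / np.linalg.norm(M)            # feasible random start
    for _ in range(n_outer):
        C = -(X @ M.T) @ Y.T                   # (n, m) inner-product cost
        P = unbalanced_sinkhorn(C, a, b, eps, lam)
        A = Y.T @ P.T @ X                      # (d_y, d_x): sum_ij P_ij y_j x_i^T
        norm_A = np.linalg.norm(A)
        if norm_A > 0:
            M = radius * A / norm_A            # argmin of -<M, A> over the ball
    return P, M
```

Each block step does not increase the joint objective (up to Sinkhorn accuracy). The returned `P` is the last plan computed and `M` is the map fitted to it. Bounding $M$ keeps the costs bounded from below on bounded data; with an unconstrained $M$ the joint problem could become unbounded below, since creating mass then lowers the objective without limit.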

Paper Structure

This paper contains 33 sections, 23 theorems, 107 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Theorem 3.5

Let $(\varphi_1, \varphi_2)$ be a pair of superlinear entropy functions satisfying comp_cond, and let $\varepsilon \geq 0$. Assume a cost-parametrized regularizer $\mathcal{R}$ as defined in Definition 3.4 with $\{c_{\theta}\}_{\theta\in \mathcal{F}}$ a uniformly bounded from below family of c
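Theorem 3.5 assumes superlinear entropy functions, which enter the problem through $\varphi$-divergences (Definition 2.1). The sketch below uses the standard form from unbalanced OT, assumed here to match Definition 2.1: $D_\varphi(\mu\,|\,\nu) = \sum_i \nu_i\,\varphi(\mu_i/\nu_i) + \varphi'_\infty\,\mu^\perp$, where $\mu^\perp$ is the mass of $\mu$ where $\nu$ vanishes and $\varphi'_\infty = \lim_{s\to\infty}\varphi(s)/s$ is the recession constant. Superlinearity means $\varphi'_\infty = +\infty$, so any mass of $\mu$ outside the support of $\nu$ is infinitely penalized; the KL entropy $\varphi(s) = s\log s - s + 1$ is the usual example. The function names are hypothetical.

```python
import numpy as np


def kl_phi(s):
    # KL entropy function: phi(s) = s log s - s + 1, with phi(0) = 1.
    s = np.asarray(s, dtype=float)
    out = np.ones_like(s)
    pos = s > 0
    out[pos] = s[pos] * np.log(s[pos]) - s[pos] + 1.0
    return out


def phi_divergence(mu, nu, phi=kl_phi, phi_inf=np.inf):
    """D_phi(mu | nu) for nonnegative weight vectors on a common finite support.

    phi_inf is the recession constant lim_{s->inf} phi(s)/s; it prices the mass
    of mu where nu vanishes. A superlinear phi (e.g. KL) has phi_inf = inf.
    """
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    on_nu = nu > 0
    # Absolutely continuous part: sum_i nu_i * phi(mu_i / nu_i).
    value = np.sum(nu[on_nu] * phi(mu[on_nu] / nu[on_nu]))
    # Singular part: mass of mu where nu = 0 (skipped when zero to avoid inf*0).
    singular_mass = mu[~on_nu].sum()
    if singular_mass > 0:
        value += phi_inf * singular_mass
    return value
```

Passing `phi=lambda s: np.abs(s - 1.0)` with `phi_inf=1.0` gives the total variation, a non-superlinear case in which mass outside the support of $\nu$ has a finite cost.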

Figures (6)

  • Figure 1: Visualization of entropic map alignments for the subsampled and full scGEM datasets.
  • Figure 2: Plots of the LTA of the alignments of the full scGEM dataset obtained with the entropic map associated with the pair $(M,P)$ at each iteration of the block-coordinate descent algorithm, and the corresponding LTA values for the subsampled and full scGEM datasets.
  • Figure 3: The source data (blue) are generated by sampling from a balanced mixture of uniform distributions on two ellipsoids in 3D, while the target data (green) are sampled from an unbalanced mixture of the uniform distribution on a square $\mathcal{S}$ and the uniform distribution on an ellipse $\mathcal{E}$ in 2D; specifically, the target mixture is $\beta = 0.85\,\mathcal{E} + 0.15\,\mathcal{S}$. For visualization, we lift $\mathbb{R}^2$ into $\mathbb{R}^3$ by setting the third coordinate to zero. Aligned source points are shown as red dots.
  • Figure 4: Visualization of the entropic map alignment of the full SNAREseq dataset with $\lambda=5.0$ using two-dimensional PCA. Different colours refer to different cell types.
  • Figure 5: Visualization of the entropic map alignment of the subsampled SNAREseq dataset with $\lambda=0.07$ using two-dimensional PCA. Different colours refer to different cell types.
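The synthetic setting of Figure 3 can be reproduced in spirit with the sketch below. Only the dimensions, the balanced source mixture, the 0.85/0.15 target weights, and the zero-padding lift come from the caption; the centres, semi-axes, square position, sample sizes, function names, and the reading of "uniform on an ellipsoid" as uniform over the solid region are our assumptions, not the paper's settings.

```python
import numpy as np


def uniform_in_ellipsoid(n, center, axes, rng):
    # Uniform in the unit ball (direction x radius U^(1/d)), then an axis-aligned
    # linear map; linear images of uniform distributions stay uniform.
    d = len(axes)
    z = rng.standard_normal((n, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    r = rng.uniform(size=(n, 1)) ** (1.0 / d)
    return np.asarray(center, dtype=float) + r * z * np.asarray(axes, dtype=float)


def make_figure3_like_data(n_source=400, n_target=400, seed=0):
    """Hypothetical generator; all geometric parameters are assumptions."""
    rng = np.random.default_rng(seed)
    # Source: balanced (50/50) mixture of two solid ellipsoids in 3D.
    half = n_source // 2
    src = np.vstack([
        uniform_in_ellipsoid(half, [-2, 0, 0], [1.0, 0.5, 0.5], rng),
        uniform_in_ellipsoid(n_source - half, [2, 0, 0], [0.5, 1.0, 0.5], rng),
    ])
    # Target: beta = 0.85 * E + 0.15 * S in 2D, component labels drawn i.i.d.
    from_ellipse = rng.uniform(size=n_target) < 0.85
    n_e = int(from_ellipse.sum())
    ellipse = uniform_in_ellipsoid(n_e, [0, 0], [2.0, 1.0], rng)
    square = rng.uniform(-1.0, 1.0, size=(n_target - n_e, 2)) + np.array([4.0, 0.0])
    tgt2d = np.vstack([ellipse, square])
    # Lift R^2 into R^3 for plotting by setting the third coordinate to zero.
    tgt3d = np.hstack([tgt2d, np.zeros((n_target, 1))])
    return src, tgt2d, tgt3d
```

The 2D target `tgt2d` is what an inner-product CR-UOT solver would consume; `tgt3d` only serves the 3D plot.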

Theorems & Definitions (32)

  • Definition 2.1: $\varphi$-divergences
  • Definition 3.4: Cost-Parametrized Regularizers
  • Theorem 3.5: Existence
  • Theorem 3.6
  • Theorem 3.6
  • Proposition 3.9
  • Remark 3.11
  • Theorem 3.12
  • Definition 4.1
  • Theorem 4.2