Displacement-Sparse Neural Optimal Transport

Peter Chen, Yue Xie, Qingpeng Zhang

TL;DR

This work addresses the interpretability gap in neural optimal transport (OT) by learning displacement-sparse maps within neural OT solvers. It introduces a biased minimax formulation, enabled by ICNNs, that accommodates a general sparsity penalty (including non-proximal penalties) and general elastic costs, together with a novel smoothed $\ell_0$ regularizer. The authors prove theoretical guarantees as the sparsity strength $\lambda$ vanishes. They also design an adaptive, simulated-annealing-based scheme for controlling $\lambda$ that balances sparsity and feasibility in high-dimensional settings. Empirically, the method improves interpretability and downstream utility on synthetic sc-RNA perturbation data and real 4i perturbation data. It outperforms both exact OT and $\ell_1$ baselines in dimensionality control and gene overlap. Overall, the approach yields more interpretable, low-dimensional transport maps suitable for large-scale biological analyses while remaining scalable to high-dimensional data.
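For orientation, here is one plausible shape for such a biased objective. It assumes the standard ICNN minimax formulation for the quadratic cost. In that formulation, $f$ and $g$ range over convex functions and $\nabla g$ maps $Q$ to $P$, consistent with Proposition 2.2. The sketch then adds a displacement penalty $\tau$ with strength $\lambda$ to the inner problem. The placement and exact form of the penalty are assumptions made for illustration, not the paper's stated objective. Setting $\lambda = 0$ recovers the unbiased problem, which is the limit the theoretical guarantees concern.

```latex
\sup_{f \in \mathrm{CVX}} \; \inf_{g \in \mathrm{CVX}} \;
  -\,\mathbb{E}_{X \sim P}\!\left[ f(X) \right]
  \;-\; \mathbb{E}_{Y \sim Q}\!\left[ \langle Y, \nabla g(Y) \rangle - f\big(\nabla g(Y)\big) \right]
  \;+\; \lambda \, \mathbb{E}_{Y \sim Q}\!\left[ \tau\big( \nabla g(Y) - Y \big) \right]
```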

Abstract

Optimal transport (OT) aims to find a map $T$ that transports mass from one probability measure to another while minimizing a cost function. Recently, neural OT solvers have gained popularity in high-dimensional biological applications such as drug perturbation, due to their superior computational and memory efficiency compared to traditional exact Sinkhorn solvers. However, the overly complex high-dimensional maps learned by neural OT solvers often suffer from poor interpretability. Prior work addressed this issue for exact OT solvers by introducing displacement-sparse maps via a designed elastic cost, but that method does not carry over to neural OT settings. In this work, we propose an intuitive and theoretically grounded approach to learning displacement-sparse maps within neural OT solvers. Building on our new formulation, we introduce a novel smoothed $\ell_0$ regularizer that outperforms the $\ell_1$-based alternative from prior work. Leveraging the flexibility of Input Convex Neural Networks (ICNNs), we further develop a heuristic framework for adaptively controlling sparsity intensity, an approach uniquely enabled by the neural OT paradigm. We demonstrate the necessity of this adaptive framework in large-scale, high-dimensional training, showing not only improved accuracy but also practical ease of use for downstream applications.
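An $\ell_0$-type penalty targets displacement dimensionality more directly than $\ell_1$. To illustrate this, the sketch below compares the two on displacement vectors $T(y) - y$. It uses the Gaussian-type surrogate $\sum_i \big(1 - e^{-\Delta_i^2/(2\sigma^2)}\big)$. The surrogate and the names `smoothed_l0_penalty` and `sigma` are hypothetical choices; the paper's smoothed $\ell_0$ regularizer may take a different form.

```python
import numpy as np

def l1_penalty(delta):
    """l1 norm of each displacement vector (one vector per row of delta)."""
    return np.abs(delta).sum(axis=-1)

def smoothed_l0_penalty(delta, sigma=0.1):
    """Hypothetical smooth surrogate for the l0 'norm' (count of nonzero coordinates).

    Each coordinate contributes 1 - exp(-z^2 / (2 sigma^2)): about 0 when |z| << sigma
    and about 1 when |z| >> sigma. This particular surrogate is an assumption.
    """
    return (1.0 - np.exp(-delta**2 / (2.0 * sigma**2))).sum(axis=-1)

# Two displacements with the same l1 norm: one moves a single coordinate,
# the other spreads the same total movement over four coordinates.
concentrated = np.array([[2.0, 0.0, 0.0, 0.0]])
spread = np.array([[0.5, 0.5, 0.5, 0.5]])

print(l1_penalty(concentrated), l1_penalty(spread))                    # both 2.0
print(smoothed_l0_penalty(concentrated), smoothed_l0_penalty(spread))  # about 1.0 and 4.0
```

Because both displacements have the same $\ell_1$ norm, an $\ell_1$ penalty cannot prefer the one that moves a single coordinate. The smoothed surrogate, by contrast, scores them at roughly one and four active dimensions. A smaller `sigma` brings the surrogate closer to an exact count at the price of sharper gradients.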

Paper Structure

This paper contains 20 sections, 8 theorems, 68 equations, 9 figures, 3 tables, 4 algorithms.

Key Result

Proposition 2.2

Under Assumption 3.1, the minimax problem admits an optimal solution $(f_0, g_0)$. Furthermore, the corresponding optimal transport map $T$ from $Q$ to $P$ can be recovered directly as $T = \nabla g_0$.
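The recovery step in Proposition 2.2 amounts to differentiating a convex potential. The NumPy sketch below assumes a small two-layer ICNN of the general kind pictured in Figure 1. It uses softplus activations, nonnegative weights on the hidden-to-hidden and output paths, and an added quadratic term. The class name `TinyICNN`, the layer sizes, and `alpha` are hypothetical. The gradient is written out by hand so the example needs only NumPy; a training implementation would normally obtain $\nabla g$ by automatic differentiation.

```python
import numpy as np

def softplus(u):
    return np.logaddexp(0.0, u)            # log(1 + e^u), computed stably

def sigmoid(u):
    return 0.5 * (1.0 + np.tanh(0.5 * u))  # derivative of softplus, overflow-free

class TinyICNN:
    """Hypothetical two-layer input convex network g: R^d -> R.

    z1   = softplus(A0 y + b0)
    z2   = softplus(Wz z1 + A1 y + b1)      with Wz >= 0 elementwise
    g(y) = w . z2 + (alpha / 2) ||y||^2     with w >= 0 elementwise
    g is convex in y because softplus is convex and nondecreasing, and only
    nonnegative weights act on earlier hidden layers.
    """

    def __init__(self, d, h, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A0, self.b0 = rng.normal(size=(h, d)), rng.normal(size=h)
        self.Wz = np.abs(rng.normal(size=(h, h)))   # nonnegative hidden-to-hidden weights
        self.A1, self.b1 = rng.normal(size=(h, d)), rng.normal(size=h)
        self.w = np.abs(rng.normal(size=h))          # nonnegative output weights
        self.alpha = alpha

    def value(self, y):
        z1 = softplus(self.A0 @ y + self.b0)
        z2 = softplus(self.Wz @ z1 + self.A1 @ y + self.b1)
        return self.w @ z2 + 0.5 * self.alpha * (y @ y)

    def grad(self, y):
        """Hand-coded gradient of g; T(y) = grad g(y) plays the role of the transport map."""
        u1 = self.A0 @ y + self.b0
        u2 = self.Wz @ softplus(u1) + self.A1 @ y + self.b1
        d_u2 = self.w * sigmoid(u2)                  # dg/du2
        d_u1 = (self.Wz.T @ d_u2) * sigmoid(u1)      # dg/du1, back through z1
        return self.A1.T @ d_u2 + self.A0.T @ d_u1 + self.alpha * y

g = TinyICNN(d=3, h=8)
y = np.array([0.2, -1.0, 0.5])
T_y = g.grad(y)           # image of y under the map recovered from g
displacement = T_y - y    # the quantity a displacement-sparsity penalty acts on
```

Because `g` is convex (here strongly convex, thanks to the quadratic term), its gradient is a monotone map. The difference `T_y - y` is exactly the displacement that a sparsity penalty would act on.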

Figures (9)

  • Figure 1: Input Convex Neural Network (ICNN) Structure.
  • Figure 2: Left: Two classic eight-Gaussian examples are presented, where the source measure is located at the center, and eight target measures are generated from Gaussian distributions. These examples illustrate the trade-off between the sparsity of the map and its feasibility under varying values of $a$. Right: A simulation example demonstrates how $\lambda$ changes during simulated annealing training under $a=0.9$, along with the corresponding levels of map sparsity and feasibility.
  • Figure 3: Synthesized sc-RNA perturbation dataset with $n=4000$, $d=3000$, and $k=100$. The average displacement dimensionality is shown.
  • Figure 4: Results on the 4i perturbation data. The average of ten runs is shown for each drug. $\lambda^{(i)}$ denotes the order statistics of each set of $\lambda$ values used in the experiments.
  • Figure 5: Three consecutive runs of high-dimensional dynamic adjustment of $\lambda$ under the same seed. Dimensionality constraint is set to $l = 100$ and smoothed $\ell_0$ penalty is used.
  • ...and 4 more figures
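Figures 2 and 5 describe adjusting $\lambda$ during training to trade sparsity against feasibility under a dimensionality constraint $l$. The sketch below is a deliberately simplified stand-in for that idea, not the paper's simulated-annealing procedure. It uses a deterministic multiplicative feedback rule whose step size cools geometrically. The response curve `toy_dim` is invented for illustration; its constants merely echo $d = 3000$ from Figure 3 and $l = 100$ from Figure 5. All function names are hypothetical.

```python
import numpy as np

def avg_displacement_dim(delta, tol=1e-3):
    """Average count, per sample, of coordinates whose |displacement| exceeds tol."""
    return float((np.abs(delta) > tol).sum(axis=1).mean())

def update_lambda(lam, dim, target_dim, step, lam_min=1e-6, lam_max=1e3):
    """Hypothetical rule: raise lambda when maps are too dense, lower it otherwise."""
    if dim > target_dim:
        lam *= 1.0 + step   # too many active dimensions: push harder toward sparsity
    else:
        lam /= 1.0 + step   # within budget: relax the penalty to favor feasibility
    return min(max(lam, lam_min), lam_max)

def toy_dim(lam, d=3000):
    """Invented stand-in for 'train, then measure average displacement dimensionality'."""
    return d / (1.0 + 10.0 * lam)

lam, step0, rate, target = 0.1, 0.5, 0.95, 100
for t in range(200):
    step = step0 * rate**t            # geometric cooling of the step size
    lam = update_lambda(lam, toy_dim(lam), target, step)

print(round(lam, 3), round(toy_dim(lam), 1))  # prints: 2.9 100.0
```

The cooling step lets $\lambda$ move quickly at first and then settle near the value that meets the dimensionality budget. A genuine simulated-annealing scheme would also accept occasional moves against the feedback signal with a temperature-dependent probability.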

Theorems & Definitions (9)

  • Proposition 2.2
  • Lemma 3.1
  • Remark 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Theorem 3.5
  • Theorem 3.6
  • Theorem B.1: von Neumann's Minimax Theorem
  • Lemma B.2