A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport

Tianyi Lin; Marco Cuturi; Michael I. Jordan

TL;DR

The paper tackles the computational bottleneck of kernel-based optimal transport (KOT) estimators in high dimensions by formulating a nonsmooth fixed-point model and solving it with a specialized semismooth Newton (SSN) method. It proves a global convergence rate of $O(1/\sqrt{k})$ and a local quadratic rate under standard regularity conditions, and it cuts the per-iteration cost by exploiting problem structure to reduce large linear systems to smaller ones. Experiments show substantial speedups over the short-step interior-point method (SSIPM) on synthetic data and real single-cell datasets. The approach keeps the statistical efficiency of kernel-based OT while making OT map estimation computationally practical at larger sample sizes.

Abstract

Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples. Recent works suggest that these estimators are more statistically efficient than plug-in (linear programming-based) OT estimators when comparing probability measures in high dimensions \citep{Vacher-2021-Dimension}. Unfortunately, that statistical benefit comes at a very steep computational price: because their computation relies on the short-step interior-point method (SSIPM), which comes with a large iteration count in practice, these estimators quickly become intractable with respect to the sample size $n$. To scale these estimators to larger $n$, we propose a nonsmooth fixed-point model for the kernel-based OT problem and show that it can be efficiently solved via a specialized semismooth Newton (SSN) method. By exploiting the problem's structure, we show that the per-iteration cost of performing one SSN step can be significantly reduced in practice. We prove that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$ and a local quadratic convergence rate under standard regularity conditions. We show substantial speedups over SSIPM on both synthetic and real datasets.

Paper Structure

This paper contains 27 sections, 5 theorems, 59 equations, 6 figures, and 2 algorithms.

Introduction
Curse of Dimensionality.
Regularization.
Leveraging Smoothness.
Scaling up Kernel-based OT.
Contributions.
Organization.
Further Related Works
Background: Kernel-Based OT
Method and Analysis
A nonsmooth equation model for kernel-based OT
Regularized SSN method.
Properties of the nonsmooth map $R$
Generalized Jacobian.
Newton updates
...and 12 more sections

Key Result

Proposition 4.1

A point $\hat{\gamma} \in \mathbb{R}^n$ is an optimal solution of Eq. \eqref{prob:main} if and only if $\hat{w} = (\hat{\gamma}, \hat{X})$ satisfies $R(\hat{w}) = 0$ for some $\hat{X} \in \mathcal{S}_+^n$.
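
A residual equation of the form $R(w) = 0$ is the kind of problem a semismooth Newton method is designed to solve. This summary does not reproduce the paper's map $R$, so the sketch below uses a stand-in that is purely an assumption for illustration: the componentwise residual $R(x) = \min(x, Mx + q)$ of a 2-by-2 linear complementarity problem, whose zeros are exactly the solutions with $x \ge 0$, $Mx + q \ge 0$, and $x^\top (Mx + q) = 0$. The names `residual`, `solve_2x2`, and `semismooth_newton` are hypothetical, and the iteration is a plain, undamped Newton step, not the regularized SSN method analyzed in the paper.

```python
def residual(x, M, q):
    """Return the natural residual min(x, Mx + q) and the vector w = Mx + q."""
    w = [sum(M[i][j] * x[j] for j in range(2)) + q[i] for i in range(2)]
    return [min(x[i], w[i]) for i in range(2)], w


def solve_2x2(J, b):
    """Solve the 2-by-2 system J d = b by Cramer's rule (assumes det(J) != 0)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - b[0] * J[1][0]) / det]


def semismooth_newton(M, q, x0, tol=1e-12, max_iter=50):
    """Plain semismooth Newton iteration on the toy residual (illustrative only)."""
    x = list(x0)
    for k in range(max_iter):
        r, w = residual(x, M, q)
        if max(abs(v) for v in r) <= tol:
            return x, k
        # Pick one element of the generalized Jacobian of the min map:
        # row i is the unit vector e_i where x_i <= w_i, else row i of M.
        J = [[1.0 if j == i else 0.0 for j in range(2)] if x[i] <= w[i]
             else list(M[i]) for i in range(2)]
        d = solve_2x2(J, [-r[0], -r[1]])
        x = [x[0] + d[0], x[1] + d[1]]
    return x, max_iter


# Toy data (assumed, not from the paper).
M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 1.0]
x_star, iters = semismooth_newton(M, q, [1.0, 1.0])
print(x_star, iters)
```

Because the toy residual is piecewise linear, the iteration stops after finitely many steps once the correct active pieces are identified. Without the regularization and safeguards of the paper's method, this plain version can fail when the chosen Jacobian element is singular.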

Figures (6)

Figure 1: Visualization of the OT map with $n_{\textnormal{sample}} = n \in \{50, 100 ,200\}$.
Figure 2: Visualization of the constraint: (left, middle) $n_{\textnormal{sample}} = n \in \{50, 100\}$, (right) ground truth.
Figure 3: Comparison of mean CPU computation time of IPM vs. our algorithm (SSN).
Figure 4: Performance of entropic map (using OTT) vs. kernel-based OT estimators computed with the SSN algorithm on 6 drug perturbation datasets. The $X$-axis represents the number of training samples and the $Y$-axis represents the error induced by OT map $T$ on test samples in terms of OT distance.
Figure 5: Performance of pure EG and our algorithm for solving kernel-based OT problems with varying sample size $n \in \{50, 100, 200, 500, 1000, 2000\}$. The numerical results are presented as residue norm vs. time (seconds).
...and 1 more figures

Theorems & Definitions (13)

Remark 3.2
Remark 3.3
Proposition 4.1
Definition 4.1
Definition 4.2
Proposition 4.2
Lemma 4.3
Remark 4.4
Remark 4.5
Remark 4.6
...and 3 more
