Rate-Optimal Noise Annealing in Semi-Dual Neural Optimal Transport: Tangential Identifiability, Off-Manifold Ambiguity, and Guaranteed Recovery

Raymond Chu; Jaewoong Choi; Dohyun Kwon

TL;DR

The paper investigates spurious solutions in semi-dual neural OT when data lie on a low-dimensional manifold. It shows that the recovery step of the semi-dual objective leaves off-manifold directions under-identified, while the tangential, on-manifold transport signal remains identifiable. It proposes additive-noise smoothing to regularize the source distribution, proves map-level recovery guarantees as the noise vanishes, and derives a computable terminal noise level $\varepsilon_{\mathrm{stat}}(N)$ that attains the optimal statistical rate, with scaling governed by the intrinsic dimension $m$. The authors quantify a bias–stability–sample tradeoff, establish finite-sample rates for plan learning, and show that the objective becomes ill-conditioned as $\varepsilon\to 0$, which motivates stopping at $\varepsilon_{\mathrm{stat}}(N)$. Experiments show improved identifiability of the normal transport component and corroborate both rate-optimal annealing and the practical value of smoothing in neural OT. Overall, the work provides geometric, statistical, and empirical support for rate-optimal noise annealing as a recovery strategy in neural OT under manifold-structured data.

Abstract

Semi-dual neural optimal transport learns a transport map via a max-min objective, yet training can converge to incorrect or degenerate maps. We fully characterize these spurious solutions in the common regime where data concentrate on a low-dimensional manifold: the objective is underconstrained off the data manifold, while the on-manifold transport signal remains identifiable. Following Choi, Choi, and Kwon (2025), we study additive-noise smoothing as a remedy and prove new map recovery guarantees as the noise vanishes. Our main practical contribution is a computable terminal noise level $\varepsilon_{\mathrm{stat}}(N)$ that attains the optimal statistical rate, with scaling governed by the intrinsic dimension $m$ of the data. The formula arises from a unified theoretical analysis of (i) quantitative stability of optimal plans, (ii) smoothing-induced bias, and (iii) finite-sample error, yielding rates that depend on $m$ rather than the ambient dimension. Finally, we show that the reduced semi-dual objective becomes increasingly ill-conditioned as $\varepsilon \downarrow 0$. This provides a principled stopping rule: annealing below $\varepsilon_{\mathrm{stat}}(N)$ can $\textit{worsen}$ optimization conditioning without improving statistical accuracy.
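
The annealing-with-a-floor idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian noise model, the geometric schedule, and the names `noise_schedule`, `smooth_source`, and `eps_floor` are all assumptions. In particular, `eps_floor` stands in for $\varepsilon_{\mathrm{stat}}(N)$, whose formula is not given in this summary, so it is passed in as a user-supplied number.

```python
import numpy as np


def noise_schedule(eps_start, eps_floor, num_stages):
    """Geometrically decreasing noise levels from eps_start down to eps_floor.

    eps_floor is a placeholder for the terminal level eps_stat(N); its value
    must be supplied by the caller (the formula is not reproduced here).
    """
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    if not (eps_start > eps_floor > 0):
        raise ValueError("need eps_start > eps_floor > 0")
    return np.geomspace(eps_start, eps_floor, num_stages)


def smooth_source(x, eps, rng):
    """Additive-noise smoothing of source samples (Gaussian noise is an assumption)."""
    return x + eps * rng.standard_normal(x.shape)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(256, 2))  # toy source samples, illustrative only

schedule = noise_schedule(eps_start=1.0, eps_floor=1e-2, num_stages=5)
for eps in schedule:
    x_eps = smooth_source(x, eps, rng)
    # A semi-dual training stage on x_eps would run here (not shown).
    # The loop ends at eps_floor rather than continuing toward zero.
```

The schedule stops at the floor by construction, mirroring the stopping rule stated in the abstract: lower noise levels are never visited.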

Paper Structure (53 sections, 13 theorems, 126 equations, 5 figures, 6 tables)

Introduction
Why spurious transport maps arise
Why smoothing works, and what it guarantees
Why small noise can be hard to optimize
The bias–stability–sample tradeoff
Preliminaries
Semi-dual Neural OT
Spurious solutions in SNOT
Geometric Origins of Spurious Solutions in Neural OT
Spurious solutions in the minimax formulation
Regularization Removes Spurious Solutions and Stabilizes Learning
Off-manifold regularization
Map consistency under regularization
Bias-Stability Tradeoff for Noise Scheduling
Plan stability and quantitative rates
...and 38 more sections

Key Result

Theorem 3.1

For any $V\in C(\mathcal{Y})$ and any cost $c\in C(\mathcal{X}\times \mathcal{Y})$, there exists a map $T_V$ that satisfies eq:opt_TV_def. Furthermore, $T_V$ satisfies eq:opt_TV_def if and only if for $\mu$-a.e. $x$

Figures (5)

Figure 1: Estimated slope in $\mathbb{E}[W_2(\mu, \mu_N^\varepsilon)] \approx C(\varepsilon) N^{\rho(\varepsilon)}$. We set $\mu = \mathrm{Unif}([-1,1]^3) \otimes (\delta_0)^7 \subset \mathbb{R}^{10}$ ($d=10, m=3$). For $\varepsilon \leq \varepsilon_{\mathrm{stat}} \approx 10^{-2}$, the slope matches the theoretical optimal rate $\rho = -1/3$, confirming that OT plan learning avoids the curse of dimensionality under manifold structure. For $\varepsilon > \varepsilon_{\mathrm{stat}}$, the rate degrades due to finite-sample estimation error. Minor deviations from $-1/3$ are attributed to discretizing $\mu$ in the discrete OT solver.
Figure 2: Comparison of the transport map recovery on the Perpendicular Case. All SNOT solvers are accurate in the tangential component ($x$-axis). However, only the smoothing-based methods (OTP and Ours) successfully recover the normal component ($y$-axis), with our principled approach achieving the highest accuracy.
Figure 3: Neural map error vs. $\varepsilon$. Performance degrades for $\varepsilon \gg \varepsilon_{\text{stat}}$ (smoothing bias). Error remains stable for $\varepsilon \ll \varepsilon_{\text{stat}}$, confirming that reducing noise below this level does not improve the statistical rate. Error bars denote sample standard deviations over 5 trials.
Figure 4: Effect of $\varepsilon$ on convergence and variance. Decreasing $\varepsilon$ induces slower convergence (Left) and higher variance across 10 runs (Right). These results empirically validate that small noise levels degrade optimization conditioning, justifying our terminal noise floor $\varepsilon_{\text{stat}}(N)$. Both panels use log-scaled $y$-axes.
Figure 5: Learned map gradient norm vs. $\varepsilon$. As noise vanishes, $\|\nabla T_{\theta,\varepsilon}\|_{L^\infty}$ diverges, reflecting the singularity of the exact OT map. This gradient blow-up shows the ill-conditioning observed in training.
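
The source measure in Figure 1, $\mu = \mathrm{Unif}([-1,1]^3) \otimes (\delta_0)^7$ with $d=10$ and $m=3$, is easy to sample, which makes the effect of smoothing concrete. The sketch below is illustrative only: the helper name `sample_mu`, the sample size, the value of `eps`, and the Gaussian noise model are assumptions rather than details from the paper. It checks that raw samples span only the 3 tangential coordinates, while smoothed samples occupy all 10 ambient dimensions.

```python
import numpy as np


def sample_mu(n, rng, d=10, m=3):
    """Draw n points from Unif([-1,1]^m) on the first m axes, zero on the rest."""
    x = np.zeros((n, d))
    x[:, :m] = rng.uniform(-1.0, 1.0, size=(n, m))
    return x


rng = np.random.default_rng(0)
x = sample_mu(500, rng)

eps = 1e-2  # illustrative noise level, not a value computed from the paper's formula
x_eps = x + eps * rng.standard_normal(x.shape)  # Gaussian smoothing is an assumption

rank_raw = np.linalg.matrix_rank(x)         # data confined to the 3-dim subspace
rank_smooth = np.linalg.matrix_rank(x_eps)  # smoothed data fills the ambient space
off_std = x_eps[:, 3:].std()                # off-manifold spread, on the order of eps

print(rank_raw, rank_smooth, off_std)
```

With these settings the raw sample has rank 3 and the smoothed sample has rank 10. The off-manifold spread is on the order of `eps`, the scale at which smoothing supplies signal in the normal directions.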

Theorems & Definitions (32)

Theorem 3.1: Full characterization of the recovery step
Theorem 3.2: Tangential recovery
Example 3.3: Non-identifiability in semi-dual recovery
Theorem 4.1: Map-level stability under regularization
Remark 4.2
Proposition 4.3: Tangential recovery on manifolds as noise vanishes
Theorem 5.1: Finite-sample stability with smoothed empirical sources
Remark 5.2
Theorem 5.3: Curvature amplification in the reduced objective
Proof of Theorem 3.1 (full characterization of the recovery step)
...and 22 more
