Rate-Optimal Noise Annealing in Semi-Dual Neural Optimal Transport: Tangential Identifiability, Off-Manifold Ambiguity, and Guaranteed Recovery
Raymond Chu, Jaewoong Choi, Dohyun Kwon
TL;DR
The paper investigates spurious solutions in semi-dual neural OT when data lie on a low-dimensional manifold. It shows that the semi-dual objective under-constrains off-manifold (normal) directions, while the tangential, on-manifold transport signal remains identifiable. It proposes additive-noise smoothing of the source distribution and proves map-level recovery guarantees as the noise vanishes. It also derives a computable terminal noise level $\varepsilon_{\mathrm{stat}}(N)$ that attains the optimal statistical rate, with scaling governed by the intrinsic dimension $m$. The authors quantify a bias–stability–sample tradeoff and establish finite-sample rates for plan learning. They also show that the objective becomes ill-conditioned as $\varepsilon\to 0$, which motivates stopping the annealing at $\varepsilon_{\mathrm{stat}}(N)$. Experiments demonstrate improved identifiability of the normal transport component and corroborate both rate-optimal annealing and the practical value of smoothing in neural OT. Overall, the work provides geometric, probabilistic, and empirical support for rate-optimal noise annealing as a robust strategy for map recovery in neural OT under manifold-structured data.
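For reference, a common way to write a semi-dual max-min objective with an additively smoothed source is sketched below. The notation ($c$ for the cost, $v$ for the potential, $T$ for the map, $Z$ for the noise) is ours, and Gaussian noise is an assumption; the paper's exact formulation may differ.

```latex
\sup_{v}\ \inf_{T}\ \mathcal{L}_\varepsilon(v,T),
\qquad
\mathcal{L}_\varepsilon(v,T)
  = \mathbb{E}_{X\sim\mu_\varepsilon}\big[c(X,T(X)) - v(T(X))\big]
  + \mathbb{E}_{Y\sim\nu}\big[v(Y)\big],
\qquad
\mu_\varepsilon = \operatorname{Law}(X + \varepsilon Z),\quad X\sim\mu,\ Z\sim\mathcal{N}(0, I_d).
```

At $\varepsilon = 0$ the expectation over $\mu_0 = \mu$ evaluates $T$ only on the support of $\mu$, so values of $T$ off the data manifold do not enter the objective. For $\varepsilon > 0$, $\mu_\varepsilon$ has full support in $\mathbb{R}^d$.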
Abstract
Semi-dual neural optimal transport learns a transport map via a max-min objective, yet training can converge to incorrect or degenerate maps. We fully characterize these spurious solutions in the common regime where data concentrate on a low-dimensional manifold: the objective is underconstrained off the data manifold, while the on-manifold transport signal remains identifiable. Following Choi, Choi, and Kwon (2025), we study additive-noise smoothing as a remedy and prove new map recovery guarantees as the noise vanishes. Our main practical contribution is a computable terminal noise level $\varepsilon_{\mathrm{stat}}(N)$ that attains the optimal statistical rate, with scaling governed by the intrinsic dimension $m$ of the data. The formula arises from a unified theoretical analysis of (i) quantitative stability of optimal plans, (ii) smoothing-induced bias, and (iii) finite-sample error, yielding rates that depend on $m$ rather than the ambient dimension. Finally, we show that the reduced semi-dual objective becomes increasingly ill-conditioned as $\varepsilon \downarrow 0$. This provides a principled stopping rule: annealing below $\varepsilon_{\mathrm{stat}}(N)$ can $\textit{worsen}$ optimization conditioning without improving statistical accuracy.
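The annealing-with-a-floor idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes three assumptions: Gaussian additive noise is applied to source samples only, the noise level decays geometrically, and a caller-supplied floor `eps_terminal` stands in for $\varepsilon_{\mathrm{stat}}(N)$, whose formula is not reproduced here. The names `smooth_source` and `annealing_schedule` are hypothetical, and the semi-dual potential and map updates that would run at each noise level are omitted.

```python
import numpy as np


def smooth_source(x, eps, rng):
    """Additive-noise smoothing: return x + eps * z with z ~ N(0, I) (assumed Gaussian).

    Even if the rows of x lie on a low-dimensional manifold, the smoothed
    samples have full support in the ambient space for eps > 0.
    """
    return x + eps * rng.standard_normal(x.shape)


def annealing_schedule(eps_start, eps_terminal, decay):
    """Geometric noise levels eps_start * decay**k, never going below eps_terminal.

    eps_terminal is a hypothetical stand-in for eps_stat(N): it is supplied by
    the caller, not computed here. The last level is exactly eps_terminal,
    encoding the stopping rule "do not anneal below the terminal level".
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie in (0, 1)")
    if eps_terminal <= 0.0 or eps_start < eps_terminal:
        raise ValueError("need eps_start >= eps_terminal > 0")
    levels = []
    eps = eps_start
    while eps > eps_terminal:
        levels.append(eps)
        eps *= decay
    levels.append(eps_terminal)  # clip at the floor instead of continuing to shrink
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy source on the unit circle in R^2 (intrinsic dimension m = 1, ambient d = 2).
    theta = rng.uniform(0.0, 2.0 * np.pi, size=1000)
    x = np.column_stack([np.cos(theta), np.sin(theta)])
    for eps in annealing_schedule(eps_start=0.5, eps_terminal=0.05, decay=0.5):
        x_eps = smooth_source(x, eps, rng)
        # Mean distance to the circle: the off-manifold spread shrinks with eps.
        off = np.abs(np.linalg.norm(x_eps, axis=1) - 1.0).mean()
        print(f"eps={eps:.4f}  mean off-manifold distance={off:.4f}")
```

In a full training loop, each level in the schedule would host a block of semi-dual updates on smoothed source samples. Clipping at `eps_terminal` reflects the abstract's point that annealing further can worsen conditioning without improving statistical accuracy.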
