Distributional Shrinkage II: Optimal Transport Denoisers with Higher-Order Scores
Tengyuan Liang
TL;DR
This work addresses distribution-level denoising for measurements $Y=X+\sigma Z$ by introducing a hierarchy of agnostic optimal-transport denoisers that push the observed distribution $Q$ onto the unknown signal distribution $P$. The core idea is to express the optimal transport map as an infinite expansion in the noise parameter $\eta=\sigma^2/2$, with each term governed by higher-order score functions and organized via Bell polynomials. Truncating the expansion yields computable finite-$K$ denoisers $T_K$ whose approximation error is of order $\eta^{K+1}$. Two practical strategies are developed for estimating the necessary higher-order scores from data: (i) plug-in Gaussian-kernel smoothing, which estimates the density derivatives of $Q$, and (ii) direct higher-order score matching, which estimates the score functions themselves. The paper provides rigorous convergence rates for both the $F$- and $G$-expansions, clarifies the combinatorial structure linking score functions to OT maps, and situates the results within the broader empirical Bayes and diffusion-model literature. Overall, the framework enables distributional denoising without knowledge of the prior $P$, offering a principled, theoretically grounded construction of agnostic denoisers.
Abstract
We revisit the signal denoising problem through the lens of optimal transport: the goal is to recover an unknown scalar signal distribution $X \sim P$ from noisy observations $Y = X + \sigma Z$, with $Z$ being standard Gaussian independent of $X$ and $\sigma>0$ a known noise level. Let $Q$ denote the distribution of $Y$. We introduce a hierarchy of denoisers $T_0, T_1, \ldots, T_\infty : \mathbb{R} \to \mathbb{R}$ that are agnostic to the signal distribution $P$, depending only on higher-order score functions of $Q$. Each denoiser $T_K$ is progressively refined using the $(2K-1)$-th order score function of $Q$ at noise resolution $\sigma^{2K}$, achieving better denoising quality measured by the Wasserstein metric $W(T_K \sharp Q, P)$. The limiting denoiser $T_\infty$ identifies the optimal transport map with $T_\infty \sharp Q = P$. We provide a complete characterization of the combinatorial structure underlying this hierarchy through Bell polynomial recursions, revealing how higher-order score functions encode the optimal transport map for signal denoising. We study two estimation strategies with convergence rates for higher-order scores from i.i.d. samples drawn from $Q$: (i) plug-in estimation via Gaussian kernel smoothing, and (ii) direct estimation via higher-order score matching. This hierarchy of agnostic denoisers opens new perspectives in signal denoising and empirical Bayes.
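As a rough illustration of the hierarchy, the sketch below works through a Gaussian toy case in closed form. Everything specific in it is an assumption made for illustration and is not taken from the text above: the prior is set to $P = N(0, \tau^2)$, the first-order denoiser is assumed to have the form $T_1(y) = y + \eta\, s(y)$ with $s = (\log q)'$ and $\eta = \sigma^2/2$, and the function names are hypothetical. In this case $Q = N(0, \tau^2 + \sigma^2)$, every map is linear, and the 2-Wasserstein distance between centered Gaussians is the absolute difference of their standard deviations.

```python
import math


def gaussian_w2(std_a: float, std_b: float) -> float:
    """W2 distance between N(0, std_a^2) and N(0, std_b^2)."""
    return abs(std_a - std_b)


def compare_denoisers(tau: float = 1.0, sigma: float = 0.5) -> dict:
    """W2(T # Q, P) for several linear maps T when P = N(0, tau^2).

    Assumed setup (illustrative, not from the source):
      Q = N(0, tau^2 + sigma^2), score s(y) = -y / (tau^2 + sigma^2),
      first-order denoiser T_1(y) = y + eta * s(y) with eta = sigma^2 / 2.
    """
    eta = sigma**2 / 2
    var_q = tau**2 + sigma**2
    std_q = math.sqrt(var_q)

    # Each map is y -> c * y, so T # Q = N(0, (c * std_q)^2).
    scales = {
        "identity": 1.0,                          # T_0: no denoising
        "first_order": 1.0 - eta / var_q,         # assumed T_1
        "posterior_mean": 1.0 - 2 * eta / var_q,  # Tweedie: y + sigma^2 * s(y)
        "exact_ot": tau / std_q,                  # monotone map sending Q to P
    }
    return {name: gaussian_w2(c * std_q, tau) for name, c in scales.items()}


if __name__ == "__main__":
    for name, dist in compare_denoisers().items():
        print(f"{name:>15s}: W2 = {dist:.6f}")
```

With $\tau = 1$ and $\sigma = 0.5$, the identity map leaves a distance of about 0.118, the assumed first-order map reduces it to about 0.006, and the posterior mean over-shrinks to about 0.106. This is consistent with a denoiser that targets the distribution $P$ rather than pointwise mean-squared error.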
