
A New Perspective On Denoising Based On Optimal Transport

Nicolas Garcia Trillos, Bodhisattva Sen

TL;DR

The paper develops an optimal-transport perspective on denoising in latent-variable models where $Z\mid\Theta$ follows a known likelihood and $\Theta\sim G^*$. It proves the existence and uniqueness of an OT-based denoiser $\delta^*$, related to a Monge map by $\delta^*(z)=\nabla\varphi^*(\overline{\theta}(z))$, which ensures $\delta^*(Z)\sim G^*$. It also introduces a soft-penalty version $\delta^*_{\tau}$ that linearly interpolates between the Bayes estimator $\overline{\theta}(Z)$ and $\delta^*(Z)$. A complementary penalization in observable space is analyzed via a Kantorovich relaxation. The paper proves that solutions exist and that, under identifiability, $\delta^*$ is recovered as $\tau\to 0$; a nontrivial link to multimarginal OT motivates potential numerical methods. The framework connects to Tweedie's formula, which in exponential-family settings estimates the posterior mean from marginal data. This highlights practical routes to a finite-sample construction of $\delta^*$ without explicit knowledge of $G^*$. Overall, the work advances the theoretical foundations of OT-based denoising and points to tractable, OT-inspired algorithms for high-dimensional latent-variable inference.
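In one dimension the Monge map $\nabla\varphi^*$ is the monotone rearrangement of the law of $\overline{\theta}(Z)$ onto $G^*$. The sketch below assumes the Gaussian toy model of Figure 1, that is, $Z\mid\Theta\sim N(\Theta,1)$ and $\Theta\sim N(0,\tau^2)$. It uses the known $G^*$ only to supply the target quantiles. It then compares a finite-sample quantile-matching version of $\delta^*$ with the closed form $\delta^*(z)=\tau z/\sqrt{1+\tau^2}$. That closed form is derived here from $\delta^*(z)=\nabla\varphi^*(\overline{\theta}(z))$ rather than quoted from the paper. The helper names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def posterior_mean(z, tau2):
    # Bayes estimator theta_bar(z) = E[Theta | Z = z] when
    # Z | Theta ~ N(Theta, 1) and Theta ~ N(0, tau2) (hypothetical toy model).
    return tau2 / (1.0 + tau2) * z

def empirical_ot_denoiser(theta_bar, prior):
    # 1-D optimal transport is the monotone rearrangement: the k-th smallest
    # theta_bar (k = 0, ..., n-1) is sent to the prior quantile at (k + 0.5) / n.
    n = theta_bar.size
    ranks = np.argsort(np.argsort(theta_bar))
    return np.array([prior.inv_cdf((r + 0.5) / n) for r in ranks])

rng = np.random.default_rng(0)
n, tau2 = 50_000, 1.0
theta = rng.normal(0.0, np.sqrt(tau2), n)   # latent variables, Theta ~ G*
z = theta + rng.normal(0.0, 1.0, n)         # observations

theta_bar = posterior_mean(z, tau2)
delta = empirical_ot_denoiser(theta_bar, NormalDist(0.0, np.sqrt(tau2)))
delta_closed = np.sqrt(tau2 / (1.0 + tau2)) * z   # derived closed form of delta*

print(np.var(theta_bar))   # close to tau2**2 / (1 + tau2) = 0.5: over-shrunk
print(np.var(delta))       # close to tau2 = 1.0: matches the prior spread
print(np.mean(np.abs(delta - delta_closed)))
```

The denoiser keeps the ordering of the observations but restores the prior's spread, which the posterior mean shrinks.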

Abstract

In the standard formulation of the denoising problem, one is given a probabilistic model relating a latent variable $\Theta\in \Omega\subset \mathbb{R}^m$ ($m\ge 1$) and an observation $Z \in \mathbb{R}^d$ according to $Z \mid \Theta\sim p(\cdot\mid \Theta)$ and $\Theta\sim G^*$. The goal is to construct a map that recovers the latent variable from the observation. The posterior mean is a natural candidate for estimating $\Theta$ from $Z$. It attains the minimum Bayes risk (under the squared error loss), but at the expense of over-shrinking $Z$, and in general it may fail to capture the geometric features of the prior distribution $G^*$ (e.g., low dimensionality, discreteness, sparsity). To rectify these drawbacks, we take a new perspective on this denoising problem, inspired by optimal transport (OT) theory, and use it to study a different, OT-based denoiser in the population-level setting. We rigorously prove that, under general assumptions on the model, this OT-based denoiser is mathematically well-defined and unique, and is closely connected to the solution of a Monge OT problem. We then prove that, under appropriate identifiability assumptions on the model, the OT-based denoiser can be recovered solely from the marginal distribution of $Z$ and the posterior mean of the model. The recovery requires solving a linear relaxation problem over a suitable space of couplings that is reminiscent of standard multimarginal OT problems. In particular, thanks to Tweedie's formula, when the likelihood model $\{ p(\cdot \mid \theta) \}_{\theta\in \Omega}$ is an exponential family of distributions, the OT-based denoiser can be recovered solely from the marginal distribution of $Z$. More generally, our family of OT-like relaxations is of interest in its own right, and for the denoising problem it suggests alternative numerical methods inspired by the rich literature on computational OT.
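For the normal location family with unit noise variance, Tweedie's formula reads $\overline{\theta}(z)=z+\frac{d}{dz}\log f(z)$, where $f$ is the marginal density of $Z$. The posterior mean can therefore be estimated from observations of $Z$ alone. The sketch below assumes this Gaussian model with $\Theta\sim N(0,\tau^2)$, so the exact answer $\tau^2 z/(1+\tau^2)$ is available for comparison. It replaces $f$ by a Gaussian kernel density estimate. The bandwidth $h=0.3$ and the function name are illustrative assumptions. Kernel smoothing slightly biases the estimated score toward zero.

```python
import numpy as np

def tweedie_posterior_mean(z_grid, z_samples, h):
    # Tweedie's formula for Z | Theta ~ N(Theta, 1):
    #     E[Theta | Z = z] = z + d/dz log f(z),
    # with the marginal density f replaced by a Gaussian kernel density
    # estimate of bandwidth h built from the observations alone.
    diff = z_samples[None, :] - z_grid[:, None]                # Z_i - z
    w = np.exp(-0.5 * (diff / h) ** 2)                          # kernel weights
    score = (w * diff).sum(axis=1) / (h**2 * w.sum(axis=1))     # f_h'(z) / f_h(z)
    return z_grid + score

rng = np.random.default_rng(1)
tau2 = 1.0
theta = rng.normal(0.0, np.sqrt(tau2), 100_000)   # used only to simulate Z
z = theta + rng.normal(0.0, 1.0, theta.size)

grid = np.linspace(-1.5, 1.5, 13)
estimate = tweedie_posterior_mean(grid, z, h=0.3)   # sees only z, not tau2
exact = tau2 / (1.0 + tau2) * grid                  # known in this conjugate case
print(np.max(np.abs(estimate - exact)))
```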

Paper Structure (22 sections, 17 theorems, 148 equations, 2 figures)

This paper contains 22 sections, 17 theorems, 148 equations, 2 figures.

Key Result

Theorem 2.1

Let $\nu$ and $\widetilde{\nu}$ be two Borel probability measures over $\mathbb{R}^p$ such that $\int |x|^2 \, d\nu(x) < \infty$ and $\int |y|^2 \, d\widetilde{\nu}(y) < \infty$. Suppose further that $\nu$ has a Lebesgue density. Then there exists a convex function $\psi: \mathbb{R}^p \to \mathbb{R}\cup\{+\infty\}$ such that $T := \nabla\psi$ satisfies $T_{\sharp}\nu = \widetilde{\nu}$. Moreover, the coupling $(\mathrm{Id} \times T)_{\sharp}\nu$ uniquely minimizes the problem defining the 2-Wasserstein distance (Definition 2.2). In the above an …
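Brenier's theorem requires $\nu$ to have a density, so a discrete example cannot instantiate it exactly. Still, in one dimension the optimal map is the nondecreasing rearrangement, which is the derivative of a convex function. The pure-Python sketch below checks this structure for two uniform empirical measures with seven atoms each. The helper names are hypothetical. The sorted (monotone) matching attains the minimum average squared cost, which is found here by brute force over all permutations. Restricting to permutations suffices because, for uniform measures with equal numbers of atoms, an optimal coupling can be taken to be a permutation (Birkhoff-von Neumann).

```python
import itertools
import random

def monotone_cost(xs, ys):
    # Cost of the monotone coupling: i-th smallest x is matched to i-th smallest y.
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def brute_force_cost(xs, ys):
    # Minimum of (1/n) * sum_i (x_i - y_{s(i)})^2 over all permutations s.
    n = len(xs)
    return min(
        sum((xs[i] - ys[s[i]]) ** 2 for i in range(n)) / n
        for s in itertools.permutations(range(n))
    )

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(7)]     # atoms of nu
ys = [random.expovariate(1.0) for _ in range(7)]    # atoms of nu-tilde
print(monotone_cost(xs, ys), brute_force_cost(xs, ys))
```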

Figures (2)

  • Figure 1: Toy example with $n = 60$ in $d=1$, where $p(\cdot \mid \theta)$ is the density of $N(\theta, 1)$ and $G^* = N(0, \tau^2)$. Left: observations $Z_1,\ldots, Z_n$ (blue), drawn from the model $Z \mid \Theta \sim p(\cdot\mid\Theta)$, $\Theta\sim G^*$ with $\tau^2 = 1$, are connected to their true unobserved latent variables $\{\Theta_i\}_{i=1}^n$ (red). Each Bayes estimate $\overline{\theta}(Z_i)$ (black) is connected to $\Theta_i$ (red) and to the corresponding OT-based denoiser $\delta^*(Z_i)$ (orange). Right: risk curves of three estimators of $\Theta$, namely $Z$ (blue), $\overline{\theta}(Z)$ (black) and $\delta^*(Z)$ (orange), as $\tau^2$ varies from 0 to 10.
  • Figure 2: Toy example with $n = 60$ in $d=2$, where $p(\cdot \mid \theta)$ is the density of $N(\theta, (0.3)^2\cdot I_2)$ and $G^*$ is the uniform distribution on the unit circle. Left: observations $Z_1,\ldots, Z_n$ (blue), drawn from the model $Z \mid \Theta \sim p(\cdot\mid\Theta)$, $\Theta\sim G^*$, are connected to the corresponding unobserved latent variables $\{\Theta_i\}_{i=1}^n$ (red). Center: the Bayes estimate $\overline{\theta}(Z_i)$ (black) is connected to $\Theta_i$ (red), for every $i=1,\ldots, n$. Right: the Bayes estimate $\overline{\theta}(Z_i)$ (black) is connected to its corresponding OT-based denoiser $\delta^*(Z_i)$ (orange), which lies on the circle.
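Both toy models in the figures admit closed forms, derived here from $\delta^*(z)=\nabla\varphi^*(\overline{\theta}(z))$ rather than quoted from the paper.

In Figure 1, $\overline{\theta}(Z)\sim N(0,\tau^4/(1+\tau^2))$. The 1-D Monge map onto $N(0,\tau^2)$ is a rescaling, which gives $\delta^*(z)=\tau z/\sqrt{1+\tau^2}$ and risk $2\tau^2\big(1-\tau/\sqrt{1+\tau^2}\big)$.

In Figure 2, the law of $\overline{\theta}(Z)$ is rotation invariant. The optimal map onto the uniform law on the circle is therefore the radial projection $x\mapsto x/|x|$, the gradient of the convex function $|x|$.

The sketch below, with hypothetical helper names, checks the Figure 1 risks by Monte Carlo at $\tau^2=1$. It also evaluates the Figure 2 denoiser at one illustrative point, using a discretized posterior average over the circle.

```python
import numpy as np

def figure1_risks(tau2):
    # Closed-form risks E|estimator - Theta|^2 for Z | Theta ~ N(Theta, 1),
    # Theta ~ N(0, tau2); delta*(z) = a * z with a = sqrt(tau2 / (1 + tau2)).
    a = np.sqrt(tau2 / (1.0 + tau2))
    risk_z = 1.0                        # Z - Theta ~ N(0, 1)
    risk_bayes = tau2 / (1.0 + tau2)    # posterior variance
    risk_ot = 2.0 * tau2 * (1.0 - a)    # Var(a Z - Theta)
    return risk_z, risk_bayes, risk_ot

def figure2_denoiser(z, sigma=0.3, n_angles=4096):
    # Theta uniform on the unit circle, Z | Theta ~ N(Theta, sigma^2 I_2).
    # On the circle the likelihood is proportional to exp(<z, theta> / sigma^2),
    # so theta_bar(z) is a weighted average over a fine grid of angles.
    phi = 2.0 * np.pi * np.arange(n_angles) / n_angles
    circle = np.stack([np.cos(phi), np.sin(phi)], axis=1)
    logw = circle @ z / sigma**2
    w = np.exp(logw - logw.max())       # numerically stabilised weights
    theta_bar = (w[:, None] * circle).sum(axis=0) / w.sum()
    # OT map onto the uniform law on the circle: radial projection x -> x / |x|.
    return theta_bar, theta_bar / np.linalg.norm(theta_bar)

# Monte Carlo check of the Figure 1 risks at tau2 = 1.
rng = np.random.default_rng(0)
tau2, n = 1.0, 200_000
theta = rng.normal(0.0, np.sqrt(tau2), n)
z = theta + rng.normal(0.0, 1.0, n)
a = np.sqrt(tau2 / (1.0 + tau2))
mc = (np.mean((z - theta) ** 2),
      np.mean((tau2 / (1.0 + tau2) * z - theta) ** 2),
      np.mean((a * z - theta) ** 2))
print(mc, figure1_risks(tau2))          # Monte Carlo vs closed form

z0 = np.array([1.1, 0.4])               # an illustrative observation
theta_bar, delta = figure2_denoiser(z0)
print(np.linalg.norm(theta_bar), delta, z0 / np.linalg.norm(z0))
```

At $\tau^2=1$ the OT-based denoiser's risk lies between the Bayes risk and the risk of $Z$ itself. In the circle model the posterior mean falls strictly inside the disk, while $\delta^*(z)$ lands on the circle.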

Theorems & Definitions (59)

  • Example 1.1: Normal location mixture
  • Example 1.2: Normal scale mixture
  • Example 1.3: Uniform scale mixture
  • Example 1.4: Bayes estimator under squared error loss
  • Definition 2.1: Pushforward of a measure
  • Remark 2.1
  • Definition 2.2: $2$-Wasserstein distance
  • Theorem 2.1: Brenier
  • Remark 2.2: On our assumptions
  • Theorem 2.4
  • ...and 49 more