Denoising Diffusions with Optimal Transport: Localization, Curvature, and Multi-Scale Complexity

Tengyuan Liang, Kulunu Dharmakeerthi, Takuya Koriyama

TL;DR

The effectiveness of the diffuse-then-denoise process is studied: the contraction of the forward diffusion chain, offset by the possible expansion of the backward denoising chain, governs the denoising difficulty.

Abstract

Adding noise is easy; what about denoising? Diffusion is easy; what about reversing a diffusion? Diffusion-based generative models aim to denoise a Langevin diffusion chain, moving from a log-concave equilibrium measure $\nu$, say an isotropic Gaussian, back to a complex, possibly non-log-concave initial measure $\mu$. The score function performs denoising, moving backward in time and predicting the conditional mean of the past location given the current one. We show that score denoising is the optimal backward map in transportation cost. What is its localization uncertainty? We show that the curvature function determines this localization uncertainty, measured as the conditional variance of the past location given the current one. We study in this paper the effectiveness of the diffuse-then-denoise process: the contraction of the forward diffusion chain, offset by the possible expansion of the backward denoising chain, governs the denoising difficulty. For any initial measure $\mu$, we prove that this offset net contraction at time $t$ is characterized by the curvature complexity of a smoothed $\mu$ at a specific signal-to-noise ratio (SNR) scale $r(t)$. We discover that the multi-scale curvature complexity collectively determines the difficulty of the denoising chain. Our multi-scale complexity quantifies a fine-grained notion of average-case curvature rather than worst-case curvature. Curiously, it depends on an integrated tail function, measuring the relative mass of locations with positive curvature versus those with negative curvature; denoising at a specific SNR scale is easy if such an integrated tail is light. We conclude with several non-log-concave examples to demonstrate how the multi-scale complexity probes the bottleneck SNR for the diffuse-then-denoise process.
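The abstract's two central claims, that the score predicts the conditional mean of the past location and that the curvature controls its conditional variance, can be checked numerically in a toy setting. The sketch below is illustrative only and is not the paper's construction. It assumes a plain additive Gaussian channel $Y = X + \sigma Z$ rather than the paper's Langevin parametrization, a hypothetical symmetric two-component Gaussian mixture for $\mu$, and the standard first- and second-order Tweedie identities $\mathbb{E}[X \mid Y=y] = y + \sigma^2 \nabla \log p_Y(y)$ and $\mathrm{Var}[X \mid Y=y] = \sigma^2 + \sigma^4 \nabla^2 \log p_Y(y)$. All names (`A`, `S2`, `tweedie_posterior`, `quadrature_posterior`) are made up for this example.

```python
import numpy as np

# Hypothetical initial measure mu: 0.5*N(-A, S2) + 0.5*N(+A, S2) on the real line.
A, S2 = 2.0, 0.25


def gauss(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def score_and_curvature(y, sigma2):
    """First and second derivative of log p_Y for Y = X + sqrt(sigma2) * Z.

    For this mixture, log p_Y(y) = const - y^2 / (2v) + log cosh(A y / v)
    with v = S2 + sigma2, so both derivatives are available in closed form.
    """
    v = S2 + sigma2
    t = np.tanh(A * y / v)
    score = -y / v + (A / v) * t
    curvature = -1.0 / v + (A / v) ** 2 * (1.0 - t ** 2)
    return score, curvature


def tweedie_posterior(y, sigma2):
    """Conditional mean and variance of X given Y = y via Tweedie's identities."""
    score, curvature = score_and_curvature(y, sigma2)
    mean = y + sigma2 * score
    var = sigma2 + sigma2 ** 2 * curvature
    return mean, var


def quadrature_posterior(y, sigma2):
    """Same two quantities by brute-force integration of the posterior on a grid."""
    x = np.linspace(-10.0, 10.0, 20001)
    prior = 0.5 * gauss(x, -A, S2) + 0.5 * gauss(x, A, S2)
    weights = prior * gauss(y, x, sigma2)  # unnormalized posterior of X given Y = y
    weights /= weights.sum()
    mean = np.sum(weights * x)
    var = np.sum(weights * (x - mean) ** 2)
    return mean, var


if __name__ == "__main__":
    for y in (0.0, 1.3):
        print(y, tweedie_posterior(y, 1.0), quadrature_posterior(y, 1.0))
```

In this toy model, $y = 0$ sits between the two modes, where $\log p_Y$ has positive curvature. There the conditional variance exceeds the noise variance $\sigma^2$, which is one concrete way to see how positive-curvature locations make localization harder.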

Paper Structure

This paper contains 25 sections, 11 theorems, 125 equations, 11 figures.

Key Result

Proposition 1

Consider the Wasserstein gradient descent as in eqn:forward-wass-descent with a lower-semicontinuous $\mathcal{G} = \mathcal{F} + \beta^{-1} \mathcal{E}$. Assume that there exists $\eta_\star >0$, such that for all $\eta \in (0, \eta_\star)$, $\mu_{t+\eta}$ in eqn:forward-wass-descent admits a well-

Figures (11)

  • Figure 1: We plot the density $p_{\mathbf{Y}_{\mathsf{r}}}(\cdot)$ for three SNR values $\mathsf{r}$. The red shaded area marks the non-log-concave region where $\nabla^2 \log p_{\mathbf{Y}_{\mathsf{r}}}(\cdot) > -\delta$ with $\delta = 0$, and the orange shaded area marks the same region with $\delta = 0.5$. For each $\delta \in \{ 0, 0.5 \}$, we report the integrated tail $h_{\mu}(\delta, \mathsf{r})$ and the survival function $s_\mathsf{r}(1-\delta)$. (a) Low SNR, $\mathsf{r} = 0.71$: $s_{\mathsf{r}}(1) = 0$, $s_{\mathsf{r}}(0.5) = 0$. (b) Mid SNR, $\mathsf{r} = 1.50$: $s_{\mathsf{r}}(1) = 0.18$, $s_{\mathsf{r}}(0.5) = 0.27$, a non-trivial mass of bad locations. (c) High SNR, $\mathsf{r} = 3.00$: $s_{\mathsf{r}}(1) = 0.01$, $s_{\mathsf{r}}(0.5) = 0.01$; bad locations do exist, but samples $\mathbf{Y}_{\mathsf{r}}$ rarely end up there.
  • Figure 2: (a) $s_\mathsf{r}(u)$ Low SNR. (b) $s_\mathsf{r}(u)$ High SNR.
  • Figure 3: (a) $m^\ast(\mathsf{r}), \delta^\ast(\mathsf{r})$. (b) $\zeta^\ast(t)$.
  • Figure 4: (a) $s_\mathsf{r}(u)$. (b) $m^\ast(\mathsf{r}),\delta^\ast(\mathsf{r})$. (c) $\zeta^\ast(t)$.
  • Figure 5: (a) $s_\mathsf{r}(u)$. (b) $m^\ast(\mathsf{r}),\delta^\ast(\mathsf{r})$. (c) $\zeta^\ast(t)$.
  • ...and 6 more figures

Theorems & Definitions (36)

  • Definition 1: Score Function
  • Definition 2: OT Map (Brenier, 1987)
  • Proposition 1: Score Function and Backward OT Map
  • Proposition 2: Score and Backward Denoising
  • Proposition 3: Curvature and Localization
  • Proposition 4: Curvature and Score
  • Theorem 1: Forward Contraction
  • Remark 1
  • Theorem 2: Backward Expansion
  • Remark 2
  • ...and 26 more