Soft Diffusion: Score Matching for General Corruptions

Giannis Daras, Mauricio Delbracio, Hossein Talebi, Alexandros G. Dimakis, Peyman Milanfar

TL;DR

The paper extends diffusion-based generative modeling beyond additive noise by introducing Soft Score Matching, an objective that learns score functions for general linear corruptions such as Gaussian blur and masking. It treats the training objective, the sampler, and the corruption schedule within a single framework for inverting degradation processes, reaching a state-of-the-art FID of 1.85 on CelebA-64 and sampling faster than vanilla denoising diffusion. The results suggest that the choice of corruption operator, together with a principled schedule, can substantially improve sample quality and efficiency. The paper also introduces the Momentum Sampler and a probability-flow variant, broadening the practical applicability of diffusion models to a wider class of inverse problems.

Abstract

We define a broader family of corruption processes that generalizes previously known diffusion models. To reverse these general diffusions, we propose a new objective called Soft Score Matching that provably learns the score function for any linear corruption process and yields state-of-the-art results for CelebA. Soft Score Matching incorporates the degradation process in the network. Our new loss trains the model to predict a clean image that, after corruption, matches the diffused observation. We show that our objective learns the gradient of the likelihood under suitable regularity conditions for a family of corruption processes. We further develop a principled way to select the corruption levels for general diffusion processes and a novel sampling method that we call Momentum Sampler. We show experimentally that our framework works for general linear corruption processes, such as Gaussian blur and masking. We achieve a state-of-the-art FID score of $1.85$ on CelebA-64, outperforming all previous linear diffusion models. We also show significant computational benefits compared to vanilla denoising diffusion.
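
To make the idea of a loss measured "after corruption" concrete, here is a minimal sketch in plain Python. It assumes a corruption of the form x_t = C x_0 + sigma * noise, with C a diagonal 0/1 mask, and a loss of the form ||C (prediction - x_0)||^2. The function names, the toy vectors, and the hard mask are illustrative assumptions, not the authors' implementation.

```python
import random

def corrupt(x0, mask, sigma, rng):
    """Assumed forward process: x_t = C x_0 + sigma * noise, with C a diagonal mask."""
    return [m * x + sigma * rng.gauss(0.0, 1.0) for m, x in zip(mask, x0)]

def soft_score_matching_loss(x0_pred, x0, mask):
    """Squared error measured after applying the corruption: ||C (x0_pred - x0)||^2."""
    residual = [m * (p - x) for m, p, x in zip(mask, x0_pred, x0)]
    return sum(r * r for r in residual)

def plain_denoising_loss(x0_pred, x0):
    """Ordinary squared error on the clean image, for comparison."""
    return sum((p - x) ** 2 for p, x in zip(x0_pred, x0))

# Toy 4-pixel "image"; the mask keeps the first two pixels and drops the rest.
x0 = [0.2, 0.8, 0.5, 0.1]
mask = [1.0, 1.0, 0.0, 0.0]
rng = random.Random(0)
xt = corrupt(x0, mask, sigma=0.1, rng=rng)  # what a network would receive as input

# Hypothetical network output: exact on the kept pixels, wrong on the masked ones.
x0_pred = [0.2, 0.8, 0.9, 0.7]

soft = soft_score_matching_loss(x0_pred, x0, mask)   # ignores errors the mask removes
plain = plain_denoising_loss(x0_pred, x0)            # penalizes every pixel
print(soft, plain)
```

In this toy case the corrupted-space loss is zero while the plain loss is not: errors that the corruption operator removes are not penalized at that corruption level. That is one way to read "incorporates the degradation process in the network."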

Paper Structure

This paper contains 37 sections, 2 theorems, 31 equations, 13 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

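Let $q_0, q_t$ be two distributions in $\mathbb R^n$. Assume that all conditional distributions, $q_t({\bm{x}}_t|{\bm{x}}_0)$, are fully supported and differentiable in $\mathbb R^n$. For a score model ${\bm{s}}_\theta$, let:
$$J_1(\theta) = \frac{1}{2}\,\mathbb{E}_{{\bm{x}}_t \sim q_t}\left[\left\| {\bm{s}}_\theta({\bm{x}}_t) - \nabla_{{\bm{x}}_t} \log q_t({\bm{x}}_t) \right\|^2\right],$$
$$J_2(\theta) = \frac{1}{2}\,\mathbb{E}_{({\bm{x}}_0, {\bm{x}}_t) \sim q_0({\bm{x}}_0)\, q_t({\bm{x}}_t|{\bm{x}}_0)}\left[\left\| {\bm{s}}_\theta({\bm{x}}_t) - \nabla_{{\bm{x}}_t} \log q_t({\bm{x}}_t|{\bm{x}}_0) \right\|^2\right].$$
Then, there is a universal constant $C$ (that does not depend on $\theta$) such that: $J_1(\theta) = J_2(\theta) + C$.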
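
The statement can be checked numerically on a toy case. The sketch below assumes that $J_1$ is the explicit score-matching objective (regression onto the marginal score) and $J_2$ the denoising score-matching objective (regression onto the conditional score), which is the standard reading of this result. The 1-D Gaussian setup, the linear model $s_\theta(x) = \theta x$, and all function names are illustrative assumptions.

```python
import random

def empirical_objectives(theta, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimates of J1 and J2 for q_0 = N(0, 1), q_t(x_t|x_0) = N(x_0, sigma^2)."""
    rng = random.Random(seed)
    var_t = 1.0 + sigma ** 2          # the marginal q_t is N(0, 1 + sigma^2)
    j1 = j2 = 0.0
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, 1.0)
        xt = x0 + sigma * rng.gauss(0.0, 1.0)
        model = theta * xt                            # toy score model s_theta(x) = theta * x
        marginal_score = -xt / var_t                  # d/dx log q_t(x)
        conditional_score = -(xt - x0) / sigma ** 2   # d/dx log q_t(x | x0)
        j1 += 0.5 * (model - marginal_score) ** 2
        j2 += 0.5 * (model - conditional_score) ** 2
    return j1 / n_samples, j2 / n_samples

def closed_form_gap(sigma):
    """C = J1 - J2 for this toy setup; note that theta does not appear."""
    return 0.5 * (1.0 / (1.0 + sigma ** 2) - 1.0 / sigma ** 2)

sigma = 1.0
gaps = []
for theta in (-1.0, -0.5, 0.0, 0.5):
    j1, j2 = empirical_objectives(theta, sigma)
    gaps.append(j1 - j2)
    print(f"theta={theta:+.1f}  J1={j1:.3f}  J2={j2:.3f}  J1-J2={j1 - j2:.3f}")

print("closed-form gap:", closed_form_gap(sigma))  # -0.25 for sigma = 1
```

Both objectives change with $\theta$, but their difference stays near the same constant up to Monte Carlo error. This is why minimizing the tractable conditional objective also minimizes the intractable marginal one.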

Figures (13)

  • Figure 1: Top two rows: Demonstration of our generalized diffusion method. Instead of corrupting by only adding noise, we propose a framework to provably learn the score function to reverse any linear diffusion (left: blur and noise, right: masking and noise). Our (blur and noise) models achieve a state-of-the-art FID score of $\mathbf{1.85}$ on CelebA-64. Uncurated samples shown in the last three rows.
  • Figure 2: Uncurated samples from our trained models on CIFAR-10 (left) and CelebA (right).
  • Figure 3: FID versus NFEs (CelebA-64).
  • Figure 4: Effect of sampling method on the quality of the generated samples. The images from the Naive Sampler seem repetitive and lack details. Momentum Sampler dramatically improves the sampling quality and the FID score.
  • Figure 5: Conditional means $\mathbb{E}[{\bm{x}}_0|{\bm{x}}_t]$ predictions of our blur/masking models, at different diffusion times.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem (label: dsm_theorem)
  • Proof of Theorem (label: dsm_theorem)