Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models

Yangming Li, Boris van Breugel, Mihaela van der Schaar

TL;DR

Current diffusion models have an expressive bottleneck in backward denoising, and an assumption made by existing theoretical guarantees is too strong. The paper proves that diffusion models have unbounded errors in both local and global denoising.

Abstract

Because diffusion models have shown impressive performance in a number of tasks, such as image synthesis, there is a trend in recent works to prove (under certain assumptions) that these models have strong approximation capabilities. In this paper, we show that current diffusion models actually have an expressive bottleneck in backward denoising and that an assumption made by existing theoretical guarantees is too strong. Based on this finding, we prove that diffusion models have unbounded errors in both local and global denoising. In light of our theoretical studies, we introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising. SMD not only permits diffusion models to approximate any Gaussian mixture distribution well in theory, but is also simple and efficient to implement. Our experiments on multiple image datasets show that SMD significantly improves different types of diffusion models (e.g., DDPM), especially when few backward iterations are used.

Paper Structure

This paper contains 23 sections, 5 theorems, 58 equations, 7 figures, 2 tables, 2 algorithms.

Key Result

Proposition 3.1

For the diffusion process defined in Eq. (eq:forward def), suppose that the real data follow a Gaussian mixture: $q(\mathbf{x}_0) = \sum_{k=1}^K w_k \mathcal{N}(\mathbf{x}_0; \bm{\mu}_k, \bm{\Sigma}_k)$, which consists of $K$ Gaussian components with mixture weight $w_k$, mean vector $\bm{\mu}_k$, and covariance matrix $\bm{\Sigma}_k$. ... where $w_k'$ and $\bm{\mu}_k'$ depend on both the variable $\mathbf{x}_t$ and $\bm{\mu}_t$.
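The claim named by this proposition (a non-Gaussian inverse probability) can be checked numerically in one dimension. The sketch below is illustrative only. Because Eq. (eq:forward def) is not reproduced here, it assumes the standard DDPM forward step $x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon$ with cumulative product $\bar{\alpha}_{t-1}$. The function names `gauss_pdf` and `posterior_mixture` and all numeric values are hypothetical choices, not taken from the paper.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def posterior_mixture(x_t, w, mu, var, alpha_t, abar_prev):
    """Closed-form q(x_{t-1} | x_t) when q(x_0) is a 1-D Gaussian mixture.

    Assumes x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps and
    x_{t-1} = sqrt(abar_prev) * x_0 + sqrt(1 - abar_prev) * eps'.
    Returns the posterior weights, means and variances per component.
    """
    w, mu, var = (np.asarray(a, dtype=float) for a in (w, mu, var))
    # Per-component marginal of x_{t-1}.
    m_prev = np.sqrt(abar_prev) * mu
    v_prev = abar_prev * var + (1.0 - abar_prev)
    # Per-component marginal of x_t.
    m_t = np.sqrt(alpha_t) * m_prev
    v_t = alpha_t * v_prev + (1.0 - alpha_t)
    # Posterior weights: prior weight times the evidence of x_t.
    w_post = w * gauss_pdf(x_t, m_t, v_t)
    w_post = w_post / w_post.sum()
    # Gaussian conditioning inside each component.
    gain = np.sqrt(alpha_t) * v_prev / v_t
    m_post = m_prev + gain * (x_t - m_t)
    v_post = v_prev * (1.0 - alpha_t) / v_t
    return w_post, m_post, v_post

# Two well-separated modes and a large noise step (few backward iterations).
w_post, m_post, v_post = posterior_mixture(
    x_t=0.0, w=[0.5, 0.5], mu=[-2.0, 2.0], var=[0.05, 0.05],
    alpha_t=0.5, abar_prev=0.9,
)
print("weights:", w_post)
print("means:", m_post)
print("stds:", np.sqrt(v_post))
```

In this setting the two posterior components keep equal weight and their means sit several standard deviations apart, so a single Gaussian cannot match the posterior. With `alpha_t` close to 1 (many small steps) the component means move together and the Gaussian approximation becomes much less harmful.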

Figures (7)

  • Figure 1: SMD improves quality and reduces the number of backward iterations. Results for CelebA-HQ $256 \times 256$ with only $100$ backward iterations, for LDM with and without SMD. SMD achieves better realism and FID. Achieving the same FID with vanilla LDM would require $8\times$ more steps (see Fig. \ref{fig:few iters}). Note that SMD differs from fast samplers (e.g., DDIM (Song et al., 2021) and DPM (Lu et al., 2022)): while those methods focus on deterministic sampling and numerical stability, SMD improves the expressiveness of diffusion models.
  • Figure 2: Visualising the expressive bottleneck of standard diffusion models. Experimental results on a synthetic dataset with $7\times 7$ Gaussians (right), for DDPM with $T=1000$. Even though DDPM has converged, the modes are not easily distinguishable. SMD, on the other hand, converges much faster and produces distinguishable modes.
  • Figure 3: SMD reduces the number of sampling steps. Latent DDIM and DDPM for different iterations on CelebA-HQ ($256 \times 256$).
  • Figure 4: SMD quality is further improved by sampling multiple $\eta$ (see Alg. \ref{alg:training}). Results on LSUN-Conference ($64 \times 64$) for DDPM w/ SMD.
  • Figure 5: $64 \times 64$ images generated by DDPM w/ SMD.
  • ...and 2 more figures
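A synthetic benchmark like the $7\times 7$ Gaussian grid mentioned for Figure 2 can be sketched as follows. This is only a guess at the setup: the grid spacing, the per-mode standard deviation, and the name `sample_grid_mixture` are assumptions, not the values or code used in the paper.

```python
import numpy as np

def sample_grid_mixture(n, side=7, spacing=2.0, std=0.1, seed=0):
    """Draw n points from an equal-weight mixture of side*side isotropic
    2-D Gaussians whose centers lie on a square grid around the origin.

    spacing and std are assumed values chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    coords = (np.arange(side) - (side - 1) / 2.0) * spacing
    centers = np.array([(cx, cy) for cx in coords for cy in coords])
    labels = rng.integers(0, len(centers), size=n)  # mode index per sample
    points = centers[labels] + std * rng.standard_normal((n, 2))
    return points, labels, centers

points, labels, centers = sample_grid_mixture(10_000)
print(points.shape, centers.shape)
```

Keeping the mode labels makes it easy to measure how many generated samples fall near each center, which is one way to quantify whether the modes are distinguishable.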

Theorems & Definitions (14)

  • Proposition 3.1: Non-Gaussian Inverse Probability
  • Remark 3.1
  • proof
  • Definition 3.1: Local Denoising Error
  • Definition 3.2: Global Denoising Error
  • Theorem 3.1: Uniformly Unbounded Denoising Error
  • proof
  • Theorem 3.2: Unbounded Approximation Error
  • proof
  • Theorem 4.1: Expressive Soft Mixture Denoising
  • ...and 4 more