Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models

Yangming Li; Boris van Breugel; Mihaela van der Schaar

TL;DR

The paper shows that current diffusion models have an expressive bottleneck in backward denoising and that an assumption made by existing theoretical guarantees is too strong. It proves that diffusion models have unbounded errors in both local and global denoising.

Abstract

Because diffusion models have shown impressive performance in a number of tasks, such as image synthesis, there is a trend in recent works to prove (with certain assumptions) that these models have strong approximation capabilities. In this paper, we show that current diffusion models actually have an expressive bottleneck in backward denoising and that an assumption made by existing theoretical guarantees is too strong. Based on this finding, we prove that diffusion models have unbounded errors in both local and global denoising. In light of our theoretical studies, we introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising. SMD not only permits diffusion models to approximate any Gaussian mixture distribution well in theory, but is also simple and efficient to implement. Our experiments on multiple image datasets show that SMD significantly improves different types of diffusion models (e.g., DDPM), especially when few backward iterations are used.

Paper Structure (23 sections, 5 theorems, 58 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
Background: Discrete-time Diffusion Models
Theory: DMs Suffer from an Expressive Bottleneck
Limited Gaussian Denoising
Denoising and Approximation Errors
Limited Approximation Theorems
Method: Soft Mixture Denoising
Main Theory
Efficient Optimization and Sampling
Experiments
Visualising the Expressive Bottleneck
SMD Improves Image Quality
SMD Improves Inference Speed
Sampling Multiple $\eta$: a Cost-Quality Trade-off
Future Work
...and 8 more sections

Key Result

Proposition 3.1

For the diffusion process defined in Eq. (eq:forward def), suppose that the real data follow a Gaussian mixture: $q(\mathbf{x}_0) = \sum_{k=1}^K w_k \mathcal{N}(\mathbf{x}_0; \bm{\mu}_k, \bm{\Sigma}_k)$, which consists of $K$ Gaussian components with mixture weight $w_k$, mean vector $\bm{\mu}_k$, and covariance matrix $\bm{\Sigma}_k$. [...] where $w_k', \bm{\mu}_k'$ depend on both variable $\mathbf{x}_t$ and $\bm{\mu}_t$.
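A minimal 1-D sketch of why the inverse probability in this setting is not Gaussian. It assumes the standard DDPM forward step $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t)$, which is not reproduced in this summary, and it takes $t = 1$ so that the prior on $x_{t-1}$ is the data mixture itself. The function name `posterior_mixture` and all numeric values are illustrative and do not come from the paper. Under these assumptions, Bayes' rule applied per component gives a posterior that is again a mixture, with weights and means that depend on $x_t$.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mixture(x_t, weights, means, variances, beta):
    """Parameters of q(x_{t-1} | x_t) when q(x_{t-1}) is a 1-D Gaussian mixture.

    Assumed forward step: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise.
    Returns (weights', means', variances'), one entry per mixture component.
    """
    a = math.sqrt(1.0 - beta)
    post_w, post_mu, post_var = [], [], []
    for w, mu, var in zip(weights, means, variances):
        # Marginal variance of x_t under component k.
        marg_var = a * a * var + beta
        # Unnormalised posterior weight: prior weight times evidence of x_t.
        post_w.append(w * gaussian_pdf(x_t, a * mu, marg_var))
        # Standard linear-Gaussian conditioning for the component posterior.
        gain = a * var / marg_var
        post_mu.append(mu + gain * (x_t - a * mu))
        post_var.append(var * beta / marg_var)
    total = sum(post_w)
    post_w = [w / total for w in post_w]
    return post_w, post_mu, post_var


# Illustrative two-component mixture with modes at -2 and +2.
w, mu, var = posterior_mixture(
    x_t=0.0, weights=[0.5, 0.5], means=[-2.0, 2.0], variances=[0.25, 0.25], beta=0.5
)
print(w, mu, var)
```

In this symmetric example the posterior at $x_t = 0$ keeps two equally weighted components on opposite sides of zero, so any single Gaussian fitted to it has to place its mean between the two modes. That is the kind of mismatch the expressive-bottleneck argument concerns.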

Figures (7)

Figure 1: SMD improves quality and reduces the number of backward iterations. Results for CelebA-HQ $256 \times 256$ with only $100$ backward iterations, for LDM with and without SMD. SMD achieves better realism and FID. Achieving the same FID with vanilla LDM would require $8\times$ more steps (see Fig. \ref{fig:few iters}). Note that SMD differs from fast samplers (e.g., DDIM [song2021denoising] and DPM [lu2022dpm]): while those methods focus on deterministic sampling and numerical stability, SMD improves the expressiveness of diffusion models.
Figure 2: Visualising the expressive bottleneck of standard diffusion models. Experimental results on a synthetic dataset with $7\times 7$ Gaussians (right), for DDPM with $T=1000$. Even though DDPM has converged, the modes are not easily distinguishable. SMD, by contrast, converges much faster and produces distinguishable modes.
Figure 3: SMD reduces the number of sampling steps. Latent DDIM and DDPM for different iterations on CelebA-HQ ($256 \times 256$).
Figure 4: SMD quality is further improved by sampling multiple $\eta$ (see Alg. \ref{alg:training}), shown on LSUN-Conference ($64 \times 64$) for DDPM w/ SMD.
Figure 5: $64 \times 64$ images generated by DDPM w/ SMD.
...and 2 more figures

Theorems & Definitions (14)

Proposition 3.1: Non-Gaussian Inverse Probability
Remark 3.1
proof
Definition 3.1: Local Denoising Error
Definition 3.2: Global Denoising Error
Theorem 3.1: Uniformly Unbounded Denoising Error
proof
Theorem 3.2: Unbounded Approximation Error
proof
Theorem 4.1: Expressive Soft Mixture Denoising
...and 4 more
