On Error Propagation of Diffusion Models

Yangming Li; Mihaela van der Schaar

TL;DR

This work identifies and analyzes error propagation in diffusion models, showing that their chain structure can amplify intermediate errors. It introduces a formal framework with modular error, cumulative error, and a propagation equation governed by an amplification factor $\mu_t$, and proves that standard DMs exhibit propagation tendencies ($\mu_t \ge 1$) under reasonable assumptions. To mitigate propagation, the authors propose a tractable regularization term derived from an upper bound on the cumulative error, along with a bootstrap-based estimation algorithm inspired by TD learning, enabling efficient training. Empirical results on CIFAR-10, CelebA, and ImageNet demonstrate substantial reductions in error propagation, improved FID scores, and superiority over baselines, validating both the theory and the practical utility of the proposed regularization. The work provides a principled approach to diagnose and reduce exposure-bias-like effects in diffusion models, with direct implications for more reliable and higher-quality image synthesis.

Abstract

Although diffusion models (DMs) have shown promising performance in a number of tasks (e.g., speech synthesis and image generation), they might suffer from error propagation because of their sequential structure. However, this is not certain because some sequential models, such as conditional random fields (CRFs), are free from this problem. To address this issue, we develop a theoretical framework to mathematically formulate error propagation in the architecture of DMs. The framework contains three elements: the modular error, the cumulative error, and the propagation equation. The modular and cumulative errors are related by the equation, which shows that DMs are indeed affected by error propagation. Our theoretical study also suggests that the cumulative error is closely related to the generation quality of DMs. Based on this finding, we apply the cumulative error as a regularization term to reduce error propagation. Because the term is computationally intractable, we derive its upper bound and design a bootstrap algorithm to efficiently estimate the bound for optimization. We have conducted extensive experiments on multiple image datasets, showing that our proposed regularization reduces error propagation, significantly improves vanilla DMs, and outperforms previous baselines.

Paper Structure (37 sections, 3 theorems, 37 equations, 4 figures, 2 tables)

Introduction
DMs potentially suffer from error propagation.
Current works lack reliable explanations.
Our theory for the error propagation of DMs.
Why and how to reduce error propagation.
Contributions.
Background: Discrete-Time DMs
Theoretical Study
Analysis Framework
Elements of the framework.
Interpretation of the propagation equation.
Why current explanations are not solid.
Error Definitions
Derivation of the modular error.
Derivation of the cumulative error.
...and 22 more sections

Key Result

Theorem 3.1

For the forward and backward processes respectively defined in Eq. (eq:forward def) and Eq. (eq:backward proc), suppose that the output of neural network $\bm{\epsilon}_{\theta}(\cdot)$ (as defined in Eq. (eq:mu def)) follows a standard Gaussian regardless of input distributions and the entropy of d where $t \in [1, T]$ and we specially set $\mathcal{E}_{T+1} = 0$.
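
The TL;DR describes a propagation equation governed by an amplification factor $\mu_t$, and the theorem fixes the boundary value at $T+1$ to zero. As a minimal sketch of what such a recursion implies, assume it takes the linear form $\mathcal{E}^{\mathrm{cumu}}_t = \mathcal{E}^{\mathrm{mod}}_t + \mu_t \, \mathcal{E}^{\mathrm{cumu}}_{t+1}$. This form, the function name `cumulative_errors`, and all numeric values below are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def cumulative_errors(modular: Sequence[float], mu: Sequence[float]) -> List[float]:
    """Backward recursion E_t = m_t + mu_t * E_{t+1}, with E_{T+1} = 0.

    modular[i] and mu[i] hold the (hypothetical) values for t = i + 1.
    The linear form of the recursion is an assumption for illustration.
    """
    if len(modular) != len(mu):
        raise ValueError("modular and mu must have the same length")
    T = len(modular)
    cumu = [0.0] * (T + 2)      # indices 1..T are used; cumu[T + 1] = 0 is the boundary
    for t in range(T, 0, -1):   # sampling runs from t = T down to t = 1
        cumu[t] = modular[t - 1] + mu[t - 1] * cumu[t + 1]
    return cumu[1 : T + 1]


# Toy values with T = 3 (made up, not measurements from the paper).
modular = [0.1, 0.2, 0.3]
no_amplification = cumulative_errors(modular, [1.0, 1.0, 1.0])
amplified = cumulative_errors(modular, [1.5, 1.5, 1.5])
print(no_amplification)
print(amplified)
```

Under this assumed form, $\mu_t = 1$ makes the cumulative error at $t = 1$ equal to the sum of all modular errors, and any $\mu_t > 1$ pushes it strictly above that sum, which is the sense in which errors are amplified along the chain.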

Figures (4)

Figure 1: A toy example (with $T = 3$) to show our theoretical framework for the error propagation of diffusion models. We use dashed lines to indicate that the impact of cumulative error $\mathcal{E}_{t+1}^{\mathrm{cumu}}$ on denoising module $p_{\theta}(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ is defined at the distributional (not sample) level.
Figure 2: Uptrend dynamics of the MMD error $\mathcal{D}^{\mathrm{cumu}}_t$ w.r.t. decreasing iteration $t$. The cumulative error $\mathcal{E}_{t}^{\mathrm{cumu}}$ might show similar behaviors since it is tightly bounded by the MMD error.
Figure 3: Re-estimated dynamics of the MMD error $\mathcal{D}^{\mathrm{cumu}}_t$ with respect to decreasing iteration $t$ after applying our proposed regularization. Compared with the dynamics in Fig. 2, these curves show that the regularization reduces error propagation.
Figure 4: A trade-off study of the hyper-parameter $L$ (i.e., the number of bootstrapping steps) on two datasets. The results show that, as $L$ gets larger, the model performance improves only marginally while the time cost (measured in seconds) keeps growing.
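
Figures 2 and 3 track an MMD (maximum mean discrepancy) error between distributions. The sketch below shows one common way to estimate squared MMD from two sample sets, assuming an RBF kernel with a fixed bandwidth and the biased (V-statistic) estimator. The kernel, bandwidth, helper names, and the synthetic Gaussian samples are all assumptions for illustration; the estimator actually used in the paper is not specified in this summary.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, bandwidth: float) -> float:
    """RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def gaussian_samples(n: int, dim: int, shift: float, rng: random.Random) -> List[List[float]]:
    """Draw n points from an isotropic unit-variance Gaussian centred at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


# Synthetic stand-ins: `close` shares the reference distribution, `drifted` does not.
rng = random.Random(0)
reference = gaussian_samples(100, 2, 0.0, rng)
close = gaussian_samples(100, 2, 0.0, rng)
drifted = gaussian_samples(100, 2, 2.0, rng)
print(mmd_squared(reference, close), mmd_squared(reference, drifted))
```

In this toy setup the drifted samples yield a clearly larger value than the matched ones, which is the kind of gap a rising curve in Figure 2 would reflect if intermediate distributions move away from their targets as $t$ decreases.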

Theorems & Definitions (11)

Definition 3.1: Modular Error
Definition 3.2: Cumulative Error
Remark 3.1
Remark 3.2
Theorem 3.1: Propagation Equation
Remark 3.3
proof
Proposition 4.1: Bounds of the Cumulative Error
proof
Proposition A.1: Zero Cumulative Errors
...and 1 more
