An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang

TL;DR

The paper surveys diffusion models from a continuous-time, score-based lens, connecting unconditional diffusion theory to conditional, guided generation and showcasing applications across vision, audio, RL, and life sciences. It consolidates learning and estimation of score functions, distribution learning, and sampling theory, and extends to conditional diffusion, guidance mechanisms, and diffusion-based black-box optimization. Key contributions include theoretical progress on score estimation under subspace and manifold structures, guidance-strength analyses, and a principled optimization framework that treats constrained rewards as conditional sampling problems. The synthesis clarifies when diffusion models can be trusted for high-dimensional, structured data and how guidance and optimization perspectives open new research directions. It also outlines future directions, including stochastic control links, robustness considerations, and discrete-diffusion variants for discrete data settings.

Abstract

Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible high-dimensional data modeling, and act as a sampler for generating new samples under active guidance towards task-desired properties. Despite the significant empirical success, theory of diffusion models is very limited, potentially slowing down principled methodological innovations for further harnessing and improving diffusion models. In this paper, we review emerging applications of diffusion models, understanding their sample generation under various controls. Next, we overview the existing theories of diffusion models, covering their statistical properties and sampling capabilities. We adopt a progressive routine, beginning with unconditional diffusion models and connecting to conditional counterparts. Further, we review a new avenue in high-dimensional structured optimization through conditional diffusion models, where searching for solutions is reformulated as a conditional sampling problem and solved by diffusion models. Lastly, we discuss future directions about diffusion models. The purpose of this paper is to provide a well-rounded theoretical exposure for stimulating forward-looking theories and methods of diffusion models.

Paper Structure (35 sections, 2 theorems, 28 equations, 9 figures)

This paper contains 35 sections, 2 theorems, 28 equations, 9 figures.

Key Result

Theorem 1

Suppose the data distribution $P_{\rm data}$ is supported on a cube $[-1, 1]^D$ with a density function of smoothness index $s$. Under some conditions (technical conditions on the data distribution approaching the boundary of the hypercube; a precise statement can be found in oko2023diffusio) [...] where $d_{\rm TV}$ is the total variation distance. Moreover, suppose the data distribution is supp

Figures (9)

  • Figure 1: Demonstration of forward and backward processes in diffusion models. The forward process is a noise corruption process, while the backward process is used for new sample generation.
  • Figure 2: Conditional diffusion models generate images under various guidance black2023training. The upper row demonstrates an alignment with text description consisting of multiple objects. The lower row demonstrates an abstract description of aesthetic quality.
  • Figure 3: Decision diffuser in ajay2022conditional. The model is trained on labeled trajectories and is capable of generating state-action trajectories conditioned on desired reward values, constraints, or skills.
  • Figure 4: U-Net architecture in ronneberger2015u for $32 \times 32$ resolution RGB images. When generating new samples using a discretized backward process, diffusion models utilize the U-Net at each discretization step for transforming samples. The image sample together with a time embedding is first compressed into a low-dimensional representation and then lifted back to the original dimension.
  • Figure 5: Simplified U-Net architecture in chen2023score for approximating score functions in the low-dimensional subspace data setting. Matrix $V$ represents the linear encoder and decoder, which is to be jointly learned with parameter $\theta$ during the optimization of loss \ref{eq:score_practical}. Here $f_\theta$ is a network whose input and output dimensions both equal the subspace dimension.
  • ...and 4 more figures
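
The forward and backward processes described for Figure 1 can be illustrated with a minimal sketch. It assumes an Ornstein-Uhlenbeck forward process $dX_t = -\frac{1}{2} X_t\,dt + dW_t$ and a toy Gaussian data distribution, chosen only because the score $\nabla \log p_t$ is then available in closed form; in practice the score is replaced by a learned network such as a U-Net. All function names and the values of `T`, `steps`, `mu`, and `sigma2` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def forward_sample(x0, t, rng):
    """Sample X_t given X_0 = x0 under the assumed OU forward process.

    For dX = -X/2 dt + dW, X_t | X_0 ~ N(exp(-t/2) * x0, (1 - exp(-t)) * I).
    """
    mean = np.exp(-t / 2.0) * x0
    std = np.sqrt(1.0 - np.exp(-t))
    return mean + std * rng.standard_normal(x0.shape)


def gaussian_score(x, t, mu, sigma2):
    """Closed-form score of the marginal P_t when X_0 ~ N(mu, sigma2 * I).

    This stands in for a learned score network (toy assumption).
    """
    m_t = np.exp(-t / 2.0) * mu
    v_t = np.exp(-t) * sigma2 + 1.0 - np.exp(-t)
    return -(x - m_t) / v_t


def backward_sample(n, dim, T, steps, mu, sigma2, rng):
    """Euler-Maruyama discretization of the backward process.

    dY_s = [Y_s / 2 + score(Y_s, T - s)] ds + dW_s, started from N(0, I),
    which approximates the terminal forward marginal P_T.
    """
    y = rng.standard_normal((n, dim))
    dt = T / steps
    for k in range(steps):
        t = T - k * dt  # forward time corresponding to backward step k
        drift = 0.5 * y + gaussian_score(y, t, mu, sigma2)
        y = y + drift * dt + np.sqrt(dt) * rng.standard_normal(y.shape)
    return y


rng = np.random.default_rng(0)
mu, sigma2 = 2.0, 0.25

# Forward process: data are corrupted towards (approximately) standard Gaussian noise.
x0 = mu + np.sqrt(sigma2) * rng.standard_normal((20000, 1))
x_T = forward_sample(x0, 5.0, rng)

# Backward process: noise is transformed into new samples close to the data distribution.
samples = backward_sample(20000, 1, T=5.0, steps=500, mu=mu, sigma2=sigma2, rng=rng)

print("forward  mean/var:", float(x_T.mean()), float(x_T.var()))
print("backward mean/var:", float(samples.mean()), float(samples.var()))
```

With these settings the forward samples should be close to standard Gaussian noise, and the backward samples should have mean and variance close to `mu` and `sigma2`, up to discretization and Monte Carlo error.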

Theorems & Definitions (3)

  • Remark 1: Network class ${\mathcal{S}}$
  • Theorem 1: Sample Complexity of Distribution Estimation
  • Theorem 2: Conditional Diffusion for Black-Box Optimization