
Diffusion Rejection Sampling

Byeonghu Na, Yeongmin Kim, Minsang Park, Donghyeok Shin, Wanmo Kang, Il-Chul Moon

TL;DR

DiffRS introduces a per-timestep rejection-sampling framework for diffusion models that realigns the pre-trained reverse transition with the true transition by learning a time-dependent density-ratio discriminator. A sample proposed by the pre-trained kernel is accepted with probability $A_t = \frac{1}{M_t} \frac{q_{t|t+1}}{p^{\theta}_{t|t+1}}$, where the density ratio is replaced by the discriminator's estimate $L_t$ and $M_t$ bounds it so that $A_t \le 1$; a re-initialization strategy handles rejections. Samples are thus refined without retraining the diffusion model. Theoretical analysis gives a tightened KL bound $D_{KL}(q_0 \| p_0^{\theta,\phi}) \le J(\theta) + R(\phi)$, which holds with equality for optimal discriminators. Empirically, DiffRS attains state-of-the-art FID on CIFAR-10, strong results on ImageNet and large-scale text-to-image generation, and gains when combined with fast samplers and diffusion distillation. The approach applies across datasets, samplers, and model scales, offering practical gains in sample quality and efficiency for modern diffusion pipelines.
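The accept/reject step can be illustrated on a toy one-dimensional reverse chain. This is a minimal sketch under stated assumptions, not the authors' implementation: both kernels are Gaussians we chose for illustration, the density ratio is computed exactly instead of being estimated by a discriminator, and a rejected proposal is simply redrawn from the same $x_{t+1}$ rather than handled by the paper's re-initialization strategy. The names `DECAY`, `BIAS`, `SIGMA_Q`, `SIGMA_P`, `M_BOUND`, and `sample_chain` are hypothetical; $M_t$ is the analytic maximum of the ratio for this Gaussian pair.

```python
import math
import random

# Toy 1-D kernels (illustrative assumptions, not the paper's models):
#   true reverse kernel   q(x_t | x_{t+1}) = N(DECAY * x_{t+1},        SIGMA_Q^2)
#   "pre-trained" kernel  p(x_t | x_{t+1}) = N(DECAY * x_{t+1} + BIAS, SIGMA_P^2)
DECAY, BIAS, SIGMA_Q, SIGMA_P = 0.9, 0.3, 0.5, 1.0


def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def density_ratio(x_t, x_next):
    """Exact q/p, standing in for the ratio L_t a trained discriminator would estimate."""
    log_q = log_normal_pdf(x_t, DECAY * x_next, SIGMA_Q)
    log_p = log_normal_pdf(x_t, DECAY * x_next + BIAS, SIGMA_P)
    return math.exp(log_q - log_p)


# M_t = sup_x q/p. For Gaussians with SIGMA_P > SIGMA_Q this equals
# (SIGMA_P / SIGMA_Q) * exp((mu_q - mu_p)^2 / (2 * (SIGMA_P^2 - SIGMA_Q^2))).
M_BOUND = (SIGMA_P / SIGMA_Q) * math.exp(BIAS ** 2 / (2.0 * (SIGMA_P ** 2 - SIGMA_Q ** 2)))


def sample_chain(num_steps, rng, use_rejection=True):
    """Run the reverse chain from x_T ~ N(0, 1) down to x_0."""
    x_next = rng.gauss(0.0, 1.0)
    for _ in range(num_steps):
        while True:
            # Propose x_t from the pre-trained kernel.
            x_t = rng.gauss(DECAY * x_next + BIAS, SIGMA_P)
            if not use_rejection:
                break
            # Accept with A_t = (1 / M_t) * q / p; otherwise redraw from the same x_{t+1}.
            if rng.random() < density_ratio(x_t, x_next) / M_BOUND:
                break
        x_next = x_t
    return x_next


def mean_var(samples):
    """Sample mean and (biased) sample variance."""
    m = sum(samples) / len(samples)
    return m, sum((s - m) ** 2 for s in samples) / len(samples)


rng = random.Random(0)
refined_stats = mean_var([sample_chain(5, rng) for _ in range(10_000)])
baseline_stats = mean_var([sample_chain(5, rng, use_rejection=False) for _ in range(10_000)])
print("with rejection    (mean, var):", refined_stats)
print("without rejection (mean, var):", baseline_stats)
```

With these settings the rejection chain should reproduce the true chain's marginal after five steps (mean 0, variance about 1.206), whereas the uncorrected pre-trained chain drifts to a mean of about 1.23. The price is extra proposals: with an exact ratio each step needs on average $M_t \approx 2.12$ draws here.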

Abstract

Recent advances in powerful pre-trained diffusion models encourage the development of methods to improve the sampling performance under well-trained diffusion models. This paper introduces Diffusion Rejection Sampling (DiffRS), which uses a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep. The proposed method can be viewed as a mechanism that evaluates the quality of samples at each intermediate timestep and refines them with varying effort depending on the sample. Theoretical analysis shows that DiffRS can achieve a tighter bound on sampling error compared to pre-trained models. Empirical results demonstrate the state-of-the-art performance of DiffRS on the benchmark datasets and the effectiveness of DiffRS for fast diffusion samplers and large-scale text-to-image diffusion models. Our code is available at https://github.com/aailabkaist/DiffRS.

Paper Structure

This paper contains 33 sections, 2 theorems, 22 equations, 29 figures, 9 tables, and 3 algorithms.

Key Result

Theorem 3.1

The KL divergence between the data distribution $q_0$ and the refined distribution $p_{0}^{\boldsymbol{\theta},\boldsymbol{\phi}}$ is bounded by $D_{KL}(q_0 \| p_{0}^{\boldsymbol{\theta},\boldsymbol{\phi}}) \le J(\boldsymbol{\theta}) + R(\boldsymbol{\phi})$, where $R(\boldsymbol{\phi}) := \mathbb{E}_{q_{T}} [ - \log \bar{A}_T^{\boldsymbol{\phi}} ] + \sum_{t=0}^{T-1} \mathbb{E}_{q_{t,t+1}} [ - \log \bar{A}_{t}^{\boldsymbol{\phi}} ]$. Moreover, this bound attains equality for the optimal $\boldsymbol{\phi}^*$, and in …
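Theorem 3.1 expresses the error of the refined sampler through the acceptance probabilities. As a single-step sanity check (our illustration, not the paper's proof), fix $x_{t+1}$ and use a discrete state space: keeping a draw from $p^{\boldsymbol{\theta}}_{t|t+1}$ with probability $\min(1, L_t/M_t)$ yields a kernel proportional to $p^{\boldsymbol{\theta}}_{t|t+1}\min(1, L_t/M_t)$. With the exact ratio this kernel is exactly $q_{t|t+1}$, mirroring the optimal-$\boldsymbol{\phi}^*$ case. The distributions `p`, `q` and the rough ratio estimate below are hypothetical toy numbers.

```python
import math


def kl(q, p):
    """KL(q || p) for discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def refine(p, ratio, bound):
    """Distribution of accepted draws when x ~ p is kept with prob min(1, ratio[x] / bound)."""
    kept = [pi * min(1.0, ri / bound) for pi, ri in zip(p, ratio)]
    accept_rate = sum(kept)
    return [k / accept_rate for k in kept], accept_rate


p = [0.5, 0.3, 0.2]  # toy "pre-trained" kernel for a fixed x_{t+1}
q = [0.2, 0.3, 0.5]  # toy "true" kernel for the same x_{t+1}

# Exact ratio: what an optimal discriminator would provide.
exact = [qi / pi for qi, pi in zip(q, p)]
refined_exact, rate_exact = refine(p, exact, max(exact))

# A deliberately rough ratio estimate (hypothetical numbers).
rough = [0.5, 1.0, 2.0]
refined_rough, rate_rough = refine(p, rough, max(rough))

print("KL(q || p)             =", kl(q, p))
print("KL(q || refined exact) =", kl(q, refined_exact))
print("KL(q || refined rough) =", kl(q, refined_rough))
print("acceptance rates       =", rate_exact, rate_rough)
```

In this toy the exact ratio drives the divergence from about 0.275 to zero at an acceptance rate of $1/M_t = 0.4$, and the rough estimate still lowers it to roughly 0.016; in general a poor ratio estimate is not guaranteed to help.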

Figures (29)

  • Figure 1: Overview of DiffRS. We sequentially apply rejection sampling to the pre-trained transition kernel $p^{\boldsymbol{\theta}}_{t|t+1}({\mathbf{x}}_t |{\mathbf{x}}_{t+1})$ (red) to align it with the true transition kernel $q_{t|t+1}({\mathbf{x}}_t | {\mathbf{x}}_{t+1})$ (blue). The acceptance probability is estimated by the time-dependent discriminator $d^{\boldsymbol{\phi}}_t$.
  • Figure 2: Overview of the proposed re-initialization.
  • Figure 3: Illustration of the sampling process for DiffRS. The path with the green background represents the DiffRS sampling process, and the rightmost images are generated from the intermediate images using a base sampler without rejection. Timesteps are expressed as the noise level $\sigma$ from the EDM scheme (Karras et al., 2022).
  • Figure 4: FID vs. NFE on ImageNet 64$\times$64 with EDM.
  • Figure 5: Generated images with the highest (top) and lowest (bottom) acceptance probability at each timestep, obtained using the EDM (Heun) sampler on CIFAR-10. $\sigma=\{28.4, 1.92, 0.002\}$ correspond to $t=\{15, 9, 1\}$, respectively, with $T=18$.
  • ...and 24 more figures
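The noise levels quoted in Figure 5 can be checked against the EDM time discretization of Karras et al. (2022), $\sigma_i = \left(\sigma_{\max}^{1/\rho} + \frac{i}{N-1}\left(\sigma_{\min}^{1/\rho} - \sigma_{\max}^{1/\rho}\right)\right)^{\rho}$. This sketch assumes the EDM defaults $\sigma_{\max}=80$, $\sigma_{\min}=0.002$, $\rho=7$ with $N=T=18$, and assumes the index mapping $t = N - i$; neither the defaults nor the mapping is stated in the caption, and `edm_sigma` is a hypothetical helper name.

```python
def edm_sigma(t, num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise level sigma for timestep index t in {1, ..., num_steps}.

    Assumes t = num_steps - i, where i = 0, ..., num_steps - 1 indexes the EDM
    discretization from sigma_max (i = 0) down to sigma_min (i = num_steps - 1).
    """
    i = num_steps - t
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then map back.
    return (hi + i / (num_steps - 1) * (lo - hi)) ** rho


for t in (15, 9, 1):
    print(f"t={t:2d}  sigma={edm_sigma(t):.4g}")
```

Under these assumptions, $t = 15, 9, 1$ give noise levels that round to 28.4, 1.92, and 0.002, consistent with the caption.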

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem 1.1
  • proof