Diffusion Rejection Sampling
Byeonghu Na, Yeongmin Kim, Minsang Park, Donghyeok Shin, Wanmo Kang, Il-Chul Moon
TL;DR
DiffRS introduces a per-timestep rejection-sampling framework for diffusion models. It realigns the pre-trained reverse transition with the true transition by learning a time-dependent discriminator that estimates the density ratio between them. The method computes an acceptance probability $A_t = \frac{1}{M_t} \frac{q_{t|t+1}}{p^{\theta}_{t|t+1}}$, in which the intractable ratio $q_{t|t+1}/p^{\theta}_{t|t+1}$ is replaced by the discriminator-based estimate $L_t$. It also applies a re-initialization strategy to handle rejected samples, refining samples without retraining the diffusion model. Theoretical analysis shows a tighter KL bound $D_{KL}(q_0 \| p_0^{\theta,\phi}) \le J(\theta) + R(\phi)$, with optimal discriminators achieving equality. Empirically, DiffRS attains state-of-the-art FID on CIFAR-10 and strong results on ImageNet and large-scale text-to-image generation, including with fast samplers and diffusion distillation. The approach applies across a range of pre-trained diffusion pipelines, offering practical gains in sample quality and efficiency.
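To make the acceptance rule concrete, here is a minimal single-step sketch in Python under toy assumptions. It is not the authors' implementation. Both the "pre-trained" transition p and the "true" transition q are hypothetical 1-D Gaussians with known densities. The discriminator is the Bayes-optimal classifier d = q/(q+p) rather than a trained network, so the ratio estimate L = d/(1-d) recovers q/p exactly. The bound M is the analytic maximum of that ratio. The sketch shows only plain rejection sampling with acceptance probability A = L/M. It omits the re-initialization strategy and the dependence on the timestep t.

```python
import math
import random

# Hypothetical toy transitions (not from the paper): in DiffRS these would be the
# true reverse transition q(x_t | x_{t+1}) and the pre-trained p^theta(x_t | x_{t+1}).
MU_P, SIG_P = 0.0, 1.5   # proposal p: the "pre-trained" transition
MU_Q, SIG_Q = 0.3, 1.0   # target q: the "true" transition (narrower, so q/p is bounded)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def discriminator(x: float) -> float:
    """Bayes-optimal probability that x came from q rather than p (equal class priors)."""
    q, p = normal_pdf(x, MU_Q, SIG_Q), normal_pdf(x, MU_P, SIG_P)
    return q / (q + p)


def ratio_estimate(x: float) -> float:
    """Density-ratio estimate L = d / (1 - d); equals q(x) / p(x) for the optimal d."""
    d = discriminator(x)
    return d / (1.0 - d)


# Bound M >= q/p everywhere. For Gaussians with SIG_Q < SIG_P the log-ratio is a
# concave quadratic in x, so its maximizer has a closed form.
X_STAR = MU_Q * SIG_P**2 / (SIG_P**2 - SIG_Q**2)
M = ratio_estimate(X_STAR)


def rejection_sample(n: int, rng: random.Random) -> tuple[list[float], int]:
    """Draw n samples from q using proposals from p; also return the proposal count."""
    accepted: list[float] = []
    proposals = 0
    while len(accepted) < n:
        x = rng.gauss(MU_P, SIG_P)                      # propose from p
        proposals += 1
        accept_prob = min(1.0, ratio_estimate(x) / M)   # A = L / M
        if rng.random() < accept_prob:
            accepted.append(x)
    return accepted, proposals


if __name__ == "__main__":
    samples, n_prop = rejection_sample(20_000, random.Random(0))
    mean = sum(samples) / len(samples)
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    print(f"mean={mean:.3f} std={std:.3f} "
          f"acceptance={len(samples) / n_prop:.3f} 1/M={1 / M:.3f}")
```

Each proposal is accepted with probability L/M. The accepted samples therefore follow q, and the expected number of proposals per accepted sample is M (roughly 1.55 in this toy setting). Individual samples may need several draws, which loosely mirrors the sample-dependent effort that the abstract describes.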
Abstract
Recent advances in powerful pre-trained diffusion models encourage the development of methods to improve the sampling performance under well-trained diffusion models. This paper introduces Diffusion Rejection Sampling (DiffRS), which uses a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep. The proposed method can be viewed as a mechanism that evaluates the quality of samples at each intermediate timestep and refines them with varying effort depending on the sample. Theoretical analysis shows that DiffRS can achieve a tighter bound on sampling error compared to pre-trained models. Empirical results demonstrate the state-of-the-art performance of DiffRS on the benchmark datasets and the effectiveness of DiffRS for fast diffusion samplers and large-scale text-to-image diffusion models. Our code is available at https://github.com/aailabkaist/DiffRS.
