Constant Rate Scheduling: A General Framework for Optimizing Diffusion Noise Schedule via Distributional Change
Shuntaro Okada, Kenji Doi, Ryota Yoshihashi, Hirokatsu Kataoka, Tomohiro Tanaka
TL;DR
Constant Rate Scheduling (CRS) is a general, discrepancy-based framework that enforces a constant rate of distributional change throughout the diffusion process and uses it to optimize both training and sampling noise schedules. It defines a velocity function $v(\alpha)$ from a discrepancy measure (FID-based, data-prediction, or noise-prediction) and obtains the schedule by solving $\frac{d\alpha(t)}{dt} \propto v(\alpha)^{-\xi}$. The resulting schedules improve sample fidelity without compromising mode coverage for pixel-space and latent-space models over a wide range of numbers of function evaluations (NFEs). Empirically, CRS reaches a state-of-the-art FID of $2.03$ on LSUN Horse 256×256 and shows gains across datasets, samplers, and architectures, though it incurs training-time overhead and relies on domain-appropriate discrepancy measures. The approach gives a unified view of training and sampling schedules, offers practical alternatives (e.g., CRS-vx + vcos) for scaling to large datasets, and leaves further theoretical and methodological development of schedule optimization to future work.
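To make the constant-rate condition concrete, the following is a minimal numerical sketch, not the authors' implementation. It relies only on the relation $\frac{d\alpha(t)}{dt} \propto v(\alpha)^{-\xi}$, which implies $dt \propto v(\alpha)^{\xi}\,|d\alpha|$, so $t(\alpha)$ can be obtained as a normalized cumulative integral and then inverted by interpolation. The function name `constant_rate_schedule`, the endpoints $\alpha = 1$ and $\alpha = 0$, the normalization of $t$ to $[0, 1]$, and the toy velocity used in the usage note are all assumptions made for illustration; in CRS the velocity would come from one of the discrepancy measures.

```python
import numpy as np


def constant_rate_schedule(velocity, xi=1.0, alpha_start=1.0, alpha_end=0.0,
                           n_grid=2001, n_steps=10):
    """Sketch: solve d(alpha)/dt ∝ velocity(alpha)**(-xi) on t in [0, 1].

    `velocity` is a hypothetical callable returning strictly positive
    values for an array of alpha values. Returns (t, alpha_t), each with
    n_steps + 1 entries.
    """
    # Dense grid of alpha values running from alpha_start to alpha_end.
    alphas = np.linspace(alpha_start, alpha_end, n_grid)

    # d(alpha)/dt ∝ v^(-xi)  implies  dt ∝ v^(xi) * |d(alpha)|.
    density = np.asarray(velocity(alphas), dtype=float) ** xi

    # Cumulative trapezoidal integral gives unnormalized t(alpha).
    segments = 0.5 * (density[1:] + density[:-1]) * np.abs(np.diff(alphas))
    t_of_alpha = np.concatenate([[0.0], np.cumsum(segments)])
    t_of_alpha /= t_of_alpha[-1]  # normalize so that t ends at 1

    # Invert t(alpha) on a uniform time grid to obtain alpha(t).
    t = np.linspace(0.0, 1.0, n_steps + 1)
    alpha_t = np.interp(t, t_of_alpha, alphas)
    return t, alpha_t


def toy_velocity(alpha):
    # Illustrative velocity only (an assumption, not a CRS measure):
    # the distribution is taken to change faster at large alpha.
    return 1.0 + 4.0 * alpha
```

Calling `constant_rate_schedule(toy_velocity, xi=1.0, n_steps=10)` returns a schedule that takes smaller steps in $\alpha$ where the toy velocity is large, so that each step covers the same integrated amount of $v(\alpha)^{\xi}$. With a constant velocity the sketch reduces to a schedule that is linear in $\alpha$.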
Abstract
We propose a general framework for optimizing noise schedules in diffusion models, applicable to both training and sampling. Our method enforces a constant rate of change in the probability distribution of diffused data throughout the diffusion process, where the rate of change is quantified using a user-defined discrepancy measure. We introduce three such measures, which can be flexibly selected or combined depending on the domain and model architecture. While our framework is inspired by theoretical insights, we do not aim to provide a complete theoretical justification of how distributional change affects sample quality. Instead, we focus on establishing a general-purpose scheduling framework and validating its empirical effectiveness. Through extensive experiments, we demonstrate that our approach consistently improves the performance of both pixel-space and latent-space diffusion models across various datasets, samplers, and numbers of function evaluations (NFEs) ranging from 5 to 250. In particular, when applied to both training and sampling schedules, our method achieves a state-of-the-art FID score of 2.03 on LSUN Horse 256$\times$256, without compromising mode coverage.
