Constant Rate Scheduling: A General Framework for Optimizing Diffusion Noise Schedule via Distributional Change

Shuntaro Okada, Kenji Doi, Ryota Yoshihashi, Hirokatsu Kataoka, Tomohiro Tanaka

TL;DR

CRS introduces a general, discrepancy-based framework that enforces a constant rate of distributional change in diffusion processes, optimizing both training and sampling noise schedules. By defining a velocity function $v(\alpha)$ from discrepancy measures (FID-based, data-prediction, or noise-prediction) and solving $\frac{d\alpha(t)}{dt} \propto v(\alpha)^{-\xi}$, CRS improves sample fidelity and mode coverage for pixel-space and latent-space models over a wide range of NFEs. Empirically, CRS yields a state-of-the-art FID of 2.03 on LSUN Horse 256×256 and shows gains across datasets, samplers, and architectures, though it incurs training-time overhead and relies on domain-appropriate discrepancy measures. The approach offers a unified view of training and sampling schedules, with practical alternatives (e.g., CRS-vx + vcos) that scale to large datasets, and it points to future theoretical and methodological work on schedule optimization in diffusion models.

Abstract

We propose a general framework for optimizing noise schedules in diffusion models, applicable to both training and sampling. Our method enforces a constant rate of change in the probability distribution of diffused data throughout the diffusion process, where the rate of change is quantified using a user-defined discrepancy measure. We introduce three such measures, which can be flexibly selected or combined depending on the domain and model architecture. While our framework is inspired by theoretical insights, we do not aim to provide a complete theoretical justification of how distributional change affects sample quality. Instead, we focus on establishing a general-purpose scheduling framework and validating its empirical effectiveness. Through extensive experiments, we demonstrate that our approach consistently improves the performance of both pixel-space and latent-space diffusion models across various datasets, samplers, and numbers of function evaluations ranging from 5 to 250. In particular, when applied to both training and sampling schedules, our method achieves a state-of-the-art FID score of 2.03 on LSUN Horse 256$\times$256, without compromising mode coverage.
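
To make the constant-rate condition $\frac{d\alpha(t)}{dt} \propto v(\alpha)^{-\xi}$ concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes the velocity $v(\alpha)$ is available as a positive, vectorized callable, integrates $v(\alpha)^{\xi}$ over $\alpha$ to obtain a normalized progress variable, and inverts that map by interpolation. The function name `constant_rate_schedule`, the grid resolution, the direction of the progress variable, and the example velocity are all hypothetical.

```python
import numpy as np

def constant_rate_schedule(velocity, alpha_min, alpha_max, num_steps,
                           xi=1.0, grid_size=2001):
    """Return num_steps + 1 values of alpha satisfying d(alpha)/ds ∝ v(alpha)^(-xi).

    `velocity` is an assumed stand-in for a measured discrepancy-based
    velocity; it must be positive and accept a NumPy array.
    """
    alphas = np.linspace(alpha_min, alpha_max, grid_size)
    # ds/d(alpha) is proportional to v(alpha)^xi.
    density = np.power(velocity(alphas), xi)
    # Cumulative trapezoidal integral of the density over alpha.
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(alphas)
    progress = np.concatenate(([0.0], np.cumsum(increments)))
    progress /= progress[-1]  # normalize so progress runs from 0 to 1
    # Invert progress(alpha) at equally spaced progress values.
    s = np.linspace(0.0, 1.0, num_steps + 1)
    return np.interp(s, progress, alphas)

# Hypothetical velocity that grows with alpha: steps shrink where v is large.
schedule = constant_rate_schedule(lambda a: 1.0 + 10.0 * a, 0.0, 1.0, num_steps=10)
```

With a constant velocity the sketch reduces to a uniform grid in $\alpha$; where the velocity is large, consecutive values of $\alpha$ are placed closer together, so each step covers roughly the same amount of distributional change.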

Paper Structure

This paper contains 43 sections, 36 equations, 20 figures, 30 tables, 3 algorithms.

Figures (20)

  • Figure 1: Toy example of diffused data distributions for three data points in a one-dimensional data space. The probability distribution barely changes when $\alpha \lesssim 0.6$, so the noise level can be changed rapidly. Three modes corresponding to the data points emerge when $0.6 \lesssim \alpha \lesssim 0.97$, and the noise level should be changed slowly to preserve mode coverage. The three modes become distinct when $\alpha \gtrsim 0.97$, requiring careful control of the noise level for sample fidelity.
  • Figure 2: Noise schedules computed using different discrepancy measures on LSUN Horse 256$\times$256 in the pixel-space diffusion model.
  • Figure 3: Noise schedules computed using different discrepancy measures on LSUN Church 256$\times$256 in the latent-space diffusion model.
  • Figure 4: Effect of the weights $w_{x}$ and $w_{\mathrm{FID}}$ on the noise schedule generated by CRS-$v_{x} + v_{\mathrm{FID}}$, with fixed exponents $\xi_{x} = \xi_{\mathrm{FID}} = 1$.
  • Figure 5: Effect of the exponent $\xi_{\mathrm{FID}}$ on the noise schedule generated by CRS-$v_{x}+v_{\mathrm{FID}}$, with fixed weights $w_{x} = w_{\mathrm{FID}} = 0.5$.
  • ...and 15 more figures
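
The qualitative behavior described in the Figure 1 caption can be explored with a small sketch. It assumes a variance-preserving forward process $x_\alpha = \alpha x_0 + \sqrt{1 - \alpha^2}\,\epsilon$, so that the diffused distribution is an equal-weight mixture of Gaussians centered at $\alpha x_i$ with variance $1 - \alpha^2$. The data points $(-1, 0, 1)$, the grid, and the function names are hypothetical; the thresholds quoted in the caption depend on the paper's actual data points, so this sketch is not expected to reproduce them exactly.

```python
import numpy as np

def diffused_density(x, alpha, data_points):
    """Density of alpha * x0 + sqrt(1 - alpha^2) * eps for x0 uniform over data_points."""
    sigma = np.sqrt(1.0 - alpha**2)  # assumed variance-preserving noise level
    means = alpha * np.asarray(data_points, dtype=float)
    z = (x[:, None] - means[None, :]) / sigma
    components = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    return components.mean(axis=1)  # equal-weight Gaussian mixture

def count_modes(alpha, data_points=(-1.0, 0.0, 1.0), grid_size=2001):
    """Count local maxima of the diffused density on a fixed grid (0 <= alpha < 1)."""
    x = np.linspace(-3.0, 3.0, grid_size)
    p = diffused_density(x, alpha, data_points)
    interior = p[1:-1]
    # A peak is strictly above both neighbors and not negligible tail noise.
    is_peak = (interior > p[:-2]) & (interior > p[2:]) & (interior > 1e-6 * p.max())
    return int(is_peak.sum())

# Print the number of modes at a few values of alpha.
for alpha in (0.1, 0.5, 0.9, 0.99):
    print(alpha, count_modes(alpha))
```

At small $\alpha$ the mixture is a single broad bump that is nearly insensitive to $\alpha$, while close to $\alpha = 1$ the components separate into one mode per data point, which is the regime where the caption calls for fine control of the noise level.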