Information-Guided Noise Allocation for Efficient Diffusion Training

Gabriel Raya, Bac Nguyen, Georgios Batzolis, Yuhta Takida, Dejan Stancevic, Naoki Murata, Chieh-Hsin Lai, Yuki Mitsufuji, Luca Ambrogioni

TL;DR

The paper introduces InfoNoise, a principled data-adaptive training noise schedule that replaces heuristic schedule design with an information-guided noise sampling distribution. The distribution is derived from entropy-reduction rates estimated from denoising losses already computed during training.

Abstract

Training diffusion models typically relies on manually tuned noise schedules, which can waste computation on weakly informative noise regions and limit transfer across datasets, resolutions, and representations. We revisit noise schedule allocation through an information-theoretic lens and propose the conditional entropy rate of the forward process as a theoretically grounded, data-dependent diagnostic for identifying suboptimal noise-level allocation in existing schedules. Based on this insight, we introduce InfoNoise, a principled data-adaptive training noise schedule that replaces heuristic schedule design with an information-guided noise sampling distribution derived from entropy-reduction rates estimated from denoising losses already computed during training. Across natural-image benchmarks, InfoNoise matches or surpasses tuned EDM-style schedules, in some cases with a substantial training speedup (about $1.4\times$ on CIFAR-10). On discrete datasets, where standard image-tuned schedules exhibit significant mismatch, it reaches superior quality in up to $3\times$ fewer training steps. Overall, InfoNoise makes noise scheduling data-adaptive, reducing the need for per-dataset schedule design as diffusion models expand across domains.

Paper Structure (74 sections, 66 equations, 14 figures, 1 table, 1 algorithm)

Figures (14)

  • Figure 1: Uncertainty collapses in an intermediate noise range. Top. Two-point toy data ($-1,+1$) with additive Gaussian corruption at noise level $\sigma$. As $\sigma$ decreases (moving right to left), the optimal denoising field bifurcates: a single symmetric stable fixed point near $0$ splits into two stable branches near $\pm 1$ (dotted curves), marking a decision window where the posterior transitions from averaging over modes to committing to one. The red curve overlays the toy entropy-rate profile, peaking in this same window where uncertainty resolves most rapidly. Background shows the log-density of $\mathbf{x}_\sigma$ (yellow high, purple low). In this view, the bifurcation marks the onset of symmetry breaking, coinciding with the noise range where uncertainty collapses fastest. Bottom. CIFAR-10 optimal-denoiser diagnostic (see the supplementary section on the optimal denoiser). Each column shows a noisy input (top) and its optimal denoised prediction (bottom), with the most pronounced qualitative change concentrated at intermediate $\sigma$, mirroring the toy decision window.
  • Figure 2: Fixed noise schedules do not transfer across datasets, resolutions, and representations. (a) At the same noise level $\sigma$, images at different resolutions can lose markedly different structure; equal $\sigma$ need not imply comparable degradation. (b) Entropy-rate profiles vary across datasets and representations, indicating that the noise range where uncertainty drops fastest is data-dependent. Fixed schedules can therefore over-sample near-flat regions and under-sample the informative window where learning has the highest leverage.
  • Figure 3: InfoNoise schedule construction (CIFAR-10, DNA). Left: from rate to allocation. We estimate the entropy rate signal $\dot{\mathrm{H}}[\mathbf{x}_0\mid \mathbf{x}_\sigma]$ and normalize it to a target density $\rho(\sigma)$ (orange). Its CDF $u(\sigma)$ defines entropic time: sampling uniformly in $u$ concentrates draws in the $\sigma$ ranges where uncertainty is reduced most rapidly. Right: realizing $\rho$ under weighted training. With per-noise loss weights $w(\sigma)$, we sample $\pi(\sigma)\propto \rho(\sigma)/w(\sigma)$ so that the effective emphasis $\phi(\sigma)=\pi(\sigma)w(\sigma)$ matches $\rho(\sigma)$ (up to normalization).
  • Figure 4: Discrete domains expose schedule mismatch. Left: Offline reference allocation (from a frozen checkpoint) versus the emphasis induced by fixed samplers (EDM log-normal, log-uniform) and by InfoNoise. Right: Quality versus training compute on DNA (Sei FID) and binarized images (FID).
  • Figure 5: Online schedule emergence on DNA. Evolution of the training noise schedule $\pi(\sigma)$ at several points during training (early $\rightarrow$ late), after an initial warm-up under $\pi_{\mathrm{base}}$. The black curve shows the offline diagnostic computed from a frozen, converged checkpoint (analysis-only). InfoNoise rapidly concentrates mass in a narrow intermediate-noise band and then changes little, indicating that the informative region is detectable early.
  • ...and 9 more figures
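
The schedule construction described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`trapezoid_integral`, `build_schedule`, `entropic_time`, `sample_sigma`), the discretization on a fixed $\sigma$ grid, the synthetic bump-shaped entropy-rate profile, and the EDM-style weight used for $w(\sigma)$ are all hypothetical choices made for the example. In practice the rate would come from denoising losses, which the sketch does not attempt to reproduce.

```python
import numpy as np


def trapezoid_integral(x, y):
    """Trapezoid-rule integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def build_schedule(sigmas, rate, weights):
    """Return (rho, pi) on the grid: rho is the normalized rate,
    pi is proportional to rho / w so that pi * w is proportional to rho."""
    rate = np.clip(rate, 0.0, None)          # a density cannot be negative
    rho = rate / trapezoid_integral(sigmas, rate)
    pi = rho / weights
    pi = pi / trapezoid_integral(sigmas, pi)
    return rho, pi


def entropic_time(sigmas, density):
    """CDF u(sigma) of a density on the grid, with u[0] = 0 and u[-1] = 1."""
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(sigmas)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    return cdf / cdf[-1]


def sample_sigma(sigmas, density, n, rng):
    """Inverse-CDF sampling: draw u uniformly, then map back to sigma
    by linearly interpolating the tabulated CDF (an approximation)."""
    u = rng.uniform(size=n)
    return np.interp(u, entropic_time(sigmas, density), sigmas)


# Synthetic inputs (assumptions for illustration only).
sigmas = np.geomspace(0.01, 50.0, 400)
# Hypothetical entropy-rate profile peaking at an intermediate noise level.
rate = np.exp(-0.5 * ((np.log(sigmas) - np.log(0.5)) / 0.6) ** 2)
# Assumed EDM-style loss weight with sigma_data = 0.5.
sigma_data = 0.5
weights = (sigmas**2 + sigma_data**2) / (sigmas * sigma_data) ** 2

rho, pi = build_schedule(sigmas, rate, weights)
batch_sigmas = sample_sigma(sigmas, pi, 8, np.random.default_rng(0))
```

Dividing by $w(\sigma)$ before normalizing is what makes the effective emphasis $\pi(\sigma)w(\sigma)$ proportional to $\rho(\sigma)$. Sampling from `rho` directly would instead let the loss weights distort the intended allocation.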