Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models

Hao-Chien Hsueh, Chi-En Yen, Wen-Hsiao Peng, Ching-Chun Huang

TL;DR

This paper bridges two key diffusion paradigms: hot diffusion, which relies entirely on noise, and cold diffusion, which uses only blurring without noise. It proposes Warm Diffusion, a unified Blur-Noise Mixture Diffusion Model (BNMD) that controls blurring and noise jointly.

Abstract

Diffusion probabilistic models have achieved remarkable success in generative tasks across diverse data types. While recent studies have explored alternative degradation processes beyond Gaussian noise, this paper bridges two key diffusion paradigms: hot diffusion, which relies entirely on noise, and cold diffusion, which uses only blurring without noise. We argue that hot diffusion fails to exploit the strong correlation between high-frequency image detail and low-frequency structures, leading to random behaviors in the early steps of generation. Conversely, while cold diffusion leverages image correlations for prediction, it neglects the role of noise (randomness) in shaping the data manifold, resulting in out-of-manifold issues and partially explaining its performance drop. To integrate both strengths, we propose Warm Diffusion, a unified Blur-Noise Mixture Diffusion Model (BNMD), to control blurring and noise jointly. Our divide-and-conquer strategy exploits the spectral dependency in images, simplifying score model estimation by disentangling the denoising and deblurring processes. We further analyze the Blur-to-Noise Ratio (BNR) using spectral analysis to investigate the trade-off between model learning dynamics and changes in the data manifold. Extensive experiments across benchmarks validate the effectiveness of our approach for image generation.
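
To make the idea of a blur-noise mixture concrete, the following is a minimal sketch of a single degradation step, not the authors' implementation. It assumes a Gaussian blur applied in the frequency domain followed by additive Gaussian noise. The function names `gaussian_blur_fft` and `warm_degrade`, and the parameters `sigma_blur` and `sigma_noise`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_blur_fft(image, sigma_blur):
    """Blur a 2D array with a Gaussian filter applied in the frequency domain.

    Assumption: circular (periodic) boundary handling, which is what a plain
    FFT-based filter implies. sigma_blur is measured in pixels.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    # Fourier transform of a Gaussian with standard deviation sigma_blur.
    transfer = np.exp(-2.0 * np.pi**2 * sigma_blur**2 * (fx**2 + fy**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))


def warm_degrade(image, sigma_blur, sigma_noise, rng):
    """Apply blur, then add Gaussian noise (illustrative mixture only).

    sigma_blur = 0  -> noise only  ("hot" diffusion)
    sigma_noise = 0 -> blur only   ("cold" diffusion)
    both > 0        -> a blur-noise mixture
    """
    blurred = gaussian_blur_fft(image, sigma_blur)
    noise = rng.standard_normal(image.shape)
    return blurred + sigma_noise * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32))
    hot = warm_degrade(clean, sigma_blur=0.0, sigma_noise=0.5, rng=rng)
    cold = warm_degrade(clean, sigma_blur=2.0, sigma_noise=0.0, rng=rng)
    warm = warm_degrade(clean, sigma_blur=1.0, sigma_noise=0.5, rng=rng)
    print(hot.shape, cold.shape, warm.shape)
```

In this sketch the two degradation strengths are independent knobs. One plausible reading of the Blur-to-Noise Ratio is the ratio `sigma_blur / sigma_noise`, but that definition is an assumption here; the abstract does not state the formula.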

Paper Structure

This paper contains 11 sections, 7 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Illustration of Warm Diffusion, the proposed two-pronged diffusion process. (a) Blur-noise mixture diffusion processes that allow flexible control over blur and noise levels, enabling a smooth transition between (1) Hot Diffusion and (4) Cold Diffusion. (b) A divide-and-conquer strategy using a joint model for denoising and deblurring, which leverages spectral dependency to recover noise-obscured signals and restore high-frequency details. (c) Data manifolds under different BNRs. Red and blue lines denote Gaussian means with shared low-frequency but distinct high-frequency components. Higher BNR leads to earlier merging as blurring removes high-frequency detail, potentially causing manifold shifts.
  • Figure 2: Workflow of the proposed diffusion process. The forward process progressively applies blurring and noise, controlled by the Blur-to-Noise Ratio (BNR), to degrade the sample from high quality to low quality. During this phase, training pairs are collected to train the prediction model (e.g., U-Net) for use in the reverse process. For sample generation, the reverse process works as follows: (a) The prediction model simultaneously performs denoising and deblurring. (b) With the prediction results, the reverse step transitions the sample from step $t$ to $t-1$. Specifically, the denoiser gradually guides the sample toward a blurry prediction, while the deblurring prediction helps return the sample to a higher-quality state.
  • Figure 3: Impact of Varying BNRs on Model Behavior. We illustrate the observed signal, denoising target, and deblurring target, along with their respective signal spectrum analyses, across different BNR values. From left to right, the noise level remains constant while the BNR value increases. As the BNR rises, the denoising task (red arrow) becomes progressively easier, shifting more responsibility to the deblurring task (blue arrow) and effectively utilizing the spectral dependency of images. In contrast, when BNR = 0, the model requires a stronger denoiser to directly generate the image, without leveraging the spectral dependency assistance from the deblurrer.
  • Figure 4: Illustration of the connection between BNR and the data manifold. When comparing two different BNRs at the same blur level, a higher BNR corresponds to a smaller noise scale, resulting in a narrower noise-covering space, as shown on the right. In the deblurring (reverse) step, a sample is guided toward the deblurring target, representing the mean image of all possible paired outputs. It is important to note that during the forward process, a single low-quality (LQ) sample is typically paired with multiple high-quality (HQ) samples for training. Due to this ill-posed nature of the deblurring task, samples with higher BNR values are more likely to deviate from the data manifold during the transition. Once samples fall out of the manifold, the neural network struggles to produce accurate predictions, leading to a decline in generation quality.
  • Figure 5: Illustration of sample quality corresponding to different BNR and NFE. Each curve represents a specific BNR value. As shown in the chart, higher BNR values require more sampling steps to achieve better sample quality.
  • ...and 1 more figure
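
The Figure 3 caption describes how, at a fixed noise level, raising the BNR makes the denoising task easier and shifts responsibility to the deblurring task. The sketch below illustrates that trade-off numerically under three loudly stated assumptions that do not come from the source: the clean signal has a 1/f² power spectrum (a common idealization of natural images), the blur is Gaussian, and the BNR equals `sigma_blur / sigma_noise`. The function name `denoiser_energy_share` is hypothetical.

```python
import numpy as np


def denoiser_energy_share(sigma_noise, bnr, n=256):
    """Fraction of clean (non-DC) spectral energy left in the blurred target.

    Assumptions (not from the paper): 1/f^2 signal power spectrum, Gaussian
    blur, and BNR defined as sigma_blur / sigma_noise. The remaining energy,
    1 - share, is what a deblurrer would have to restore.
    """
    freqs = np.fft.rfftfreq(n)[1:]  # positive frequencies, DC skipped
    power = 1.0 / freqs**2          # assumed clean-signal power spectrum
    sigma_blur = bnr * sigma_noise  # assumed BNR definition
    # Amplitude response of a Gaussian blur with std sigma_blur (in samples).
    transfer = np.exp(-2.0 * np.pi**2 * sigma_blur**2 * freqs**2)
    return float(np.sum(power * transfer**2) / np.sum(power))


if __name__ == "__main__":
    for bnr in (0.0, 0.5, 1.0, 2.0):
        share = denoiser_energy_share(sigma_noise=1.0, bnr=bnr)
        print(f"BNR={bnr:.1f}  denoising-target energy share={share:.3f}")
```

With `bnr=0` the share is exactly 1, so the denoiser must recover the full spectrum on its own; as `bnr` grows the share shrinks, which mirrors the caption's description of responsibility moving to the deblurrer.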