
Resolution Chromatography of Diffusion Models

Juno Hwang, Yong-Hyun Park, Junghyo Jo

TL;DR

This work introduces resolution chromatography, a quantitative framework that decomposes diffusion-model sampling into per-resolution signal-generation rates determined by the noise schedule. By establishing SNR-based time adjustments and intensity scalings, it explains the observed coarse-to-fine progression and enables practical techniques such as cascaded upscaling and time-aware prompting. The authors derive a general theory, prove a schedule-mapping property, and validate it with text-to-image diffusion models, illustrating both conceptual insight and actionable methods. The approach promises better noise-schedule design and new avenues for resolution-aware diffusion modeling in high-resolution image generation.

Abstract

Diffusion models generate high-resolution images through iterative stochastic processes. In particular, the denoising approach, one of the most popular, predicts the noise in a sample and removes it at each time step. It has been commonly observed that the resolution of generated samples changes over time: samples start off blurry and coarse and become sharper and finer. In this paper, we introduce "resolution chromatography", which indicates the signal generation rate at each resolution. This concept helps to mathematically explain the coarse-to-fine behavior of the generation process, to understand the role of the noise schedule, and to design time-dependent modulation. Using resolution chromatography, we determine which resolution level becomes dominant at a specific time step, and we experimentally verify our theory with text-to-image diffusion models. We also propose two direct applications of the concept: upscaling pre-trained models to higher resolutions and time-dependent prompt composition. Our theory not only enables a better understanding of numerous existing techniques for manipulating image generation, but also suggests the potential for designing better noise schedules.
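One way to quantify the coarse-to-fine observation, in the spirit of Figure 1(b), is to take the change in the denoised estimate $\mathbb{E}[x_0|x_t]$ between consecutive steps and compute its radially averaged power spectral density. The sketch below is not the paper's code: it assumes the sequence of expectations is already available from some sampler as a single-channel NumPy array of shape `[steps, H, W]`, the helper names `radial_psd` and `psd_of_changes` are hypothetical, and the trajectory in the usage example is synthetic. In practice one would also average the PSD over many samples (Figure 1(b) uses 500).

```python
import numpy as np


def radial_psd(img: np.ndarray) -> np.ndarray:
    """Radially averaged power spectral density of a 2D array.

    Returns one value per integer radius (in frequency bins), measured from
    the zero-frequency centre of the shifted spectrum.
    """
    h, w = img.shape
    # Power spectrum with the zero frequency shifted to the centre
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    # Average the power over rings of equal integer radius
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    return sums / np.maximum(counts, 1)


def psd_of_changes(expectations: np.ndarray) -> np.ndarray:
    """PSD of the change in E[x_0 | x_t] between consecutive sampling steps.

    `expectations` has shape [num_steps, H, W], ordered from t = T down to 0.
    Returns an array of shape [num_steps - 1, num_radii].
    """
    diffs = np.diff(expectations, axis=0)
    return np.stack([radial_psd(d) for d in diffs])


# Synthetic trajectory: the first update adds a coarse pattern, the second a fine one
size = 64
cols = np.arange(size) / size
coarse = np.tile(np.sin(2 * np.pi * 2 * cols), (size, 1))  # 2 cycles per image
fine = np.tile(np.sin(2 * np.pi * 20 * cols), (size, 1))   # 20 cycles per image
expectations = np.stack([np.zeros((size, size)), coarse, coarse + fine])

psd = psd_of_changes(expectations)
print(psd.argmax(axis=1))  # dominant frequency radius of each change
```

The first change peaks at a low radius and the second at a high one, which is the kind of low-to-high shift Figure 1(b) reports as $t$ approaches 0.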

Paper Structure

This paper contains 21 sections, 1 theorem, 37 equations, 9 figures, and 1 algorithm.

Key Result

Theorem 1

Let $\alpha_t$ and $\alpha_{t}'$ be two monotonically decreasing noise schedules, and $r_m(t)$ and $r_m'(t)$ their respective resolution chromatographies. Suppose there exists a mapping $t'(t)$ such that $\alpha_t = \alpha_{t'}'$. Then, for all $m$, it follows that $r_m(t) = r_m'(t')$.
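Theorem 1 only needs a time reparametrization $t'(t)$ with $\alpha_t = \alpha'_{t'}$. For two strictly decreasing schedules this mapping can be found numerically by inverting $\alpha'$ on a grid. The following is a minimal sketch under stated assumptions: time is normalized to $[0, 1]$, schedules are vectorized Python callables, and the helper `schedule_mapping` as well as the two example schedules (`alpha_cosine`, `alpha_linear`) are hypothetical illustrations, not the schedules used in the paper. The sketch does not compute $r_m$ itself, since its definition is not reproduced in this summary; by the theorem, evaluating the second schedule's chromatography at the mapped times would reproduce the first.

```python
import numpy as np


def schedule_mapping(alpha, alpha_prime, t, grid_size=10_001):
    """Return t' such that alpha_prime(t') == alpha(t).

    Both schedules are vectorised callables on normalised time [0, 1] and are
    assumed to be strictly decreasing, so alpha_prime is invertible. Values of
    alpha(t) outside the range of alpha_prime are clamped by np.interp.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    values = alpha_prime(grid)
    # np.interp needs increasing sample points, so flip the decreasing schedule
    return np.interp(alpha(t), values[::-1], grid[::-1])


# Illustrative schedules (hypothetical, not necessarily those in the paper)
def alpha_cosine(t):
    return np.cos(0.5 * np.pi * t) ** 2


def alpha_linear(t):
    return 1.0 - t


t = np.linspace(0.0, 1.0, 5)
t_prime = schedule_mapping(alpha_cosine, alpha_linear, t)
# For this pair the mapping has the closed form t' = sin^2(pi t / 2)
print(np.allclose(t_prime, np.sin(0.5 * np.pi * t) ** 2))
```

Grid inversion is used instead of a closed form so the same helper works for any pair of monotone schedules, at the cost of interpolation error that shrinks with `grid_size`.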

Figures (9)

  • Figure 1: Coarse-to-fine signal generation process in diffusion models. (a) Noised samples $x_t$, the expectations of the corresponding denoised samples $\mathbb{E}[x_0|x_t]$, and the differences between consecutive time steps. (b) Power spectral density (PSD) of the changes in expectations over time, averaged across 500 samples. As time $t$ approaches 0, the intensity in the low-frequency domain decreases while the high-frequency domain becomes more intense, indicating coarse-to-fine behavior.
  • Figure 2: Time adjustment for SNR match. The middle of the figure shows the diffusion process of a pristine high-resolution image; the bottom shows downsampled versions of these high-resolution images after they undergo the diffusion process. The noise schedule, which dictates the signal-to-noise ratio (SNR), affects high-resolution and low-resolution images differently. The green-bounded image matches its SNR to that of the yellow-bounded image, a downsampled counterpart of the original high-resolution image, at an earlier stage of the diffusion process. In this figure, we used a kernel size of $n=4$ to emphasize the difference in noise variance.
  • Figure 3: Time adjustment for cosine schedule under iterative downsampling with kernel size of $n=2$ .
  • Figure 4: Resolution chromatography. Left: Theoretical calculation of the evolving contribution of signals at various resolutions, denoted by $r_m(t)$, through backward diffusion processes from $t=T$ to $0$. Right: The decomposition of resolution chromatography for classifier-free guidance, highlighting the individual contributions of different resolutions, denoted by $\epsilon_{\textrm{CFG}}^{(m)}$. The heat maps at the bottom represent the relative intensity norm of each noise predictor.
  • Figure 5: Cascaded image generation. Low-resolution images are used as templates for producing high-resolution counterparts through the integration of high-resolution residual components. The training process begins with the preparation of a low-resolution dataset, which is used to train the low-resolution noise predictor. The high-resolution residual signal is then learned independently by the residual noise predictor. Finally, the image generation process combines the low-resolution noise predictor and the high-resolution residual noise predictor after applying appropriate intensity rescaling and time adjustment.
  • ...and 4 more figures
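Figures 2 and 3 rest on the fact that downsampling averages away noise, so a downsampled noisy image looks as if it came from an earlier time step. The sketch below works this out under assumptions that this summary does not state and that may differ from the paper's exact conventions: a variance-preserving forward process $x_t = \sqrt{\alpha_t}\,x_0 + \sqrt{1-\alpha_t}\,\epsilon$ with i.i.d. Gaussian pixel noise, downsampling by $n \times n$ average pooling, and a simplified cosine schedule $\alpha(t) = \cos^2(\pi t/2)$ on $t \in [0, 1]$ with no offset. Pooling leaves the pooled signal $\sqrt{\alpha}\,\bar{x}_0$ intact but divides the noise variance by $n^2$, so the SNR $\alpha/(1-\alpha)$ becomes $n^2\alpha/(1-\alpha)$. The matched schedule value is $\tilde{\alpha} = n^2\alpha / (1 + (n^2-1)\alpha)$, and rescaling the pooled sample by $c = 1/\sqrt{\alpha + (1-\alpha)/n^2}$ restores unit-variance noise. All function names (`avg_pool`, `matched_alpha`, `intensity_scale`, `cosine_time_adjust`) are hypothetical.

```python
import numpy as np


def avg_pool(x: np.ndarray, n: int) -> np.ndarray:
    """n x n average pooling of a 2D array whose sides are divisible by n."""
    h, w = x.shape
    return x.reshape(h // n, n, w // n, n).mean(axis=(1, 3))


def matched_alpha(alpha, n):
    """Schedule value whose SNR equals that of an n x n average-pooled sample.

    Pooling divides the i.i.d. noise variance by n**2, so the SNR
    alpha / (1 - alpha) becomes n**2 * alpha / (1 - alpha).
    """
    return n**2 * alpha / (1.0 + (n**2 - 1) * alpha)


def intensity_scale(alpha, n):
    """Factor c with c * avg_pool(x_t) = sqrt(a) * avg_pool(x_0) + sqrt(1 - a) * eps',
    where a = matched_alpha(alpha, n) and eps' has unit variance."""
    return 1.0 / np.sqrt(alpha + (1.0 - alpha) / n**2)


def cosine_time_adjust(t, n):
    """Adjusted time for the assumed schedule alpha(t) = cos^2(pi t / 2).

    Solves cot^2(pi t_adj / 2) = n**2 * cot^2(pi t / 2), i.e.
    tan(pi t_adj / 2) = tan(pi t / 2) / n; arctan2 avoids overflow at t = 1.
    """
    half = 0.5 * np.pi * np.asarray(t, dtype=float)
    return (2.0 / np.pi) * np.arctan2(np.sin(half), n * np.cos(half))


# Iterating a kernel-2 average pool m times equals one 2**m x 2**m average pool,
# so iterative downsampling (as in Figure 3) uses n = 2**m
t = np.linspace(0.0, 1.0, 6)
for m in range(1, 4):
    print(m, np.round(cosine_time_adjust(t, 2**m), 3))
```

Each extra downsampling level pushes the adjusted time earlier, which matches the qualitative picture that lower resolutions are resolved sooner in the reverse process.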

Theorems & Definitions (1)

  • Theorem 1