Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis

Sangyun Lee, Hyungjin Chung, Jaehyeon Kim, Jong Chul Ye

TL;DR

This work questions whether diffusion models should treat all frequency components alike during generation, and proposes a frequency-aware, coarse-to-fine alternative. It develops a generalized diffusion framework that operates in a rotated coordinate system, together with a blur diffusion variant that diffuses frequency components at different rates, enabling progressive deblurring and denoising without extra upsamplers. Key contributions include a tractable forward model, a blur diffusion specialization, a denoising score matching training objective, and a reverse-time sampler, with improved FID on LSUN at 64×64 resolution. The approach offers a principled way to inject image-specific inductive bias into diffusion models and shows promise for scaling to higher resolutions and broader tasks.

Abstract

Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although the simple generative process surprisingly works well, is this the best way to generate image data? For instance, despite the fact that human perception is more sensitive to the low frequencies of an image, diffusion models themselves do not consider any relative importance of each frequency component. Therefore, to incorporate the inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize the standard diffusion models by enabling diffusion in a rotated coordinate system with different velocities for each component of the vector. We further propose a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds. Specifically, the proposed blur diffusion consists of a forward process that blurs an image and adds noise gradually, after which a corresponding reverse process deblurs an image and removes noise progressively. Experiments show that the proposed model outperforms the previous method in FID on LSUN bedroom and church datasets. Code is available at https://github.com/sangyun884/blur-diffusion.

Paper Structure (19 sections, 1 theorem, 25 equations, 4 figures, 1 table)

Key Result

Proposition 1

Let ${\mathbf B}_i = {\mathbf I} - (1-\beta_i) {\mathbf D}^{2f(i)}$ and ${\mathbf U} = \tilde{{\mathbf U}}$. Then the generalized diffusion process (eq:gen_diff) is equivalent to the blur diffusion process (eq:blur_diff).
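
To make the choice of ${\mathbf B}_i$ concrete, the following is a minimal sketch of the per-frequency coefficients it implies. It rests on assumptions not stated on this page: that ${\mathbf D}$ is diagonal with entries in $(0, 1]$, and that each forward step in the rotated coordinates is Gaussian with mean $\sqrt{{\mathbf I} - {\mathbf B}_i}\,{\mathbf y}_{i-1}$ and covariance ${\mathbf B}_i$ (the usual variance-preserving form). The function name and the example gains are hypothetical and are not taken from the authors' code.

```python
import math


def blur_diffusion_step_coeffs(d, beta_i, f_i):
    """Per-frequency coefficients implied by B_i = I - (1 - beta_i) D^{2 f(i)}.

    d      : diagonal entries of D, assumed to lie in (0, 1] (hypothetical gains)
    beta_i : noise-schedule value in (0, 1)
    f_i    : blur-schedule value f(i) >= 0
    Returns (mean_scale, variance), one entry per frequency component.
    """
    # Diagonal of B_i: the noise variance injected into each component.
    variance = [1.0 - (1.0 - beta_i) * dk ** (2.0 * f_i) for dk in d]
    # ASSUMPTION: the step mean is sqrt(I - B_i) * y_{i-1}, which reduces to
    # sqrt(1 - beta_i) * d_k^{f(i)} for component k.
    mean_scale = [math.sqrt(1.0 - v) for v in variance]
    return mean_scale, variance


# Hypothetical gains: lowest frequency first (gain 1), highest frequency last.
d = [1.0, 0.9, 0.5]
mean_scale, variance = blur_diffusion_step_coeffs(d, beta_i=0.02, f_i=1.0)
```

Under these assumptions, a component with gain 1 receives exactly the standard variance $\beta_i$, while components with smaller gains are attenuated more strongly and receive more noise per step. This matches the description of each frequency component being diffused at a different speed.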

Figures (4)

  • Figure 1: Reverse generative processes of two different diffusion models. (a) Previous diffusion models generate images by gradually strengthening signals. (b) The proposed method synthesizes images through progressive deblurring in a coarse-to-fine manner.
  • Figure 2: Results on LSUN-bedroom 64 $\times$ 64. Left: f_type = log, right: f_type = quartic.
  • Figure 3: Comparison of generated images with different generation strategies. Left: fine-to-coarse, right: coarse-to-fine.
  • Figure 4: Different functional forms of blur schedule $f(i)$ we experimented with.

Theorems & Definitions (1)

  • Proposition 1