Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models

Kyungsung Lee, Donggyu Lee, Myungjoo Kang

TL;DR

SaFaRI introduces spatial-and-frequency-aware priors into diffusion-based image restoration by replacing the pixel-domain fidelity with a transformed fidelity $\lVert \psi(\boldsymbol y)-\psi(\mathbf A \hat{\boldsymbol x}_{0|t}) \rVert_2^2$ that combines bicubic upsampling with Fourier-domain high-pass and low-pass components. The method uses an injective transform $\psi$ to decompose the fidelity into spatial, high-frequency, and low-frequency terms, gives a theoretical bound that keeps the conditioning stable, $|p_{\psi,t}(\boldsymbol y|\boldsymbol x_t)-p_\psi(\boldsymbol y|\hat{\boldsymbol x}_{0|t})| \le \frac{1}{\mathrm{e}^{1/2}Z_{\psi} \gamma} L_{\psi} \|\mathbf A\| m_1$, and uses Tweedie's formula to relate $\hat{\boldsymbol x}_{0|t}$ to the score. Empirically, SaFaRI achieves state-of-the-art zero-shot IR performance on ImageNet and FFHQ across inpainting, denoising/deblurring, and super-resolution, surpassing DiffPIR, DPS, PnP-ADMM, and ILVR in LPIPS and FID, with qualitative improvements in texture and boundary fidelity. By enforcing data fidelity in both the spatial and spectral domains, SaFaRI offers a training-free improvement in restoration quality, while leaving the transform-induced perturbations open to further theoretical study.
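
To make the three-term fidelity concrete, here is a minimal NumPy sketch of one way such a loss could be assembled. It is not the authors' implementation: the radial cutoff, the term weights, the function names, and the choice to filter at the measurement resolution are all assumptions made for illustration, and nearest-neighbour upsampling stands in for the bicubic upsampling named above so that the example needs only NumPy.

```python
import numpy as np

def radial_lowpass_mask(shape, cutoff):
    """Boolean mask keeping frequencies whose radius (cycles/pixel) is <= cutoff."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx**2 + fy**2) <= cutoff

def split_frequencies(img, cutoff=0.1):
    """Return (low, high) parts of a 2-D image; by construction low + high == img."""
    mask = radial_lowpass_mask(img.shape, cutoff)
    spectrum = np.fft.fft2(img)
    low = np.fft.ifft2(spectrum * mask).real
    high = np.fft.ifft2(spectrum * ~mask).real
    return low, high

def upsample_nearest(img, r):
    """Nearest-neighbour upsampling by an integer factor r.
    Assumption: a simple stand-in for the bicubic upsampling used in the paper."""
    return np.kron(img, np.ones((r, r)))

def transformed_fidelity(y, Ax0_hat, r=2, cutoff=0.1, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of spatial, high-frequency, and low-frequency squared errors
    between the measurement y and the degraded prediction A x0_hat.
    The weights and cutoff are hypothetical values, not taken from the paper."""
    w_s, w_h, w_l = weights
    y_low, y_high = split_frequencies(y, cutoff)
    a_low, a_high = split_frequencies(Ax0_hat, cutoff)
    spatial = np.sum((upsample_nearest(y, r) - upsample_nearest(Ax0_hat, r)) ** 2)
    high = np.sum((y_high - a_high) ** 2)
    low = np.sum((y_low - a_low) ** 2)
    return w_s * spatial + w_h * high + w_l * low
```

Because the low-pass and high-pass masks are complementary, the two frequency terms together see the whole spectrum; weighting them separately is what lets a guidance scheme emphasize texture or coarse structure independently.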

Abstract

Diffusion models have recently emerged as a promising framework for Image Restoration (IR), owing to their ability to produce high-quality reconstructions and their compatibility with established methods. Existing methods for solving noisy inverse problems in IR consider pixel-wise data fidelity. In this paper, we propose SaFaRI, a spatial-and-frequency-aware diffusion model for IR with Gaussian noise. Our model encourages images to preserve data fidelity in both the spatial and frequency domains, resulting in enhanced reconstruction quality. We comprehensively evaluate the performance of our model on a variety of noisy inverse problems, including inpainting, denoising, and super-resolution. Our evaluation demonstrates that SaFaRI achieves state-of-the-art performance on both the ImageNet and FFHQ datasets, outperforming existing zero-shot IR methods in terms of LPIPS and FID metrics.

Paper Structure (23 sections, 3 theorems, 28 equations, 13 figures, 12 tables, 1 algorithm)

Key Result

Lemma 0

The modified conditional probability $p_{\psi}({\boldsymbol y} | {\boldsymbol x}_0)$ defined in Eq. (14) is Lipschitz continuous with respect to ${\boldsymbol x}_0$.
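
The following sketch shows why a statement of this kind is plausible and where the prefactor in the TL;DR bound could come from. The definition in Eq. (14) is not reproduced on this page, so the Gaussian-type form below is an assumption, as is the assumption that $\psi$ is $L_\psi$-Lipschitz; the argument is illustrative and is not the paper's proof.

$$
p_{\psi}(\boldsymbol y \mid \boldsymbol x_0) = g\big(r(\boldsymbol x_0)\big), \qquad g(r) = \frac{1}{Z_\psi}\exp\!\left(-\frac{r^2}{2\gamma^2}\right), \qquad r(\boldsymbol x_0) = \lVert \psi(\boldsymbol y) - \psi(\mathbf A \boldsymbol x_0) \rVert_2 .
$$

The scalar function $g$ has derivative $g'(r) = -\frac{r}{Z_\psi \gamma^2}\exp\!\left(-\frac{r^2}{2\gamma^2}\right)$, whose magnitude is largest at $r = \gamma$:

$$
\sup_{r \ge 0} |g'(r)| = \frac{1}{\mathrm{e}^{1/2} Z_\psi \gamma} .
$$

By the reverse triangle inequality and the Lipschitz assumption on $\psi$,

$$
|r(\boldsymbol x_0) - r(\boldsymbol x_0')| \le \lVert \psi(\mathbf A \boldsymbol x_0) - \psi(\mathbf A \boldsymbol x_0') \rVert_2 \le L_\psi \lVert \mathbf A \rVert \, \lVert \boldsymbol x_0 - \boldsymbol x_0' \rVert_2 ,
$$

so that

$$
|p_{\psi}(\boldsymbol y \mid \boldsymbol x_0) - p_{\psi}(\boldsymbol y \mid \boldsymbol x_0')| \le \frac{L_\psi \lVert \mathbf A \rVert}{\mathrm{e}^{1/2} Z_\psi \gamma} \, \lVert \boldsymbol x_0 - \boldsymbol x_0' \rVert_2 .
$$

Under these assumptions the Lipschitz constant coincides with the factor $\frac{1}{\mathrm{e}^{1/2}Z_{\psi} \gamma} L_{\psi} \|\mathbf A\|$ that multiplies $m_1$ in the bound quoted in the TL;DR.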

Figures (13)

  • Figure 1: Examples and visual explanations of our method's functionality. (a)-(d): Results of the image restoration tasks: box-type inpainting, random-type inpainting, Gaussian deblurring, and super-resolution, respectively. (e): The first row illustrates the sequential changes in $\mathbf A \hat{\boldsymbol x}_{0|t}$ after applying high-pass filtering, leading to the final filtered image of ${\boldsymbol y}$, while the second row presents the low-pass counterparts.
  • Figure 2: The overview of SaFaRI. Starting with the intermediate state ${\boldsymbol x}_t$, we first generate the unconditional prediction $\hat{\boldsymbol x }_{0|t}$ using the diffusion model. Then we obtain the next state ${\boldsymbol x}_{t-1}$ by leveraging the loss guidance terms obtained through bicubic upsampling $\psi_{s}$ with scaling factor $r$, high-pass filter $\psi_H$ and the low-pass filter $\psi_L$.
  • Figure 3: Qualitative results of image restoration. We establish the efficacy of SaFaRI in restoring images across a variety of tasks.
  • Figure 4: Results of SaFaRI on Gaussian deblurring under different $\rho_t^H$ configurations. (left) $\rho_t^H = 0.25 / \sqrt{\mathcal{L}_H}$; (middle) $\rho_t^H = 1.25 / \sqrt{\mathcal{L}_H}$; (right) ground truth.
  • Figure 5: Results of SaFaRI on Gaussian deblurring under different $\rho_t^L$ configurations. (left) $\rho_t^L = 1.25 / \sqrt{\mathcal{L}_L}$; (middle) $\rho_t^L = 0.25 / \sqrt{\mathcal{L}_L}$; (right) ground truth.
  • ...and 8 more figures
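
The unconditional prediction $\hat{\boldsymbol x}_{0|t}$ mentioned in the Figure 2 caption is commonly obtained from Tweedie's formula. The sketch below assumes the variance-preserving (DDPM-style) parameterization ${\boldsymbol x}_t = \sqrt{\bar\alpha_t}\,{\boldsymbol x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon$; the page does not state which noise schedule the paper uses, and the function names here are hypothetical.

```python
import numpy as np

def score_from_eps(eps_pred, alpha_bar_t):
    """Convert a noise prediction into a score estimate.
    Assumes the variance-preserving parameterization, where
    score = -eps / sqrt(1 - alpha_bar_t)."""
    return -eps_pred / np.sqrt(1.0 - alpha_bar_t)

def tweedie_x0(x_t, score, alpha_bar_t):
    """Tweedie's formula for the posterior mean of x_0 given x_t:
    x0_hat = (x_t + (1 - alpha_bar_t) * score) / sqrt(alpha_bar_t)."""
    return (x_t + (1.0 - alpha_bar_t) * score) / np.sqrt(alpha_bar_t)

# Usage: with the exact noise in place of a network prediction,
# the formula recovers the clean sample.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
alpha_bar = 0.7
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
x0_hat = tweedie_x0(x_t, score_from_eps(eps, alpha_bar), alpha_bar)
```

In a guided sampler, the noise prediction would come from the pretrained diffusion model, and the resulting $\hat{\boldsymbol x}_{0|t}$ is what the fidelity terms are evaluated on before stepping to ${\boldsymbol x}_{t-1}$.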

Theorems & Definitions (6)

  • Lemma 0
  • Theorem 1
  • Remark 1
  • proof
  • Theorem 1
  • proof