Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

Zhenning Shi, Haoshuai Zheng, Chen Xu, Changsheng Dong, Bin Pan, Xueshuo Xie, Along He, Tao Li, Huazhu Fu

TL;DR

This work proposes Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starts the reverse process directly from the noisy degraded images, and maintains the integrity of existing noise schedules, unifying the training and inference processes.

Abstract

Recently, research on denoising diffusion models has expanded to the field of image restoration. Traditional diffusion-based image restoration methods use degraded images as conditional input to guide the reverse generation process, without modifying the original denoising diffusion process. However, since the degraded images already include low-frequency information, starting from Gaussian white noise increases the number of sampling steps. We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process and starts the reverse process directly from the noisy degraded images. The form of our inference process is consistent with DDPM. We introduce a weighted residual noise, named resnoise, as the prediction target and explicitly provide the quantitative relationship between the residual term and the noise term in resnoise. By leveraging a smooth equivalence transformation, Resfusion determines the optimal acceleration step and maintains the integrity of existing noise schedules, unifying the training and inference processes. The experimental results demonstrate that Resfusion exhibits competitive performance on the ISTD, LOL, and Raindrop datasets with only five sampling steps. Furthermore, Resfusion can be easily applied to image generation and shows strong versatility. Our code and model are available at https://github.com/nkicsl/Resfusion.

Paper Structure

This paper contains 22 sections, 24 equations, 18 figures, 8 tables, 2 algorithms.

Figures (18)

  • Figure 1: The proposed Resfusion is a general framework for image restoration and can be easily extended to image generation (setting $\hat{x}_{0}=0$). We introduce the residual term ($R = \hat{x}_{0}-x_{0}$) into the forward process, redefine $q(x_{t} | x_{t-1})$ as $q(x_{t} | x_{t-1}, R)$ (as shown by the $\color{orange}{orange}$ arrow), and name this diffusion process resnoise diffusion. By employing a novel technique called "smooth equivalence transformation", we can directly use the degraded image $\hat{x}_{0}$ to obtain $x_{T'}$ (as shown by the $\color{blue}{blue}$ arrow). We bridge the gap between the input image and the ground truth, unifying the training and inference processes.
  • Figure 2: The working principle of Resfusion. ${x}_{0}$ represents the distribution of the ground truth, while $\hat{x}_{0}$ represents the distribution of the degraded images. $\hat{x}_{0} - {x}_{0}$ represents the gap between them, defined as the residual term $R$ in Eq. \ref{eq: residual term}. Resfusion does not explicitly guide $\hat{x}_{0}$ to ${x}_{0}$. Instead, it implicitly learns the distribution of $R$ by running the resnoise-diffusion reverse process from $x_{t}$ to $x_{0}$. This reverse process can be viewed as a diffusion reverse process from $R+\epsilon$ to $x_{0}$ (as shown by the $\color{violet}{violet}$ arrow), guiding ${x}_{t}$ gradually towards ${x}_{0}$ along this direction. By the principle of similar triangles, the coefficient of $R$ at step $t$ is $1-\sqrt{\overline{\alpha}_{t}}$. At any step $t$ during training, ${x}_{t}$ can be calculated from ${x}_{0}$ and $R$ through Eq. \ref{eq: x_t_resnoise}.
  • Figure 3: Visual comparisons of the restored results by different shadow-removal methods on the ISTD dataset.
  • Figure 4: Visual comparisons of the restored results by different image restoration methods on the LOL dataset and the Raindrop dataset.
  • Figure 5: Analysis of the residual term and the noise term on the LOL dataset. Removing only the noise reconstructs the details of the degraded image without causing any semantic shift. Removing only the residual accomplishes the semantic shift (from low-light to normal-light) without reconstructing the details. Removing resnoise achieves both the semantic shift and the detail reconstruction.
  • ...and 13 more figures
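
The forward step described in the captions of Figures 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Only the residual definition $R = \hat{x}_{0}-x_{0}$ and the coefficient $1-\sqrt{\overline{\alpha}_{t}}$ on $R$ come from the captions. The sketch assumes the remaining terms follow the standard DDPM forward form, $\sqrt{\overline{\alpha}_{t}}\,x_{0}$ and $\sqrt{1-\overline{\alpha}_{t}}\,\epsilon$, and it assumes a linear beta schedule. The function names are hypothetical. Under these assumptions the coefficient of $x_{0}$ in $x_{t}$ is $2\sqrt{\overline{\alpha}_{t}}-1$, which vanishes where $\sqrt{\overline{\alpha}_{t}}=1/2$; the sketch uses that as one plausible way to pick a step $T'$ at which $x_{T'}$ depends only on the degraded image and the noise.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative alpha products for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def resnoise_forward(x0, x0_hat, alpha_bar_t, eps):
    """x_t from ground truth x0 and residual R = x0_hat - x0.

    The (1 - sqrt(alpha_bar_t)) weight on R is taken from the Figure 2 caption;
    the x0 and noise weights are assumed to be the standard DDPM ones.
    """
    residual = x0_hat - x0
    sqrt_ab = np.sqrt(alpha_bar_t)
    return sqrt_ab * x0 + (1.0 - sqrt_ab) * residual + np.sqrt(1.0 - alpha_bar_t) * eps


def acceleration_step(alpha_bar):
    """Index where sqrt(alpha_bar) is closest to 1/2 (hypothetical selection rule)."""
    return int(np.argmin(np.abs(np.sqrt(alpha_bar) - 0.5)))


# Toy data standing in for a ground-truth / degraded image pair.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
x0_hat = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
eps = rng.standard_normal(x0.shape)

alpha_bar = linear_alpha_bar()
t_prime = acceleration_step(alpha_bar)

# x_t computed from the pair (x0, R), as during training.
x_from_pair = resnoise_forward(x0, x0_hat, alpha_bar[t_prime], eps)
# x_t computed from the degraded image alone, as would be needed at inference.
x_from_degraded = np.sqrt(alpha_bar[t_prime]) * x0_hat + np.sqrt(1.0 - alpha_bar[t_prime]) * eps
```

At the selected step the two constructions differ only by $(2\sqrt{\overline{\alpha}_{T'}}-1)(x_{0}-\hat{x}_{0})$, which is small because $\sqrt{\overline{\alpha}_{T'}}$ is close to $1/2$. This is why, in this sketch, the reverse process can start from the noisy degraded image without access to the ground truth.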