Residual Denoising Diffusion Models

Jiawei Liu, Qiang Wang, Huijie Fan, Yinong Wang, Yandong Tang, Liangqiong Qu

TL;DR

RDDM enables a generic UNet, trained with only an L1 loss and a batch size of 1, to compete with state-of-the-art image restoration methods. The paper also shows, through coefficient transformation, that the RDDM sampling process is consistent with that of DDPM and DDIM.

Abstract

We propose residual denoising diffusion models (RDDM), a novel dual diffusion process that decouples the traditional single denoising diffusion process into residual diffusion and noise diffusion. This dual diffusion framework expands the denoising-based diffusion models, initially uninterpretable for image restoration, into a unified and interpretable model for both image generation and restoration by introducing residuals. Specifically, our residual diffusion represents directional diffusion from the target image to the degraded input image and explicitly guides the reverse generation process for image restoration, while noise diffusion represents random perturbations in the diffusion process. The residual prioritizes certainty, while the noise emphasizes diversity, enabling RDDM to effectively unify tasks with varying certainty or diversity requirements, such as image generation and restoration. We demonstrate that our sampling process is consistent with that of DDPM and DDIM through coefficient transformation, and propose a partially path-independent generation process to better understand the reverse process. Notably, our RDDM enables a generic UNet, trained with only an L1 loss and a batch size of 1, to compete with state-of-the-art image restoration methods. We provide code and pre-trained models to encourage further exploration, application, and development of our innovative framework (https://github.com/nachifur/RDDM).
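The dual diffusion described in the abstract can be illustrated with a small numerical sketch. It assumes the forward process takes the form $I_t = I_0 + \bar{\alpha}_t I_{res} + \bar{\beta}_t \epsilon$, which is chosen here because it reproduces the endpoint $I_T = I_{in} + \epsilon$ given in the Figure 3 caption when $\bar{\alpha}_T = \bar{\beta}_T = 1$; the function name `forward_diffuse`, the coefficient arguments, and the toy images are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def forward_diffuse(i0, i_in, alpha_bar_t, beta_bar_t, rng):
    """Sample I_t under the ASSUMED form I_t = I_0 + alpha_bar_t*I_res + beta_bar_t*eps.

    i0          : target (clean) image
    i_in        : degraded input image (all zeros for pure generation)
    alpha_bar_t : cumulative residual coefficient, 0 at t=0 and 1 at t=T
    beta_bar_t  : cumulative noise coefficient
    """
    i_res = i_in - i0                      # residual: directional part of the diffusion
    eps = rng.standard_normal(i0.shape)    # noise: random perturbation
    i_t = i0 + alpha_bar_t * i_res + beta_bar_t * eps
    return i_t, i_res, eps


rng = np.random.default_rng(0)
i0 = rng.uniform(0.0, 1.0, size=(3, 8, 8))   # toy "clean" image
i_in = 0.5 * i0                              # toy "degraded" image (hypothetical darkening)

# Restoration: at t = T the state is the degraded image plus noise.
i_T, i_res, eps = forward_diffuse(i0, i_in, 1.0, 1.0, rng)

# Generation: with I_in = 0 the state at t = T is pure noise.
gen_T, _, gen_eps = forward_diffuse(i0, np.zeros_like(i0), 1.0, 1.0, rng)
```

Under this assumed form, the residual term moves the sample deterministically from $I_0$ toward $I_{in}$, while the noise term carries all of the randomness, which matches the certainty/diversity split described above.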

Paper Structure

This paper contains 26 sections, 34 equations, 22 figures, 11 tables, 2 algorithms.

Figures (22)

  • Figure 1: Denoising diffusion process of DDPM \cite{ho2020denoising} (a) and our residual denoising diffusion process (b). For image restoration, we introduce residual diffusion to represent the diffusion direction from the target image to the input image.
  • Figure 2: Decoupled dual diffusion framework. The previous forward diffusion process is decoupled into residual diffusion and noise diffusion, while in the reverse process, the simultaneous sampling can be decoupled into first removing the residuals and then removing noise.
  • Figure 3: The proposed residual denoising diffusion model (RDDM) is a unified framework for image generation and restoration (a shadow removal task is shown here). We introduce residuals ($I_{res}$) in RDDM, redefining the forward diffusion process to involve simultaneous diffusion of residuals and noise. The residual diffusion ($I_{res}=I_{in}-I_0$) represents the directional diffusion from the target image $I_0$ to the degraded input image $I_{in}$, while the noise ($\epsilon$) diffusion represents the random perturbations in the diffusion process. In RDDM, $I_0$ gradually diffuses into $I_T=I_{in}+\epsilon$, $\epsilon \sim \mathcal{N} (\mathbf{0} ,\mathbf{I})$. In the third column, $I_T$ is a purely noisy image for image generation, since $I_{in}=0$, and a noise-carrying degraded image for image restoration, since $I_{in}$ is the degraded image.
  • Figure 4: Coefficient transformation from DDIM \cite{song2020denoising} to RDDM using Eq. \ref{Eq:19}. (a) We show several schedules for $\alpha^t_{DDIM}$, e.g., linear \cite{song2020denoising}, scaled linear \cite{rombach2022high}, and squared cosine \cite{nichol2021improved}. (b) We transform $\alpha_{DDIM}^t$ into $\alpha_t$ in our RDDM. (c) We transform $\alpha_{DDIM}^t$ into $\beta_t^2$ in our RDDM. (d) A few simple schedules. (e) $P(x,a)$ is a normalized power function (see Eq. \ref{Eq:20}). "mean", "linearly increasing", and "linearly decreasing" in (d) can be denoted as $P(x,0)$, $P(x,1)$ and $P(1-x,1)$, respectively. See Algorithm \ref{Algorithm1} in Appendix \ref{Appendix:a.3} for more details of (b) and (c).
  • Figure 5: Analysis of readjusting coefficient schedules. We find that changing the $\alpha_t$ schedule barely affects the denoising process in (f), and edited faces may have higher face scores when assessed using AI face scoring software. These images were generated using a pre-trained UNet on the CelebA ($256\times 256$) dataset \cite{liu2015faceattributes} with 10 sampling steps.
  • ...and 17 more figures
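
The coefficient transformation and the simple schedules mentioned in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's Algorithm 1: it assumes the RDDM forward process $I_t = I_0 + \bar{\alpha}_t I_{res} + \bar{\beta}_t \epsilon$ with $\bar{\alpha}_t = \sum_i \alpha_i$ and $\bar{\beta}_t^2 = \sum_i \beta_i^2$, and matches it to the DDIM form $I_t = \sqrt{\bar{\alpha}^t_{DDIM}} I_0 + \sqrt{1-\bar{\alpha}^t_{DDIM}}\,\epsilon$ for generation ($I_{in}=0$). The discrete normalization used for $P(x,a)$, the function names, and the linear beta range are likewise assumptions.

```python
import numpy as np


def ddim_to_rddm(alpha_bar_ddim):
    """Map cumulative DDIM coefficients to RDDM-style coefficients (assumed mapping).

    With I_in = 0 the assumed RDDM forward process reads
        I_t = (1 - alpha_bar_t) * I_0 + beta_bar_t * eps,
    so matching DDIM term by term gives
        alpha_bar_t = 1 - sqrt(alpha_bar_ddim_t),
        beta_bar_t  = sqrt(1 - alpha_bar_ddim_t).
    """
    alpha_bar_ddim = np.asarray(alpha_bar_ddim, dtype=np.float64)
    alpha_bar = 1.0 - np.sqrt(alpha_bar_ddim)
    beta_bar = np.sqrt(1.0 - alpha_bar_ddim)
    # Per-step coefficients, assuming cumulative sums define the "bar" quantities.
    alpha = np.diff(alpha_bar, prepend=0.0)
    beta_sq = np.diff(beta_bar ** 2, prepend=0.0)
    return alpha_bar, beta_bar, alpha, beta_sq


def power_schedule(num_steps, a, reverse=False):
    """Discrete stand-in for the normalized power function P(x, a).

    Evaluates x**a at interval midpoints in (0, 1) and normalizes the values
    to sum to 1. reverse=True evaluates (1 - x)**a instead, i.e. P(1 - x, a).
    """
    x = (np.arange(num_steps) + 0.5) / num_steps
    if reverse:
        x = 1.0 - x
    weights = x ** a
    return weights / weights.sum()


# A commonly used linear DDPM-style beta schedule (values are an assumption here).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar_ddim = np.cumprod(1.0 - betas)
alpha_bar, beta_bar, alpha, beta_sq = ddim_to_rddm(alpha_bar_ddim)

mean_sched = power_schedule(10, 0)                # "mean": P(x, 0)
increasing = power_schedule(10, 1)                # "linearly increasing": P(x, 1)
decreasing = power_schedule(10, 1, reverse=True)  # "linearly decreasing": P(1 - x, 1)
```

Because each power schedule sums to 1, it can serve directly as a per-step $\alpha_t$ sequence whose cumulative sum reaches $\bar{\alpha}_T = 1$ under the assumptions above.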