WaveDM: Wavelet-Based Diffusion Models for Image Restoration

Yi Huang, Jiancheng Huang, Jianzhuang Liu, Mingfu Yan, Yu Dong, Jiaxi Lv, Chaoqi Chen, Shifeng Chen

TL;DR

This work addresses the slow inference of diffusion-based image restoration by moving diffusion modeling into the wavelet domain, where the spatial input size is reduced and frequency content is separated. WaveDM runs a low-frequency diffusion process conditioned on the wavelet spectrum of the degraded image, complemented by a lightweight High Frequency Refinement Module, and uses Efficient Conditional Sampling to reduce the total number of sampling steps to around $5$. The approach achieves state-of-the-art results on twelve benchmark datasets spanning six restoration tasks, while matching the speed of traditional one-pass methods and running more than $100\times$ faster than restoration methods built on vanilla diffusion models. The method performs well across diverse degradations (raindrops, rain streaks, haze, defocus blur, moiré, and noise) and balances restoration quality with computational efficiency, although it still requires lengthy training and large-scale data to perform best.
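
The claim that the wavelet domain shrinks the input can be made concrete with a minimal NumPy sketch. It assumes an orthonormal Haar transform applied twice to a channel-first RGB image, a layout consistent with the 48 bands referenced in Figure 3. The band ordering (the lowest-frequency band of each RGB channel first, matching the "first three bands" in Figure 1) and the names `haar_dwt` and `haar_idwt` are illustrative choices, not the authors' code.

```python
import numpy as np

# Illustrative sketch: orthonormal 2D Haar transform; band ordering is an assumption.

def haar_dwt(x):
    """One Haar level. x: (C, H, W) with even H, W -> (4C, H/2, W/2) ordered [LL, HL, LH, HH]."""
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # scaled block average (low frequency)
    hl = (a - b + c - d) / 2  # left column minus right column
    lh = (a + b - c - d) / 2  # top row minus bottom row
    hh = (a - b - c + d) / 2  # diagonal difference
    return np.concatenate([ll, hl, lh, hh], axis=0)

def haar_idwt(y):
    """Exact inverse of haar_dwt: (4C, h, w) -> (C, 2h, 2w)."""
    C = y.shape[0] // 4
    ll, hl, lh, hh = y[:C], y[C:2 * C], y[2 * C:3 * C], y[3 * C:]
    x = np.empty((C, 2 * y.shape[1], 2 * y.shape[2]), dtype=y.dtype)
    x[:, 0::2, 0::2] = (ll + hl + lh + hh) / 2
    x[:, 0::2, 1::2] = (ll - hl + lh - hh) / 2
    x[:, 1::2, 0::2] = (ll + hl - lh - hh) / 2
    x[:, 1::2, 1::2] = (ll - hl - lh + hh) / 2
    return x

rng = np.random.default_rng(0)
img = rng.random((3, 256, 256))   # stand-in RGB image, channel-first
spec = haar_dwt(haar_dwt(img))    # two levels -> 48 bands
print(spec.shape)                 # (48, 64, 64)
low, high = spec[:3], spec[3:]    # lowest band per RGB channel vs. the other 45 bands
rec = haar_idwt(haar_idwt(spec))
print(np.allclose(rec, img))      # True
```

Under these assumptions, a 3 × 256 × 256 image becomes 48 bands of 64 × 64, so a network operating on the spectrum works at a quarter of the resolution in each spatial dimension. Because the transform is orthonormal and exactly invertible, no information is lost in the change of domain.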

Abstract

Recent diffusion-based methods outperform traditional models on many image restoration tasks, but they suffer from long inference times. To tackle this, this paper proposes a Wavelet-Based Diffusion Model (WaveDM). WaveDM learns the distribution of clean images in the wavelet domain, conditioned on the wavelet spectrum of the degraded image, which makes each sampling step cheaper than modeling in the spatial domain. To ensure restoration performance, a dedicated training strategy is proposed in which the low-frequency and high-frequency spectra are learned by distinct modules. In addition, an Efficient Conditional Sampling (ECS) strategy, derived from experiments, reduces the total number of sampling steps to around 5. Evaluations on twelve benchmark datasets covering image raindrop removal, rain streak removal, dehazing, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with efficiency comparable to traditional one-pass methods, running more than 100$\times$ faster than existing image restoration methods that use vanilla diffusion models.
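
The split between a low-frequency diffusion model and a separate high-frequency module, both conditioned on the degraded spectrum, can be illustrated by how one training input might be assembled. This is a minimal NumPy sketch, not the authors' code. It assumes 48 wavelet bands whose first three are the low-frequency spectrum (as Figure 1 describes), a linear DDPM-style noise schedule with $T = 1000$, channel-wise concatenation in the argument order of $\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t^l,\tilde{\mathbf{x}}_0^h,\mathbf{x}_d,t)$, and a placeholder callable standing in for the HFRM. The names `linear_alpha_bar` and `make_training_input` are hypothetical.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def make_training_input(x0, xd, hfrm, t, alpha_bar, rng, n_low=3):
    """Build one training example for the low-frequency noise estimator.

    x0, xd : clean and degraded wavelet spectra, shape (48, h, w).
    hfrm   : placeholder for the High Frequency Refinement Module; maps xd
             to a high-frequency estimate of shape (48 - n_low, h, w).
    Returns (net_input, eps); eps is the regression target for the network.
    """
    x0_low = x0[:n_low]                                     # x_0^l: first n_low bands
    eps = rng.standard_normal(x0_low.shape)                 # noise the network must predict
    a = alpha_bar[t]
    xt_low = np.sqrt(a) * x0_low + np.sqrt(1.0 - a) * eps   # sample from q(x_t^l | x_0^l)
    x0_high = hfrm(xd)                                      # tilde{x}_0^h, predicted from xd only
    # Channel-wise concatenation [x_t^l, tilde{x}_0^h, x_d] (order assumed).
    net_input = np.concatenate([xt_low, x0_high, xd], axis=0)
    return net_input, eps

# Usage with random stand-ins for the spectra and a trivial "HFRM".
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal((48, 16, 16))
xd = x0 + 0.1 * rng.standard_normal((48, 16, 16))
net_input, eps = make_training_input(x0, xd, lambda s: s[3:], t=500,
                                     alpha_bar=alpha_bar, rng=rng)
print(net_input.shape)  # (96, 16, 16)
```

A noise estimation network would then be trained to regress `eps` from `net_input` and `t`, for example with a mean-squared error as in standard DDPM training; whether WaveDM uses exactly this loss and schedule is an assumption here.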

Paper Structure (39 sections, 21 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: Training of the wavelet-based diffusion model (WaveDM) for image restoration, where $\mathbf{X}_d$ and $\mathbf{X}_0$ are a pair of degraded and clean RGB images, and $\mathbf{x}_d$ and $\mathbf{x}_0$ are their respective wavelet spectra after the Haar wavelet transform. $\mathbf{x}_t^l$ is the diffused version at step $t$ of the low-frequency spectrum $\mathbf{x}_0^l$, which is extracted from the first three bands of $\mathbf{x}_0$. $\tilde{\mathbf{x}}_0^h$ denotes the high-frequency spectrum of the clean image, predicted from $\mathbf{x}_d$ by the High Frequency Refinement Module (HFRM). $\mathbf{x}_d$, $\tilde{\mathbf{x}}_0^h$, and $\mathbf{x}_t^l$ are concatenated as input to the noise estimation network $\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t^l,\tilde{\mathbf{x}}_0^h,\mathbf{x}_d,t)$, which predicts the noise $\boldsymbol{\epsilon}_t^l$ at every time step.
  • Figure 2: Overview of WaveDM with ECS. $q(\mathbf{x}_t^l \mid \mathbf{x}_{t-1}^l)$ denotes the forward diffusion (dashed line). The sampling process $p_{\theta}(\mathbf{x}_{t-1}^l \mid \mathbf{x}_t^l, \tilde{\mathbf{x}}_0^h, \mathbf{x}_d)$ (solid lines) starts from standard Gaussian noise $\mathbf{x}_T^l \sim \mathcal{N}(\mathbf{0},\mathbf{I})$ and generates the low-frequency spectrum of the clean image, with $\mathbf{x}_d$ and $\tilde{\mathbf{x}}_0^h$ serving as conditions (blue solid lines) from step $T$ to step $M$. The intermediate result $\mathbf{x}_M^l$ is then used to predict the low-frequency spectrum $\mathbf{x}_0^l$ of the clean image directly, and an inverse wavelet transform turns the concatenation of $\tilde{\mathbf{x}}_0^h$ and $\mathbf{x}_0^l$ into the clean RGB image $\mathbf{X}_0$.
  • Figure 3: Different numbers of wavelet bands used for diffusion. $N_l$ indicates using the 1st to the $N_l$-th bands; $N_h$ indicates using the 48th to the $N_h$-th bands.
  • Figure 4: Performance during the sampling process using 10 steps with a stride of 100 on four datasets. The ★ and the second marker denote the best PSNR values of $\mathbf{X}_0$ obtained from Eqs. \ref{eq_ecs_wavedm} and \ref{eq_iwt}, and of $\mathbf{X}_t$ obtained from Eqs. \ref{eq_ddim_wavedm} and \ref{eq_iwt}, respectively. $t$ denotes the current sampling time step.
  • Figure 5: Restoration performance of DDIM sampling versus ECS for several numbers of sampling steps on four datasets. $t$ denotes the current sampling time step.
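
The ECS procedure in Figure 2 (sample from step $T$ to step $M$, then jump directly to $\mathbf{x}_0^l$) can be sketched with deterministic DDIM updates. This sketch assumes DDIM with $\eta = 0$ and the standard clean-signal estimate $\hat{\mathbf{x}}_0^l = (\mathbf{x}_M^l - \sqrt{1-\bar\alpha_M}\,\boldsymbol{\epsilon}_\theta)/\sqrt{\bar\alpha_M}$. The function names, the linear schedule, and the choice $M = 599$ are hypothetical; conditioning is passed through opaquely as `cond`, which in WaveDM would hold $\tilde{\mathbf{x}}_0^h$ and $\mathbf{x}_d$. An "oracle" noise predictor that knows the true $\mathbf{x}_0^l$ is used only to check the bookkeeping, not restoration quality.

```python
import numpy as np

def ddim_step(x_t, eps, ab_t, ab_prev):
    """Deterministic DDIM update (eta = 0) from step t to an earlier step."""
    x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps

def ecs_like_sample(eps_model, cond, alpha_bar, timesteps, shape, rng):
    """Run DDIM from timesteps[0] down to M = timesteps[-1], then predict x_0^l directly.

    eps_model(x, cond, t) -> predicted noise with the same shape as x.
    Uses len(timesteps) network evaluations in total.
    """
    x = rng.standard_normal(shape)  # x_T^l ~ N(0, I)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, eps_model(x, cond, t), alpha_bar[t], alpha_bar[t_prev])
    M = timesteps[-1]
    eps = eps_model(x, cond, M)
    # Jump straight to the clean estimate instead of continuing down to t = 0.
    return (x - np.sqrt(1.0 - alpha_bar[M]) * eps) / np.sqrt(alpha_bar[M])

# Usage: linear schedule (assumed) and an oracle that returns the exact noise.
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
rng = np.random.default_rng(0)
x0_true = rng.standard_normal((3, 8, 8))

def oracle(x, cond, t):
    # Noise consistent with x0_true, so every DDIM step is error-free.
    return (x - np.sqrt(alpha_bar[t]) * x0_true) / np.sqrt(1.0 - alpha_bar[t])

x0_low = ecs_like_sample(oracle, None, alpha_bar, [999, 899, 799, 699, 599], (3, 8, 8), rng)
print(np.allclose(x0_low, x0_true))  # True
```

With the stride-100 timesteps above, this uses five network evaluations, in line with the roughly 5 sampling steps reported for ECS. In the full pipeline the resulting $\mathbf{x}_0^l$ would then be concatenated with $\tilde{\mathbf{x}}_0^h$ and passed through the inverse wavelet transform to obtain $\mathbf{X}_0$.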