
Hierarchical Integration Diffusion Model for Realistic Image Deblurring

Zheng Chen, Yulun Zhang, Ding Liu, Bin Xia, Jinjin Gu, Linghe Kong, Xin Yuan

TL;DR

This work tackles the computational burden and misalignment issues of diffusion-model-based image deblurring by performing diffusion in a highly compact latent space to generate a prior feature. Training has two stages: Stage One compresses ground-truth information into a latent prior via a latent encoder, and Stage Two trains a latent diffusion model to produce a refined prior feature. That prior guides a Restormer-based Transformer through a Hierarchical Integration Module (HIM) with multi-scale fusion. The resulting HI-Diff combines latent-space diffusion with a regression-based deblurring backbone, achieving state-of-the-art performance on synthetic and real-world blur datasets while using significantly fewer resources than full-space diffusion. The approach yields sharper textures, better detail restoration, and robust generalization, offering a practical path toward high-fidelity deblurring in real applications. Evaluation across GoPro, HIDE, RealBlur, and RWBI demonstrates substantial PSNR/SSIM gains and favorable efficiency trade-offs.

Abstract

Diffusion models (DMs) have recently been introduced in image deblurring and have exhibited promising performance, particularly in detail reconstruction. However, a diffusion model requires a large number of inference iterations to recover the clean image from pure Gaussian noise, which consumes massive computational resources. Moreover, the distribution synthesized by the diffusion model is often misaligned with the target results, which limits performance on distortion-based metrics. To address these issues, we propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring. Specifically, we perform the DM in a highly compact latent space to generate the prior feature for the deblurring process. The deblurring process is implemented by a regression-based method to obtain better distortion accuracy. Meanwhile, the highly compact latent space ensures the efficiency of the DM. Furthermore, we design the hierarchical integration module to fuse the prior into the regression-based model at multiple scales, enabling better generalization in complex blurry scenarios. Comprehensive experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods. Code and trained models are available at https://github.com/zhengchen1999/HI-Diff.
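
To make the efficiency argument concrete, the following is a minimal sketch of reverse diffusion run on a small latent tensor rather than on a full image. It uses the generic DDPM sampling update, not necessarily the exact variant used in HI-Diff. The latent size, the linear noise schedule, and `toy_denoiser` (a stand-in for the learned, condition-aware noise predictor) are all assumptions made for illustration.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_denoiser(z_t, t, cond):
    """Hypothetical placeholder for a learned noise predictor.

    A real model would be a network conditioned on features of the
    blurry image; here we simply treat the offset from `cond` as noise.
    """
    return z_t - cond

def sample_prior(cond, T=8, seed=0):
    """Run T generic DDPM reverse steps starting from Gaussian noise."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    z = rng.standard_normal(cond.shape)
    for t in reversed(range(T)):
        eps = toy_denoiser(z, t, cond)
        # Posterior mean of the standard DDPM reverse step.
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Inject noise on every step except the last one.
            z = z + np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z

# Illustrative sizes only (not taken from the paper): a latent of
# 16 tokens x 256 channels versus a 256 x 256 RGB image.
cond = np.zeros((16, 256))
z_hat = sample_prior(cond, T=8)
latent_elems = z_hat.size
image_elems = 256 * 256 * 3
```

Each reverse step here touches `latent_elems` values instead of `image_elems`, which is the reason a compact latent space keeps the iterative part of the pipeline cheap even when several steps are needed.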

Paper Structure (15 sections, 7 equations, 5 figures, 4 tables)

This paper contains 15 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The overall framework of our HI-Diff. (a) The Transformer adopts a hierarchical encoder-decoder architecture, equipped with HIM, for the deblurring process. (b) The diffusion model is performed in a highly compact latent space for computational efficiency. (c) The multi-scale prior feature $\{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3\}$ is obtained by downsampling the prior feature multiple times. In stage one, $\mathbf{z}_1 = \mathbf{z}$; in stage two, $\mathbf{z}_1 = \hat{\mathbf{z}}$. (d) The hierarchical integration module (HIM) calculates cross-attention between the intermediate feature of the Transformer and the multi-scale prior feature. (e) The latent encoder (LE), where the size of the input feature ($in$) is $H \times W \times 6$ for ${\rm LE}$ and $H \times W \times 3$ for ${\rm LE_{DM}}$.
  • Figure 2: Deblurred samples for the different models in the ablation table. The first row shows the effects of the diffusion prior, while the second row shows the effects of hierarchical integration.
  • Figure 3: Ablation study on the number of iterations $T$ in the diffusion model, with $T \in \{1, 2, 4, 8, 16, 32\}$.
  • Figure 4: Visual comparison on the GoPro [nah2017deep], HIDE [shen2019human], RealBlur [rim2020real], and RWBI [zhang2020deblurring] datasets. RWBI contains only blurry images, captured with different hand-held devices. Models are trained only on the GoPro dataset. Our HI-Diff generates images with clearer details.
  • Figure 5: Visual comparison on the RealBlur [rim2020real] dataset. Models are trained on the RealBlur dataset.
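
The multi-scale prior and the cross-attention described in the Figure 1 caption, parts (c) and (d), can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation. It assumes token-wise average pooling for downsampling, queries taken from the image feature with keys and values taken from the prior, a residual connection, and randomly initialized projection weights. All shapes and helper names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def downsample_tokens(z, factor):
    """Average-pool groups of `factor` tokens (assumed downsampling rule)."""
    n, c = z.shape
    n_keep = n - n % factor
    return z[:n_keep].reshape(n_keep // factor, factor, c).mean(axis=1)

def cross_attention(x, z, w_q, w_k, w_v):
    """Fuse prior tokens z (N, Cz) into image tokens x (HW, C).

    Queries come from the image feature; keys and values come from the
    prior (an assumption consistent with the caption, not confirmed by it).
    """
    q = x @ w_q                                      # (HW, d)
    k = z @ w_k                                      # (N, d)
    v = z @ w_v                                      # (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (HW, N)
    return x + attn @ v                              # residual fusion

rng = np.random.default_rng(0)

# Prior feature z1 and two coarser versions, giving {z1, z2, z3}.
z1 = rng.standard_normal((16, 32))
priors = [z1, downsample_tokens(z1, 4), downsample_tokens(z1, 16)]

# Hypothetical encoder features at three scales, as (tokens, channels).
feats = [rng.standard_normal(s) for s in [(64, 8), (16, 16), (4, 32)]]

fused = []
for x, z in zip(feats, priors):
    c, cz, d = x.shape[1], z.shape[1], 8
    w_q = rng.standard_normal((c, d)) * 0.1
    w_k = rng.standard_normal((cz, d)) * 0.1
    w_v = rng.standard_normal((cz, c)) * 0.1
    fused.append(cross_attention(x, z, w_q, w_k, w_v))
```

Because the prior holds only a handful of tokens at each scale, the attention map has shape (HW, N) with small N, so its cost grows linearly in the number of image tokens rather than quadratically.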