Efficient Image Deblurring Networks based on Diffusion Models

Kang Chen; Yuanjie Liu

TL;DR

This work tackles high-resolution defocus and motion deblurring under memory constraints by introducing Swintormer, a Sliding Window Transformer that integrates diffusion-model latent priors to guide restoration. It achieves linear-complexity attention via Shifted Windows-Dconv Attention and leverages a latent diffusion model (LDM) to generate prior features $z_0$ in latent space, fed into a diffusion-guided deblurring pipeline trained with both $L_1$ and perceptual losses. The method delivers state-of-the-art results on RealDOF, DPDD, and GoPro datasets, while dramatically reducing per-iteration MACs from 140.35 GMACs to 8.02 GMACs, enabling high-quality deblurring on memory-limited devices. An ablation study confirms the efficacy of diffusion priors, windowing strategies, and a training/inference consistency pre-processing scheme, and code is released for reproducibility at the provided repository.
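The linear-complexity claim can be illustrated with a small sketch. It assumes plain non-overlapping windows of a hypothetical size `w` and only counts query-key pairs. It does not model the window shifting or the depthwise-convolution part of the paper's Shifted Windows-Dconv Attention, and the function names are illustrative, not taken from the released code.

```python
def window_partition(grid, w):
    """Split a 2-D list (H rows x W cols) into non-overlapping w x w windows.

    Each window is returned as a flat, row-major list of w*w entries.
    Assumption for this sketch: H and W are divisible by w.
    """
    H, W = len(grid), len(grid[0])
    assert H % w == 0 and W % w == 0, "sketch assumes H and W divisible by w"
    windows = []
    for top in range(0, H, w):
        for left in range(0, W, w):
            windows.append(
                [grid[top + i][left + j] for i in range(w) for j in range(w)]
            )
    return windows


def attention_pairs(H, W, w=None):
    """Count query-key pairs for global (w=None) or windowed attention."""
    n = H * W  # number of tokens (pixels)
    if w is None:
        return n * n  # global attention: quadratic in n
    num_windows = n // (w * w)
    return num_windows * (w * w) ** 2  # equals n * w^2: linear in n


if __name__ == "__main__":
    for side in (64, 128, 256):
        print(side, attention_pairs(side, side), attention_pairs(side, side, w=8))
```

With the window size fixed, doubling the number of pixels doubles the windowed pair count, whereas the global pair count quadruples.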

Abstract

This article presents Swintormer, a sliding-window model for defocus deblurring that achieves the best performance to date with remarkably low memory usage. The method uses a diffusion model to generate latent prior features, which help restore more detailed images. In addition, it adapts the sliding window strategy and incorporates specialized Transformer blocks to improve inference efficiency. This approach substantially reduces the Multiply-Accumulate Operations (MACs) per iteration, drastically cutting memory requirements. Compared with the currently leading GRL method, Swintormer reduces the memory-dependent computational load from 140.35 GMACs to 8.02 GMACs while improving the Peak Signal-to-Noise Ratio (PSNR) for defocus deblurring from 27.04 dB to 27.07 dB. This technique enables the processing of higher-resolution images on memory-limited devices, broadening potential application scenarios. The article concludes with an ablation study that examines how each network module contributes to the final performance. The source code and model will be available at the following website: https://github.com/bnm6900030/swintormer.
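For readers less familiar with the PSNR figures quoted above, the following is a minimal sketch of the standard PSNR definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It assumes 8-bit pixel values stored in flat lists and is not the paper's evaluation script. The helper `mse_ratio_for_gain` is a hypothetical name introduced here to show what a given dB gain implies for the mean squared error.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [12, 18, 33, 41]))
    # Gain from 27.04 dB to 27.07 dB, as reported in the abstract.
    print(mse_ratio_for_gain(27.07 - 27.04))
```

Under this definition, the reported 0.03 dB improvement corresponds to an MSE roughly 0.7% lower, so the abstract's headline difference from GRL lies mainly in the MAC count rather than in restoration accuracy.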

Paper Structure (15 sections, 19 equations, 4 figures, 5 tables)

This paper contains 15 sections, 19 equations, 4 figures, 5 tables.

Introduction
Related Works
Image deblurring
Diffusion Model
Method
Shifted Windows-Dconv Attention
Diffusion Model
Inference Strategy
Training Strategy
Inference
Experiments and Analysis
Defocus Deblurring Results
Motion Deblurring Results
Ablation Studies
Conclusion

Figures (4)

Figure 1: The architecture of the Swintormer.
Figure 2: Visual comparison on the RealDOF dataset [lee2021iterative].
Figure 3: Visual comparison on the DPDD dataset [abuolaim2020defocus].
Figure 4: Visual comparison on the GoPro dataset [nah2017deep].
