DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

Lingshun Kong, Jiawei Zhang, Dongqing Zou, Jimmy Ren, Xiaohe Wu, Jiangxin Dong, Jinshan Pan

TL;DR

The paper addresses real-world image deblurring by leveraging pre-trained Stable Diffusion priors without using the blurry image alone, or a pre-deblurred image, as the condition. It introduces a Latent Kernel Prediction Network (LKPN) that predicts pixel-wise kernels in latent space and applies them through Element-wise Adaptive Convolution (EAC) to progressively restore structure; the kernels are re-estimated at each diffusion step from the intermediate diffusion results. Conditioning is implemented via a ControlNet-style branch that fuses LKPN-derived latent guidance with the blurred input, and the LKPN is trained jointly with the diffusion model using latent and pixel-space losses. Experiments on synthetic and real-world datasets are reported to show better perceptual quality and structural fidelity than state-of-the-art methods across diverse blur patterns.

Abstract

Diffusion models have achieved significant progress in image generation. The pre-trained Stable Diffusion (SD) models are helpful for image deblurring by providing clear image priors. However, directly using a blurry image or pre-deblurred one as a conditional control for SD will either hinder accurate structure extraction or make the results overly dependent on the deblurring network. In this work, we propose a Latent Kernel Prediction Network (LKPN) to achieve robust real-world image deblurring. Specifically, we co-train the LKPN in latent space with conditional diffusion. The LKPN learns a spatially variant kernel to guide the restoration of sharp images in the latent space. By applying element-wise adaptive convolution (EAC), the learned kernel is utilized to adaptively process the input feature, effectively preserving the structural information of the input. This process thereby more effectively guides the generative process of Stable Diffusion (SD), enhancing both the deblurring efficacy and the quality of detail reconstruction. Moreover, the results at each diffusion step are utilized to iteratively estimate the kernels in LKPN to better restore the sharp latent by EAC. This iterative refinement enhances the accuracy and robustness of the deblurring process. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art image deblurring methods on both benchmark and real-world images.
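
To make the idea of a spatially variant kernel concrete, the following is a minimal NumPy sketch of an element-wise adaptive convolution. It is an illustration under stated assumptions, not the authors' implementation: the function name `elementwise_adaptive_conv`, the kernel layout `(C, H, W, k, k)`, and the edge-replicate padding are all choices made here for the example. Each latent element gets its own k x k kernel, and the output at that element is the kernel-weighted sum of its neighbourhood.

```python
import numpy as np

def elementwise_adaptive_conv(z, kernels):
    """Apply a different k x k kernel at every (channel, row, col) position.

    z:       latent feature map, shape (C, H, W)
    kernels: per-element kernels, shape (C, H, W, k, k), k odd
             (this layout is an assumption made for the sketch)
    """
    C, H, W = z.shape
    k = kernels.shape[-1]
    r = k // 2
    # Edge-replicate padding so border positions have a full neighbourhood.
    zp = np.pad(z, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.zeros_like(z, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # Shifted view of the padded input aligned with kernel tap (dy, dx).
            out += kernels[..., dy, dx] * zp[:, dy:dy + H, dx:dx + W]
    return out

# Usage: a kernel that is 1 at its centre and 0 elsewhere leaves the input unchanged.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8, 8))
identity = np.zeros((4, 8, 8, 3, 3))
identity[..., 1, 1] = 1.0
assert np.allclose(elementwise_adaptive_conv(z, identity), z)
```

Because the kernels only reweight values already present in the input, the output stays tied to the input's structure, which is the property the abstract attributes to EAC. In the paper the kernels come from the LKPN; here they are supplied by hand.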

Paper Structure

This paper contains 17 sections, 8 equations, 21 figures, 3 tables, 1 algorithm.

Figures (21)

  • Figure 1: Visual comparison with state-of-the-art image deblurring methods. The results of the GAN-based method (b) and the diffusion-based method without pretraining (d) still contain significant blur. Directly using the blurry image as the conditional input (e) makes it difficult to extract structural information effectively. (f) is a method based on pre-trained SD that performs pre-deblurring on the input features, which alters the original information and leads to erratic generation. (g) uses the result of the pre-trained FFTformer (c) as the condition, so it inherits the incorrect structures in (c) and its output retains artifacts and incorrect structures. In contrast, our approach, guided by the clear structural information provided by LKPN, generates a sharper, artifact-free image.
  • Figure 2: Iterative results of the diffusion model. The arrow represents the iterative diffusion process. To visualize this process, we decode the features deblurred by the LKPN and EAC to the image space using the VAE decoder at each time step. Using the blurry image (a) directly as the conditional input, the diffusion model struggles to recover clear structures and fine details (c). (d) uses the result of the pre-trained FFTformer (b) as the condition; however, the deblurring network can introduce incorrect structures, leading to erroneous content generation. In contrast, the proposed LKPN preserves the input information and restores the structure (e) by EAC, thereby guiding the diffusion model to generate better results (f).
  • Figure 3: Overall architecture of the proposed DeblurDiff. It integrates a Latent Kernel Prediction Network (LKPN) with a generative diffusion model to address the challenges of real-world image deblurring. The LKPN progressively recovers clear structures from blurred images by estimating pixel-specific deblurring kernels at each step of the diffusion process. These kernels are adaptively adjusted based on local content and applied through Element-wise Adaptive Convolution (EAC). The refined clear $z^s$ is used as a condition to guide the diffusion process, enabling the model to effectively preserve the input information and structural integrity.
  • Figure 4: Deblurred results on the DVD dataset. Existing methods struggle to effectively restore clear images. In contrast, our approach not only removes blur but also recovers sharp structures and fine details.
  • Figure 5: Deblurred results on the RWBI dataset. The structures are not recovered well in (b)-(g). The proposed method generates an image with much clearer structures.
  • ...and 16 more figures
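
The control flow described in the Figure 3 caption (kernels re-estimated at every diffusion step, applied by EAC, and the resulting $z^s$ used as the condition) can be sketched as a simple loop. This is a toy sketch only: `toy_lkpn` and `toy_denoiser` are hypothetical stand-ins invented for this example and are not the paper's networks, the argument lists are assumptions, and the update rule is not a real diffusion sampler. The point is the data flow, not the numerics.

```python
import numpy as np

def eac(z, kernels):
    # Per-element k x k weighting of each position's neighbourhood.
    C, H, W = z.shape
    k = kernels.shape[-1]
    r = k // 2
    zp = np.pad(z, ((0, 0), (r, r), (r, r)), mode="edge")
    return sum(kernels[..., dy, dx] * zp[:, dy:dy + H, dx:dx + W]
               for dy in range(k) for dx in range(k))

def toy_lkpn(z_lq, z_t, t, k=3):
    # Hypothetical stand-in for the LKPN: ignores z_t and t and
    # always returns identity kernels (1 at the centre tap).
    kernels = np.zeros(z_lq.shape + (k, k))
    kernels[..., k // 2, k // 2] = 1.0
    return kernels

def toy_denoiser(z_t, z_s, z_lq, t):
    # Hypothetical stand-in for the conditioned SD step:
    # moves z_t toward the guidance latent z_s; z_lq is unused here.
    return z_t + (z_s - z_t) / (t + 1)

def sample(z_lq, num_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    z_t = rng.standard_normal(z_lq.shape)      # start from noise
    for t in reversed(range(num_steps)):
        kernels = toy_lkpn(z_lq, z_t, t)       # kernels re-estimated each step
        z_s = eac(z_lq, kernels)               # structure-guided latent z^s
        z_t = toy_denoiser(z_t, z_s, z_lq, t)  # update conditioned on z^s
    return z_t
```

With these stand-ins the loop simply converges to the blurred latent, since identity kernels leave `z_lq` unchanged. In the actual method a trained LKPN would see the current diffusion result and predict kernels that sharpen `z_lq` progressively.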