One-Step Diffusion Model for Image Motion-Deblurring

Xiaoyang Liu, Yuquan Wang, Zheng Chen, Jiezhang Cao, He Zhang, Yulun Zhang, Xiaokang Yang

TL;DR

This work introduces a One-Step Diffusion Model for Deblurring (OSDD) that dramatically speeds up diffusion-based image deblurring by collapsing the denoising process to a single step in latent space. A novel Enhanced VAE (eVAE) mitigates fidelity loss from latent compression, while a synthetic high-quality deblurring dataset and a Dynamic Dual-Adapter (DDA) balance real and synthetic knowledge to prevent perceptual collapse. The model demonstrates strong performance on full-reference and no-reference metrics across GoPro and RealBlur datasets, with notable perceptual gains and faster inference than multi-step diffusion methods. The proposed approach offers a practical, generalizable diffusion-based solution for challenging motion blur scenarios and highlights the potential of combining latent diffusion, synthetic data, and adaptive adapters for restoration tasks.

Abstract

CNN- and transformer-based methods for single-image deblurring have demonstrated promising performance. However, these methods often suffer from perceptual limitations and poor generalization, and they struggle with heavy or complex blur. While diffusion-based methods can partially address these shortcomings, their multi-step denoising process limits their practical use. In this paper, we explore diffusion models for deblurring in depth and propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step, significantly improving inference efficiency while maintaining high fidelity. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Additionally, we construct a high-quality synthetic deblurring dataset to mitigate perceptual collapse and design a dynamic dual-adapter (DDA) to enhance perceptual quality while preserving fidelity. Extensive experiments demonstrate that our method achieves strong performance on both full-reference and no-reference metrics. Our code and pre-trained model will be publicly available at https://github.com/xyLiu339/OSDD.
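
To make the single-step idea concrete, here is a minimal sketch of the standard diffusion relation that recovers a clean latent from one noise prediction. It is not taken from the OSDD code: the helper names `add_noise` and `one_step_x0`, the value of `alpha_bar_t`, and the use of the true noise in place of a trained network's prediction are all assumptions made for illustration.

```python
import math
import random

def add_noise(z_0, eps, alpha_bar_t):
    """Forward diffusion: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps (hypothetical helper)."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [a * z + s * e for z, e in zip(z_0, eps)]

def one_step_x0(z_t, eps_pred, alpha_bar_t):
    """Solve the forward relation for z_0 using a single noise prediction."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [(zt - s * e) / a for zt, e in zip(z_t, eps_pred)]

rng = random.Random(0)
z_0 = [rng.gauss(0.0, 1.0) for _ in range(16)]   # stand-in for a clean latent
eps = [rng.gauss(0.0, 1.0) for _ in range(16)]   # noise; a network would predict this
alpha_bar_t = 0.3                                # assumed cumulative noise-schedule value

z_t = add_noise(z_0, eps, alpha_bar_t)
z_0_hat = one_step_x0(z_t, eps, alpha_bar_t)     # one evaluation, no iterative loop
```

With an exact noise prediction, the clean latent is recovered in a single evaluation. A multi-step sampler would instead apply many small updates, which is the cost the abstract identifies as limiting practical use.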

Paper Structure

This paper contains 21 sections, 9 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Visual deblurring comparison on RealBlur-J. Our model handles complex blurs robustly and restores clear boundaries, providing a better visual experience.
  • Figure 2: Samples from the GoPro [Nah2017deep] test set. At the bottom of each sample, from left to right: the high-quality (HQ) image, the AdaRevD-L [xintm2024AdaRevD] result, and the VAE reconstruction. The VAE fails to accurately reconstruct even the HQ image from the deblurring dataset, which significantly limits the upper bound of the entire diffusion model's performance.
  • Figure 3: On the left, we illustrate the design of our enhanced VAE, where features from the encoder are concatenated to the decoder through a convolutional block. On the right, we present the training strategy of the eVAE. Encoder 1 handles the skip-connection features, while Encoder 2 is responsible for encoding the high-quality image.
  • Figure 4: On the left, the specific structure and operational principles of the Dynamic Dual-Adapter (DDA) are illustrated. On the right, we show the pipeline of our latent-diffusion model training. The frozen VAE modules are inherited from the first-stage training (see the section labeled "sec: vaeft").
  • Figure 5: Examples of some randomly generated kernel trajectories and corresponding LQ and HQ images.
  • ...and 4 more figures
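
Figure 5 refers to randomly generated kernel trajectories and the LQ/HQ pairs built from them. The sketch below shows one simple way such a pair could be synthesized: a random walk with inertia is rasterized into a normalized blur kernel, which is then applied to an HQ image. The function names, the inertia and step-size constants, and the replicate-border handling are assumptions for illustration; the paper's actual generation procedure is not described in this section.

```python
import random

def random_motion_kernel(size=15, steps=200, seed=None):
    """Rasterize a random-walk trajectory into a size x size kernel that sums to 1."""
    rng = random.Random(seed)
    kernel = [[0.0] * size for _ in range(size)]
    x = y = (size - 1) / 2.0                      # start at the kernel centre
    vx, vy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        # Velocity keeps some inertia and receives a random perturbation.
        vx = 0.9 * vx + 0.1 * rng.gauss(0, 1)
        vy = 0.9 * vy + 0.1 * rng.gauss(0, 1)
        # Clamp the position so the trajectory stays inside the kernel.
        x = min(max(x + 0.3 * vx, 0.0), size - 1.0)
        y = min(max(y + 0.3 * vy, 0.0), size - 1.0)
        kernel[int(round(y))][int(round(x))] += 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Apply the kernel to a 2D grayscale image (correlation, replicate border)."""
    h, w, k = len(image), len(image[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    ii = min(max(i + di - r, 0), h - 1)
                    jj = min(max(j + dj - r, 0), w - 1)
                    acc += kernel[di][dj] * image[ii][jj]
            out[i][j] = acc
    return out

kernel = random_motion_kernel(size=5, steps=100, seed=0)
hq = [[0.5] * 8 for _ in range(8)]                # flat stand-in for an HQ image
lq = blur(hq, kernel)                             # synthetic LQ counterpart
```

Because the kernel is normalized to sum to 1, blurring preserves overall brightness; a flat image is left unchanged, which is a quick sanity check for any kernel generator of this kind.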