DIVD: Deblurring with Improved Video Diffusion Model

Haoyang Long; Yan Wang; Wendong Wang

TL;DR

This work reframes video deblurring as a conditional diffusion problem and introduces two components: Window-based Temporal Self-Attention (WTSA), which processes multiple frames in parallel within windows, and Multi-frame Relative Positional Encoding (MRPE), which supplies complete temporal and spatial positional information. Together they allow misaligned adjacent frames to be aligned and fused implicitly, yielding state-of-the-art perceptual quality while preserving detail and keeping distortion metrics competitive. Experiments on GoPro and DVD show strong results on the perceptual metrics LPIPS, FID, and KID, and ablations validate the contributions of WTSA and MRPE. The approach underlines the importance of perceptual evaluation in image restoration and offers a scalable, diffusion-based route to high-fidelity video deblurring, at the cost of slower inference and a PSNR gap relative to the current state of the art.

Abstract

Video deblurring is challenging because of the complexity of blur, which frequently results from a combination of camera shake and object motion. Many previous works on video deblurring have concentrated on distortion-based metrics such as PSNR. However, these metrics correlate only weakly with human perception, and optimizing for them yields reconstructions that lack realism. Diffusion models and video diffusion models have excelled in image and video generation respectively, achieving remarkable results in image authenticity and perceptual realism. However, because of their computational complexity and the challenges inherent in adapting them, the potential of video diffusion models for video deblurring remains uncertain. To explore their viability for this task, we introduce a diffusion model designed specifically for video deblurring. In this field, leveraging the highly correlated information between adjacent frames and addressing temporal misalignment are crucial research directions. To tackle these challenges, this work introduces several improvements to the video diffusion model. As a result, our model outperforms existing models and achieves state-of-the-art results on a range of perceptual metrics. It preserves a significant amount of image detail while maintaining competitive distortion metrics. Furthermore, to the best of our knowledge, this is the first time a diffusion model has been applied to video deblurring to overcome the limitations mentioned above.
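
The summary above names window-based temporal self-attention and a multi-frame relative positional encoding but does not give their exact formulation. The following is a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes non-overlapping spatial windows shared across all frames, identity query/key/value projections, a single head, and a learned bias table indexed by relative (time, row, column) offsets. The function names `relative_bias` and `windowed_temporal_attention` and the table layout are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_bias(T, w, table):
    # Assumed layout: table has shape (2T-1, 2w-1, 2w-1), one scalar per
    # relative (time, row, column) offset between two tokens in a window.
    t, y, x = np.meshgrid(np.arange(T), np.arange(w), np.arange(w), indexing="ij")
    coords = np.stack([t.ravel(), y.ravel(), x.ravel()], axis=1)   # (N, 3)
    rel = coords[:, None, :] - coords[None, :, :]                  # (N, N, 3)
    rel = rel + np.array([T - 1, w - 1, w - 1])                    # shift to >= 0
    return table[rel[..., 0], rel[..., 1], rel[..., 2]]            # (N, N)

def windowed_temporal_attention(frames, w, table):
    # frames: (T, H, W, C). Each w-by-w window gathers tokens from all T
    # frames, so attention can relate misaligned content across frames.
    T, H, W, C = frames.shape
    assert H % w == 0 and W % w == 0, "H and W must be multiples of w"
    x = frames.reshape(T, H // w, w, W // w, w, C)
    x = x.transpose(1, 3, 0, 2, 4, 5)              # (H/w, W/w, T, w, w, C)
    tokens = x.reshape(-1, T * w * w, C)           # (windows, N, C)
    # Identity projections for brevity; a real layer would learn Q, K, V.
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    scores = scores + relative_bias(T, w, table)   # broadcast over windows
    out = softmax(scores) @ tokens
    # Undo the window partition.
    out = out.reshape(H // w, W // w, T, w, w, C).transpose(2, 0, 3, 1, 4, 5)
    return out.reshape(T, H, W, C)

rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 4, 4, 2))     # 3 frames, 4x4 pixels, 2 channels
table = rng.normal(size=(5, 3, 3))         # (2*3-1, 2*2-1, 2*2-1)
out = windowed_temporal_attention(frames, 2, table)
print(out.shape)
```

Because every window is processed independently, the windows can be batched and computed in parallel, and the cost of attention grows with the window size rather than with the full frame resolution.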
