Coding-Prior Guided Diffusion Network for Video Deblurring

Yike Liu, Jianhui Zhang, Haipeng Li, Shuaicheng Liu, Bing Zeng

TL;DR

This work tackles video deblurring by exploiting both compression-domain priors (motion vectors and coding residuals) and pretrained diffusion priors. It introduces CPGD-Net, a two-stage framework with CPFP for efficient inter-frame alignment and CPC (via CPControlNet) for guided, detail-rich generation using a diffusion model. Key contributions include the coding-prior feature propagation module, the coding-prior controlled generation module, augmentation of datasets with coding priors, and demonstrated state-of-the-art perceptual quality on GoPro and DVD benchmarks. The results show significant improvements in no-reference IQA metrics and perceptual fidelity, illustrating the practical impact of combining compression-domain information with generative priors for video deblurring.

Abstract

While recent video deblurring methods have advanced significantly, they often overlook two valuable sources of prior information: (1) motion vectors (MVs) and coding residuals (CRs) from video codecs, which provide efficient inter-frame alignment cues, and (2) the rich real-world knowledge embedded in pretrained diffusion generative models. We present CPGD-Net, a novel two-stage framework that leverages both coding priors and generative diffusion priors for high-quality deblurring. First, our coding-prior feature propagation (CPFP) module uses MVs for efficient frame alignment and CRs to generate attention masks, addressing motion inaccuracies and texture variations. Second, a coding-prior controlled generation (CPC) module integrates coding priors into a pretrained diffusion model, guiding it to enhance critical regions and synthesize realistic details. Experiments demonstrate that our method achieves state-of-the-art perceptual quality, with up to 30% improvement in IQA metrics. Both the code and the coding-prior-augmented dataset will be open-sourced.
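
The alignment idea in the abstract, reusing codec motion vectors instead of estimating optical flow, can be illustrated with a small warping sketch. This is not the authors' implementation: the function name `warp_with_motion_vectors`, the integer per-pixel offsets, the sign convention, and the border clamping are all assumptions made for illustration, and plain Python lists stand in for the feature tensors a real model would use.

```python
def warp_with_motion_vectors(prev, mv):
    """Backward-warp a 2-D feature map using per-pixel motion vectors.

    prev: H x W list of floats, the features of frame t-1.
    mv:   H x W list of (dx, dy) integer offsets. Assumed convention for
          this sketch: the content of pixel (x, y) in frame t is found at
          (x + dx, y + dy) in frame t-1.
    """
    h, w = len(prev), len(prev[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = mv[y][x]
            # Clamp the source position so lookups stay inside the frame.
            sx = min(max(x + dx, 0), w - 1)
            sy = min(max(y + dy, 0), h - 1)
            out[y][x] = prev[sy][sx]
    return out


prev = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
# Hypothetical motion field: everything moved one pixel to the right,
# so each pixel of frame t reads from its left neighbour in frame t-1.
mv = [[(-1, 0)] * 3 for _ in range(2)]
warped = warp_with_motion_vectors(prev, mv)
```

Because the motion vectors come for free from the bit stream, this step needs no learned flow estimator; the abstract's use of coding residuals for attention masks is aimed at the places where such block-level motion is inaccurate.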

Paper Structure

This paper contains 20 sections, 7 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Our framework integrates video decoding priors with diffusion-based generative priors for video deblurring. From the compressed video bit stream, we extract motion vectors (MVs) and coding residuals (CRs) alongside decoded frames. Comparative results show improved blur removal and detail reconstruction. (Zoom in for best view.)
  • Figure 2: Overview of our CPGD-Net pipeline: Our framework consists of (1) a Coding-Prior Feature Propagation (CPFP) module that aligns features through cascaded CPFA blocks and performs initial restoration, followed by (2) a Coding-Prior Controlled Generation (CPC) module that synthesizes high-quality outputs by conditioning a diffusion model on the stage-one results, coding priors (motion vectors/residuals), and text prompts through a denoising process.
  • Figure 3: Visualization of MVs and CRs. (a) Ground truth image; (b) motion-blurred image; (c) motion vector; (d) coding residual.
  • Figure 4: Details of the CPFA block. Given features $F_{t-1}$, motion vectors $V_{t-1\rightarrow t}$, and coding residuals $R_{t-1\rightarrow t}$: (1) Warp $F_{t-1}$ using $V_{t-1\rightarrow t}$ to obtain $\widetilde{F}_t$; (2) Predict deformable convolution parameters $\{O_{t-1\rightarrow t}, M_{t-1\rightarrow t}\}$ from the concatenation of $\{V_{t-1\rightarrow t}, R_{t-1\rightarrow t}, \widetilde{F}_t\}$; (3) Apply $\mathrm{DCN}(F_{t-1}, O_{t-1\rightarrow t}, M_{t-1\rightarrow t})$ and fuse with $F^*_{t-1}$ to output $F^*_t$.
  • Figure 5: At each transformer layer, motion vectors $V$ and coding residuals $R$ are converted to an attention mask $A_i$. The mask modulates the query $Q$ to prioritize blur-sensitive regions.
  • ...and 6 more figures
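
The query modulation described in the Figure 5 caption can be sketched as follows. The caption does not say how $V$ and $R$ are converted into the mask $A_i$, so the conversion used here (a sigmoid over motion magnitude plus absolute residual) and the elementwise scaling of $Q$ are assumptions, and the names `attention_mask` and `modulate_query` are hypothetical.

```python
import math


def attention_mask(mv, cr):
    """Map per-token motion vectors and residuals to weights in (0, 1).

    Assumed rule for this sketch: larger motion or larger residual means
    a more blur-sensitive token, hence a larger weight.
    """
    mask = []
    for (dx, dy), r in zip(mv, cr):
        score = math.hypot(dx, dy) + abs(r)
        mask.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid
    return mask


def modulate_query(q, mask):
    """Scale each token's query vector by that token's mask weight."""
    return [[m * v for v in row] for row, m in zip(q, mask)]


# Two tokens: a static flat region and a fast-moving textured region.
mv = [(0, 0), (3, 4)]
cr = [0.0, 1.0]
q = [[2.0, 4.0], [2.0, 4.0]]

mask = attention_mask(mv, cr)
q_mod = modulate_query(q, mask)
```

With these inputs the static token's query is halved, while the moving token's query passes through almost unchanged, so attention scores computed from `q_mod` lean toward the region with strong motion and residual energy.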