Collaborative Feedback Discriminative Propagation for Video Super-Resolution

Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

TL;DR

This work tackles the artifact-prone nature of alignment in video super-resolution by introducing CFD, a framework that combines discriminative alignment correction (DAC) with collaborative feedback propagation (CFP). DAC adaptively calibrates misaligned features using guidance from shallow frame features to reduce artifact propagation, while CFP jointly leverages forward and backward temporal information via a Feedback ConvGRU and gated collaborative feed-forward blocks for long-range refinement in LR space. CFD is integrated into multiple backbones (BasicVSR, BasicVSR++, and PSRT), yielding substantial PSNR gains on benchmarks like REDS4 and Vimeo-90K, with improved efficiency. The approach is validated through extensive ablations, demonstrating the effectiveness of each component and the practicality of deploying CFD across diverse VSR architectures.
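The gains above are reported in PSNR. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, assuming pixel intensities normalized to [0, 1]. The function name `psnr` and the toy frames are illustrative. The evaluation protocol used for REDS4 and Vimeo-90K (colour space, border cropping) is not stated in this summary and is not reproduced here.

```python
import math

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are flat lists of pixel intensities in [0, max_value].
    """
    if len(reference) != len(restored):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy frames (made-up values, not benchmark data).
reference = [0.0, 0.5, 1.0, 0.25]
restored = [0.1, 0.5, 0.9, 0.25]
score = psnr(reference, restored)  # MSE = 0.005, so about 23.01 dB
```

Because PSNR is logarithmic in the mean squared error, a gain of a few tenths of a dB corresponds to a reduction in MSE of several percent.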

Abstract

The key success of existing video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information, which is usually achieved by a recurrent propagation module with an alignment module. However, inaccurate alignment usually leads to aligned features with significant artifacts, which accumulate during propagation and thus degrade video restoration. Moreover, propagation modules only propagate features of the same timestep forward or backward, which may fail in cases of complex motion or occlusion, limiting their performance for high-quality frame restoration. To address these issues, we propose a collaborative feedback discriminative (CFD) method to correct inaccurately aligned features and model long-range spatial and temporal information for better video reconstruction. In detail, we develop a discriminative alignment correction (DAC) method to adaptively explore information and reduce the influence of the artifacts caused by inaccurate alignment. Then, we propose a collaborative feedback propagation (CFP) module that employs feedback and gating mechanisms to better explore spatial and temporal information of different timestep features from forward and backward propagation simultaneously. Finally, we embed the proposed DAC and CFP into commonly used VSR networks to verify the effectiveness of our method. Quantitative and qualitative experiments on several benchmarks demonstrate that our method can improve the performance of existing VSR models while maintaining a lower model complexity. The source code and pre-trained models will be available at https://github.com/House-Leo/CFDVSR.

Paper Structure (11 sections, 5 equations, 9 figures, 3 tables)

This paper contains 11 sections, 5 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Comparison of our proposed CFD-BasicVSR and CFD-BasicVSR++ with other methods on the REDS4 dataset [nah2019ntire] in terms of PSNR, running time, and model parameters. Circle sizes indicate the number of parameters. Both proposed models achieve a better trade-off between efficiency and performance.
  • Figure 2: The overall architecture of our proposed model. Our model consists of a feature extractor, a forward/backward propagation module with the discriminative alignment correction (DAC), a collaborative feedback propagation (CFP) module, and a reconstruction module. The DAC uses shallow features $f_{t}$ to recover more detail after feature warping, which corrects the aligned features for propagation. Moreover, as the core components of our CFP, the Feedback ConvGRU and the gated collaborative feed-forward block (GCFB) bring more temporal interactions between different timestep features from forward and backward propagation simultaneously. Here $s_{t}$ is the optical flow at the $t$-th timestep, and $h^{\{f,b\}}_{t}$ and $r_{t}$ denote the $t$-th timestep features in forward/backward propagation and CFP, respectively.
  • Figure 3: Effect of the proposed DAC method on VSR. Inaccurate flow estimation and the resampling operation in the spatial warping module severely damage structure and edge information, which tends to produce artifacts in the corresponding regions during feature alignment (see red box in (c)). In contrast, we notice that the shallow features (b) better preserve such information. This naturally motivates us to compensate for the information loss using shallow features. With our proposed DAC method, the aligned features can accurately recover the information in this region (see red box in (d)), and the restored VSR result (f) has sharper texture details than (d).
  • Figure 4: Visual comparisons on the REDS4 dataset. (a) Ground truth. (b)-(h) denote the results generated by Bicubic, BasicVSR [chan2021basicvsr], BasicVSR++ [chan2022basicvsrplusplus], VRT [liang2022vrt], TTVSR [liu2022ttvsr], CFD-BasicVSR (Ours), and CFD-BasicVSR++ (Ours), respectively. The results in (b)-(f) lack accurate details of the eyes and mouths. In contrast, the proposed methods (g) and (h) effectively restore the facial details.
  • Figure 5: Visual comparisons on the Vimeo-T dataset. (a) Ground truth. (b)-(h) denote the results generated by Bicubic, BasicVSR [chan2021basicvsr], BasicVSR++ [chan2022basicvsrplusplus], VRT [liang2022vrt], TTVSR [liu2022ttvsr], CFD-BasicVSR (Ours), and CFD-BasicVSR++ (Ours), respectively. The results restored by BasicVSR [chan2021basicvsr], BasicVSR++ [chan2022basicvsrplusplus], VRT [liang2022vrt], and TTVSR [liu2022ttvsr] still contain blurring and artifacts. In contrast, the proposed methods are able to accurately restore the upper-right part of the frame.
  • ...and 4 more figures
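
The Figure 3 caption attributes alignment artifacts to inaccurate flow combined with the resampling step of spatial warping. The sketch below shows that effect on a tiny step edge using generic backward warping with bilinear resampling. It is a textbook illustration, not the paper's warping module. The helper names, the flow sign convention, and the border clamping are assumptions.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a list-of-lists image at real-valued (x, y), clamping to the border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom

def warp(image, flow):
    """Backward warp: output[y][x] is the image sampled at (x + dx, y + dy)."""
    return [
        [bilinear_sample(image, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]

# A vertical step edge between columns 1 and 2.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(2)]

# Assumed true motion: one pixel to the right in sampling coordinates.
exact_flow = [[(1.0, 0.0)] * 4 for _ in range(2)]
# Hypothetical estimate that is off by half a pixel.
noisy_flow = [[(0.5, 0.0)] * 4 for _ in range(2)]

sharp = warp(edge, exact_flow)    # each row: [0.0, 1.0, 1.0, 1.0]
blurred = warp(edge, noisy_flow)  # each row: [0.0, 0.5, 1.0, 1.0]
```

With the exact flow the edge stays a clean 0-to-1 step. With the half-pixel error, bilinear resampling produces an intermediate value of 0.5 that does not exist in the source frame, which is the kind of softened edge that a correction step guided by shallow features aims to repair.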