Blind Video Super-Resolution based on Implicit Kernels

Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu, Fan Zhang, David Bull, Bing Zeng

TL;DR

This work tackles blind video super-resolution under unknown degradations by modeling spatially varying blur with an INR-based, multi-scale kernel dictionary. A novel recurrent Transformer predicts per-pixel kernel coefficients used in Implicit Spatial Correction (ISC) and Implicit Temporal Alignment (ITA) to jointly correct frames and align temporal features. The approach, BVSR-IK, demonstrates consistent PSNR improvements over state-of-the-art methods across Gaussian and realistic motion blur on REDS4, Vid4, and UDM10, and ablations confirm the importance of ISC, ITA, and the recurrent design. By enabling scale-aware, spatially varying deblurring and alignment, it offers a practical improvement for BVSR on real-world degraded videos; the authors state that source code will be released.

Abstract

Blind video super-resolution (BVSR) is a low-level vision task that aims to generate high-resolution videos from low-resolution counterparts in unknown degradation scenarios. Existing approaches typically predict blur kernels that are spatially invariant within each video frame or even across the entire video. These methods do not consider potential spatio-temporally varying degradations in videos, resulting in suboptimal BVSR performance. In this context, we propose a novel BVSR model based on Implicit Kernels, BVSR-IK, which constructs a multi-scale kernel dictionary parameterized by implicit neural representations. It also employs a newly designed recurrent Transformer to predict the coefficient weights for accurate filtering in both frame correction and feature alignment. Experimental results demonstrate the effectiveness of the proposed BVSR-IK when compared with four state-of-the-art BVSR models on three commonly used datasets, with BVSR-IK outperforming the second-best approach, FMA-Net, by up to 0.59 dB in PSNR. Source code will be available at https://github.com/QZ1-boy/BVSR-IK.
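
As a rough illustration of the dictionary idea described in the abstract, the sketch below filters a single-channel frame with a per-pixel weighted combination of basis kernels. It is a minimal sketch under stated assumptions, not the BVSR-IK implementation: fixed Gaussian kernels stand in for the INR-parameterized dictionary, all kernels share one size rather than spanning multiple scales, the coefficients are random rather than predicted by the recurrent Transformer, and the names `gaussian_kernel` and `filter_with_dictionary` are hypothetical.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Return a normalized size x size isotropic Gaussian kernel.

    Assumption: a stand-in for one dictionary entry; the paper's kernels
    are parameterized by implicit neural representations instead.
    """
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def filter_with_dictionary(image, dictionary, coeffs):
    """Spatially varying filtering of a 2-D image (hypothetical helper).

    image:      (H, W) array
    dictionary: (N, k, k) array of N basis kernels, k odd
    coeffs:     (N, H, W) array of per-pixel weights, one map per kernel
    """
    n, k, _ = dictionary.shape
    h, w = image.shape
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for idx in range(n):
        # Correlate the image with basis kernel idx via shifted views.
        response = np.zeros((h, w), dtype=np.float64)
        for dy in range(k):
            for dx in range(k):
                response += dictionary[idx, dy, dx] * padded[dy:dy + h, dx:dx + w]
        # Weight the response by this kernel's per-pixel coefficient map.
        out += coeffs[idx] * response
    return out


# Usage: three basis kernels and random coefficients normalized per pixel.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
dictionary = np.stack([gaussian_kernel(5, s) for s in (0.5, 1.0, 2.0)])
logits = rng.random((3, 8, 8))
coeffs = logits / logits.sum(axis=0, keepdims=True)
result = filter_with_dictionary(image, dictionary, coeffs)
```

Because filtering is linear, weighting the filtered responses per pixel is equivalent to filtering each pixel with its own kernel, namely the coefficient-weighted sum of the dictionary entries. That equivalence is what makes the result spatially varying while the dictionary itself stays shared across the frame.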

Paper Structure

This paper contains 13 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison between existing BVSR methods [bai2024self, pan2021deep, xiao2023deep, youk2024fma] and our BVSR-IK method. (a) Previous methods predict a single kernel and utilize it to perform super-resolution in an Aggregation manner. (b) BVSR-IK (ours) generates an INR-based multi-scale kernel dictionary and applies it in a Correction-Alignment manner for BVSR.
  • Figure 2: The framework of the BVSR-IK model. The BVSR-IK model consists of three modules, i.e., ISC, ITA, and UP. Each LR video frame is first fed into the ISC module to generate its kernel dictionary and corrected frame. Then, the feature of the corrected frame extracted by residual blocks and the constructed kernel dictionary are fed into the ITA module to achieve the temporal feature alignment through bidirectional propagation. Finally, the aligned feature is fed into the UP module to generate the SR video frame.
  • Figure 3: Visual results on the REDS4 [nah2019ntire], Vid4 [liu2013bayesian], and UDM10 [yi2019progressive] datasets for Gaussian blur and realistic motion blur scenarios.
  • Figure 4: Visualization of ablation studies of BVSR-IK on a frame from the REDS4 [nah2019ntire] dataset for the realistic motion blur scenario.
  • Figure 5: Robustness of BVSR models in the noise scenario.