FCVSR: A Frequency-aware Method for Compressed Video Super-Resolution

Qiang Zhu, Fan Zhang, Feiyu Chen, Shuyuan Zhu, David Bull, Bing Zeng

TL;DR

FCVSR tackles compressed video super-resolution by exploiting frequency-domain information through a motion-guided adaptive alignment (MGAA) network and a multi-frequency feature refinement (MFFR) module. A frequency-aware loss, combining spatial and contrastive components, guides the restoration of fine high-frequency details. The three key contributions (MGAA for motion-aware frequency-domain alignment, MFFR for subband-specific refinement, and the frequency-aware contrastive loss) together improve PSNR, SSIM, and VMAF while keeping model complexity low to moderate. The approach is of practical use for improving the quality of compressed videos in real-world pipelines, especially where compression artifacts and motion dynamics are challenging.

Abstract

Compressed video super-resolution (SR) aims to generate high-resolution (HR) videos from the corresponding low-resolution (LR) compressed videos. Recently, some compressed video SR methods have attempted to exploit spatio-temporal information in the frequency domain, showing great promise in super-resolution performance. However, these methods do not differentiate the various frequency subbands spatially or capture the temporal frequency dynamics, potentially leading to suboptimal results. In this paper, we propose a deep frequency-based compressed video SR model (FCVSR) consisting of a motion-guided adaptive alignment (MGAA) network and a multi-frequency feature refinement (MFFR) module. Additionally, a frequency-aware contrastive loss is proposed for training FCVSR in order to reconstruct finer spatial details. The proposed model has been evaluated on three public compressed video super-resolution datasets, with results demonstrating its effectiveness compared to existing works in terms of both super-resolution performance (up to a 0.14 dB gain in PSNR over the second-best model) and complexity.
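
To make the notion of frequency subbands in the abstract concrete, the following is a minimal sketch that splits a single 2-D feature map into radial frequency bands with an FFT. It is not the MFFR module itself: the function name `split_frequency_bands`, the use of hard radial masks, and the equal-width band edges are all assumptions chosen for illustration.

```python
import numpy as np


def split_frequency_bands(feature, num_bands=3):
    """Split a 2-D feature map into radial frequency subbands.

    Illustrative only: hard radial masks with equal-width band edges are
    an assumption, not the decomposition used by the paper.
    """
    h, w = feature.shape
    # Centre the spectrum so low frequencies sit in the middle.
    spectrum = np.fft.fftshift(np.fft.fft2(feature))

    # Radial frequency (cycles per sample) of every spectrum coefficient.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx**2 + fy**2)

    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    bands = []
    for k in range(num_bands):
        lo, hi = edges[k], edges[k + 1]
        # The last band includes its upper edge, so every coefficient
        # belongs to exactly one band.
        upper = (radius <= hi) if k == num_bands - 1 else (radius < hi)
        mask = (radius >= lo) & upper
        band = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
        bands.append(band)
    return bands
```

Because the masks partition the spectrum, the returned bands sum back to the input feature map, so each band can in principle be processed separately and recombined without losing information.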

Paper Structure

This paper contains 23 sections, 26 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Illustration of performance-complexity trade-offs for different compressed VSR models. It can be observed that the proposed FCVSR model offers better super-resolution performance with lower complexity compared to benchmark methods.
  • Figure 2: The architecture of the FCVSR model. A compressed LR video is fed into a convolution layer, MGAA, MFFR, and reconstruction (REC) modules to generate an HR video.
  • Figure 3: The architecture of the motion-guided adaptive alignment (MGAA) module. The set of features $\left\{\mathcal{F}_{i}\right\}_{i=t-3}^{t-1}$ is divided into the forward set $\left\{\mathcal{F}_{i}\right\}_{i=t-3}^{t-2}$ and the backward set $\left\{\mathcal{F}_{i}\right\}_{i=t-2}^{t-1}$ for feature alignment. In the forward branch, the forward set $\left\{\mathcal{F}_{i}\right\}_{i=t-3}^{t-2}$ is first sent to the Motion Estimator (ME) module to generate the motion offsets $O_{t-2} = \left\{o_{n}\right\}_{n=1}^{N}$. In addition, the feature $\mathcal{F}_{t-2}$ is fed into the Kernel Predictor (KP) module to obtain the kernel set $\mathbf{K}=\left\{\mathbf{K}_n\right\}_{n=1}^N$. The motion offsets $O_{t-2}$ and the kernel set $\mathbf{K}$ are then used in the motion-guided adaptive convolution (MGAC) layer to perform the feature alignment. Finally, the forward-aligned feature $\bar{\mathcal{F}}^{f}_{t-2}$ and the backward-aligned feature $\bar{\mathcal{F}}^{b}_{t-2}$ are concatenated to obtain the output feature $\bar{\mathcal{F}}_{t-2}$.
  • Figure 4: The architecture of the multi-frequency feature refinement (MFFR) module.
  • Figure 5: Visualization of the input feature, output feature, decoupled features, and enhanced features in the MFFR module.
  • ...and 3 more figures
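
The alignment step described in the Figure 3 caption (offsets from a motion estimator, kernels from a kernel predictor, both consumed by a motion-guided adaptive convolution) can be illustrated with a heavily simplified sketch. Everything below is an assumption made for illustration rather than the paper's layer: the function name `motion_guided_adaptive_conv` is hypothetical, each of the N offsets is a single integer shift shared by all pixels, each kernel is reduced to one scalar weight per pixel, and borders are handled by clamping.

```python
import numpy as np


def motion_guided_adaptive_conv(feature, offsets, kernels):
    """Toy alignment: weighted sum of N offset-shifted copies of a feature map.

    feature: (H, W) array, the feature to be aligned.
    offsets: N integer (dy, dx) pairs, standing in for the motion offsets.
    kernels: (N, H, W) array of per-pixel weights, standing in for the
             predicted kernel set.
    All three simplifications are assumptions, not the paper's MGAC layer.
    """
    h, w = feature.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    aligned = np.zeros_like(feature, dtype=float)
    for (dy, dx), weight in zip(offsets, kernels):
        # Sample the feature at the offset position, clamping at the border.
        sy = np.clip(ys + dy, 0, h - 1)
        sx = np.clip(xs + dx, 0, w - 1)
        aligned += weight * feature[sy, sx]
    return aligned
```

With a single offset `(0, 1)` and all-ones weights, the output is the input shifted one column to the left, which is the simplest case of compensating a one-pixel horizontal motion between frames.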