
Compressed-Domain-Aware Online Video Super-Resolution

Yuhang Wang, Hai Li, Shujuan Hou, Zhetao Dong, Xiaoyao Yang

TL;DR

CDA-VSR, a compressed-domain-aware network for online VSR, uses a motion-vector-guided deformable alignment module that performs coarse warping with motion vectors and learns only local residual offsets for fine adjustment, maintaining accuracy while reducing computation.
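To make the coarse-plus-residual idea concrete, here is a minimal single-channel sketch. It is not the authors' module, and several details are assumptions: motion vectors are taken as per-pixel values already upsampled from codec blocks to feature resolution, warping is backward (sample the previous frame at position + MV), and the residual offsets are plain inputs rather than predictions from a network. It also uses one sampling point per pixel, whereas a true deformable convolution samples several kernel points per location and applies learned weights. The names `bilinear_sample` and `mv_guided_align` are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map (list of rows) at fractional (y, x); zero outside."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    # Accumulate the four neighbouring pixels with bilinear weights.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * feat[yy][xx]
    return total


def mv_guided_align(prev_feat, mv, residual_offset):
    """Warp prev_feat toward the current frame (simplified, single sample per pixel).

    mv[y][x] = (mvy, mvx): coarse displacement from the decoded motion vectors
        (assumed already upsampled to feature resolution).
    residual_offset[y][x] = (oy, ox): small local correction; in CDA-VSR this
        would be learned, here it is simply passed in.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mvy, mvx = mv[y][x]
            oy, ox = residual_offset[y][x]
            # Total displacement = coarse motion vector + residual offset.
            out[y][x] = bilinear_sample(prev_feat, y + mvy + oy, x + mvx + ox)
    return out
```

In this framing the motion vector carries the large displacement and the residual offset only corrects local errors, which matches the intuition the TL;DR gives for why computation drops.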

Abstract

In bandwidth-limited online video streaming, videos are usually downsampled and compressed. Although recent online video super-resolution (online VSR) approaches achieve promising results, they remain compute-intensive and fall short of real-time processing at higher resolutions because of complex motion estimation for alignment and redundant processing of consecutive frames. To address these issues, we propose a compressed-domain-aware network (CDA-VSR) for online VSR, which uses compressed-domain information, including motion vectors, residual maps, and frame types, to balance quality and efficiency. Specifically, we propose a motion-vector-guided deformable alignment module that uses motion vectors for coarse warping and learns only local residual offsets for fine adjustment, thereby maintaining accuracy while reducing computation. Then, a residual-map gated fusion module derives spatial weights from residual maps, suppressing mismatched regions and emphasizing reliable details. Further, we design a frame-type-aware reconstruction module that allocates computation adaptively across frame types, balancing accuracy and efficiency. On the REDS4 dataset, CDA-VSR surpasses the state-of-the-art method TMP, with a maximum PSNR improvement of 0.13 dB, while delivering more than double the inference speed. The code will be released at https://github.com/sspBIT/CDA-VSR.
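The residual-map gating described in the abstract can be illustrated with a small sketch. The following are assumptions, not details from the paper: the gate is a fixed sigmoid of the absolute codec residual (`scale` and `bias` are hypothetical constants, whereas CDA-VSR presumably learns this mapping), and where the gate is low the fused feature falls back to the current frame's own feature. The function names are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def residual_gate(residual, scale=4.0, bias=2.0):
    """Map |residual| to a reliability weight in (0, 1).

    Hypothetical fixed mapping sigmoid(bias - scale * |r|): a large codec
    residual means the motion-compensated prediction was poor there, so the
    gate (trust in the aligned temporal feature) is small.
    """
    return [[_sigmoid(bias - scale * abs(r)) for r in row] for row in residual]


def gated_fusion(current_feat, aligned_feat, residual, scale=4.0, bias=2.0):
    """Blend aligned and current features pixel-wise using the residual gate."""
    gate = residual_gate(residual, scale, bias)
    return [
        [g * a + (1.0 - g) * c for g, a, c in zip(g_row, a_row, c_row)]
        for g_row, a_row, c_row in zip(gate, aligned_feat, current_feat)
    ]
```

With `current_feat = [[1.0, 1.0]]`, `aligned_feat = [[0.0, 0.0]]`, and `residual = [[0.0, 10.0]]`, the first pixel leans toward the aligned feature (about 0.12) while the second, with a large residual, stays essentially at the current value of 1.0.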

Paper Structure (14 sections, 10 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Comparison between existing online VSR methods and our proposed method. The server downsamples and encodes the source video, and then transmits the compressed stream. The client decodes and performs super-resolution. Our method uses compressed-domain information, including frame type, motion vectors, and residual maps, to guide alignment, fusion, and reconstruction, improving both accuracy and efficiency.
  • Figure 2: PSNR, FPS, and parameter counts of different methods on REDS4 for 4× upscaling at CRF=18.
  • Figure 3: Overall architecture of the proposed Compressed-Domain-Aware VSR framework (CDA-VSR). Given LR frames and compressed-domain information (motion vectors, residual maps, and frame types), CDA-VSR reconstructs the corresponding HR frames through three key modules: (1) the MV-guided Deformable Alignment (MVGDA); (2) the Residual Map Gated Fusion (RMGF); (3) the Frame-Type-Aware Reconstruction (FTAR) with two branches, Fine-grained I-Frame Reconstruction and Fast P-Frame Reconstruction.
  • Figure 4: Qualitative comparison of different online VSR methods on the REDS4 dataset.
  • Figure 5: Feature map visualization of different alignment methods. Three variants (OnlyMV, OnlyDCN, OnlyGL) and our MV-guided deformable alignment (MVGDA) are compared by visualizing the feature maps before and after alignment. For clarity, only the first channel of each feature map is shown.
  • ...and 1 more figure
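The frame-type-aware reconstruction (FTAR) in the Figure 3 caption splits work between a fine-grained I-frame branch and a fast P-frame branch. A minimal sketch of that routing follows. The branch depths `I_BLOCKS` and `P_BLOCKS` are hypothetical placeholders, not values from the paper, and only I and P types are handled because those are the two branches the caption names.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    frame_type: str  # "I" or "P", as read from the decoded bitstream


I_BLOCKS = 30  # hypothetical depth of the fine-grained I-frame branch
P_BLOCKS = 10  # hypothetical depth of the fast P-frame branch


def choose_branch(frame):
    """Return (branch name, number of blocks) for one frame."""
    if frame.frame_type == "I":
        return "fine_i", I_BLOCKS
    if frame.frame_type == "P":
        return "fast_p", P_BLOCKS
    raise ValueError(f"unsupported frame type: {frame.frame_type!r}")


def plan_gop(frame_types):
    """Route every frame of a group of pictures and total the block count."""
    plan = [choose_branch(Frame(i, t)) for i, t in enumerate(frame_types)]
    total_blocks = sum(blocks for _, blocks in plan)
    return plan, total_blocks
```

For a group of pictures `"IPPPPPPP"`, `plan_gop` returns a total of 100 blocks (30 + 7 × 10), versus 240 if every frame took the I-frame branch, which shows how routing by frame type concentrates compute on the few I-frames.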