
Plug-and-Play Versatile Compressed Video Enhancement

Huimin Zeng, Jiacheng Li, Zhiwei Xiong

TL;DR

This work tackles the challenge of visual quality degradation in compressed videos across varied compression levels while supporting multiple downstream vision tasks. It introduces a codec-aware enhancement framework with two networks: Compression-Aware Adaptation (CAA) that hierarchically adapts enhancement parameters conditioned on $CRF_s$ and $CRF_i$, and Bitstream-Aware Enhancement (BAE) that leverages motion vectors and partition maps for motion alignment and region-aware refinement. The method demonstrates superior quality enhancement (e.g., PSNR gains up to ~1.2 dB at $CRF_s=15$) and broad versatility across tasks such as video super-resolution, optical flow estimation, video object segmentation, and inpainting, while maintaining competitive computational efficiency. By reusing existing codec information, the approach provides a practical, plug-and-play solution for real-world pipelines, enabling robust performance on compressed videos without a dedicated model per compression setting. This has significant implications for real-time cloud-based analytics and downstream vision systems operating on compressed streams.
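
The two-level (sequence, then frame) adaptation performed by CAA can be illustrated with a toy mixture-of-experts sketch. Everything concrete in it is an assumption rather than the paper's formulation: the expert centers, the quadratic-distance softmax gating on $CRF_s$, and the frame-level rescaling by $CRF_i - CRF_s$ are hypothetical stand-ins. They were chosen only to mirror the hierarchy described above and the per-expert $CRF_s$ preference of the weights $w_n$ shown in Figure 4. Parameters are plain Python lists instead of convolution kernels.

```python
import math
from typing import List, Sequence

Params = List[float]


def expert_weights(crf_s: float, centers: Sequence[float], tau: float = 25.0) -> List[float]:
    """Sequence-level gating (hypothetical): softmax over experts.

    Expert n is assumed to specialize in CRF values near centers[n], so its
    logit falls off quadratically with the distance from crf_s.
    """
    logits = [-((crf_s - c) ** 2) / tau for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_params(crf_s: float, experts: Sequence[Params], centers: Sequence[float]) -> Params:
    """Blend the expert parameter vectors with the gating weights w_n."""
    w = expert_weights(crf_s, centers)
    return [sum(w_n * p[k] for w_n, p in zip(w, experts)) for k in range(len(experts[0]))]


def frame_params(seq: Params, crf_i: float, crf_s: float, alpha: float = 0.02) -> Params:
    """Frame-level adaptation (hypothetical): rescale by the frame's CRF offset."""
    scale = 1.0 + alpha * (crf_i - crf_s)
    return [p * scale for p in seq]


if __name__ == "__main__":
    centers = [15.0, 25.0, 35.0]  # hypothetical expert specializations
    experts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy parameter vectors
    seq = sequence_params(15.0, experts, centers)
    print(expert_weights(15.0, centers))
    print(frame_params(seq, crf_i=18.0, crf_s=15.0))
```

The design point the sketch captures is that one shared set of experts serves every compression level: only the blending weights and the frame-level modulation change with $CRF_s$ and $CRF_i$, which is why no dedicated model per compression setting is needed.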

Abstract

As a widely adopted technique in data transmission, video compression effectively reduces file sizes, making real-time cloud computing feasible. However, it comes at the cost of visual quality, posing challenges to the robustness of downstream vision models. In this work, we present a versatile codec-aware enhancement framework that reuses codec information to adaptively enhance videos under different compression settings, assisting various downstream vision tasks without introducing a computational bottleneck. Specifically, the proposed codec-aware framework consists of a compression-aware adaptation (CAA) network that employs a hierarchical adaptation mechanism to estimate the parameters of the frame-wise enhancement network, namely the bitstream-aware enhancement (BAE) network. The BAE network further leverages temporal and spatial priors embedded in the bitstream to effectively improve the quality of compressed input frames. Extensive experimental results demonstrate the superior quality enhancement performance of our framework over existing enhancement methods, as well as its versatility in assisting multiple downstream tasks on compressed videos as a plug-and-play module. Code and models are available at https://huimin-zeng.github.io/PnP-VCVE/.
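
One temporal prior in the bitstream is the set of motion vectors the decoder already holds, which can align a reference frame to the current one without running a separate optical-flow estimator. The sketch below is a minimal, simplified illustration, assuming one integer motion vector `(dy, dx)` per fixed-size block and clamping at the frame border; real codecs use variable block sizes and sub-pixel vectors, and the function name `warp_by_block_mvs` is hypothetical. It operates on plain nested lists rather than network features.

```python
from typing import List, Tuple

Frame = List[List[float]]
MotionField = List[List[Tuple[int, int]]]


def warp_by_block_mvs(ref: Frame, mvs: MotionField, block: int) -> Frame:
    """Motion-compensate `ref` using one integer MV (dy, dx) per block.

    mvs[bi][bj] is assumed to give the displacement from a pixel in block
    (bi, bj) of the current frame to its match in the reference frame.
    Source coordinates outside the reference are clamped to the border.
    """
    h, w = len(ref), len(ref[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = mvs[y // block][x // block]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = ref[sy][sx]
    return out


if __name__ == "__main__":
    ref = [[10.0 * y + x for x in range(4)] for y in range(4)]
    mvs = [[(0, 0), (0, 1)], [(1, 0), (0, 0)]]  # toy 2x2 grid of block MVs
    for row in warp_by_block_mvs(ref, mvs, block=2):
        print(row)
```

The motion field must cover every block, i.e. have `ceil(h / block)` rows and `ceil(w / block)` columns. In a learned enhancer, the same lookup would typically be applied to feature maps of the previous frame before fusing them with the current frame.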

Paper Structure

This paper contains 29 sections, 5 equations, 19 figures, 6 tables, and 1 algorithm.

Figures (19)

  • Figure 1: The proposed codec-aware enhancement framework reuses codec information to adaptively enhance videos across different compression settings, while assisting in various downstream tasks in a plug-and-play manner.
  • Figure 2: Hierarchical structure of quality adjustment, where frames are divided into multiple groups of pictures (GOP). The Constant Rate Factor (CRF) affects video quality at both sequence and frame levels. An increase in the CRF value indicates a reduction in video quality (e.g., lower PSNR values).
  • Figure 3: The proposed Codec-Aware Enhancement Framework consists of two sub-networks: 1) the Compression-Aware Adaptation (CAA) Network, which hierarchically applies sequence adaptation and frame adaptation to dynamically adjust the parameters of the enhancement network; and 2) the Bitstream-Aware Enhancement (BAE) Network, which leverages motion vectors to align frames and conducts region-aware refinement to flexibly enhance regions of different complexity.
  • Figure 4: Visualization of $w_n$ against different $CRF_s$, where each expert shows a distinct preference for specific $CRF_s$ values.
  • Figure 5: Visualization of features in region-aware refinement, where $h_i$ and $\hat{h}_i$ indicate the input and output features, respectively. The refined features are denoted in the format of $\mathcal{S}(M_i^{type}, h_i)$.
  • ...and 14 more figures
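
The region-aware refinement notation $\mathcal{S}(M_i^{type}, h_i)$ in Figure 5 can be read as selecting the part of the features $h_i$ that falls inside one partition-map region. The sketch below assumes that reading: $M_i^{type}$ is taken to be a binary mask marking positions whose partition label equals `type`, $\mathcal{S}$ is element-wise masking, and each region is processed by its own refiner before the regions are merged. The per-type refiners are hypothetical pointwise callables standing in for learned modules, and the helper names are illustrative only.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]


def type_mask(partition: List[List[int]], t: int) -> Grid:
    """Binary mask M^t: 1.0 where the partition map has label t, else 0.0."""
    return [[1.0 if p == t else 0.0 for p in row] for row in partition]


def select(mask: Grid, h: Grid) -> Grid:
    """S(M, h): keep features inside the masked region (element-wise product)."""
    return [[m * v for m, v in zip(mrow, hrow)] for mrow, hrow in zip(mask, h)]


def region_aware_refine(h: Grid, partition: List[List[int]],
                        refiners: Dict[int, Callable[[float], float]]) -> Grid:
    """Refine each partition region with its own function, then merge.

    Positions whose label has no refiner stay 0.0 in the output.
    """
    rows, cols = len(h), len(h[0])
    out = [[0.0] * cols for _ in range(rows)]
    for t, refine in refiners.items():
        mask = type_mask(partition, t)
        region = select(mask, h)
        refined = [[refine(v) for v in row] for row in region]
        part = select(mask, refined)  # re-mask so a refiner cannot leak outside region t
        out = [[o + p for o, p in zip(orow, prow)] for orow, prow in zip(out, part)]
    return out


if __name__ == "__main__":
    h = [[1.0, 2.0], [3.0, 4.0]]
    partition = [[0, 1], [1, 0]]  # toy labels, e.g. 0 = coarse block, 1 = fine block
    print(region_aware_refine(h, partition, {0: lambda v: 2 * v, 1: lambda v: v + 10}))
```

Re-masking after each refiner matters because a refiner such as `v + 10` maps masked-out zeros to nonzero values; without it, regions would contaminate each other when summed.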