Enhancing Neural Video Compression of Static Scenes with Positive-Incentive Noise

Cheng Yuan, Zhenyu Jia, Jiawei Shao, Xuelong Li

TL;DR

This work incorporates positive-incentive noise into NVC for static scene videos, enabling robust video transmission under adverse network conditions and economical long-term retention of surveillance footage.

Abstract

Static scene videos, such as surveillance feeds and videotelephony streams, constitute a dominant share of storage consumption and network traffic. However, both traditional standardized codecs and neural video compression (NVC) methods struggle to encode these videos efficiently: the former exploit temporal redundancy inadequately, and the latter suffer from severe distribution gaps between training and test data. While recent generative compression methods improve perceptual quality, they introduce hallucinated details that are unacceptable in authenticity-critical applications. To overcome these limitations, we propose to incorporate positive-incentive noise into NVC for static scene videos, where short-term temporal changes are reinterpreted as positive-incentive noise to facilitate model finetuning. Disentangling transient variations from the persistent background lets the compression model internalize structured prior information. During inference, the invariant component requires minimal signaling, which reduces data transmission while maintaining pixel-level fidelity. Preliminary experiments demonstrate a 73% Bjøntegaard delta (BD) rate saving compared to general NVC models. Our method provides an effective way to trade computation for bandwidth, enabling robust video transmission under adverse network conditions and economical long-term retention of surveillance footage.
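For readers unfamiliar with the BD-rate figure quoted in the abstract, the following is a minimal sketch of how such a number can be computed from two rate-distortion curves. It is not the authors' evaluation code. The standard Bjøntegaard procedure fits a cubic polynomial to log-rate as a function of PSNR; this sketch swaps in piecewise-linear interpolation to stay dependency-free, so its results will differ slightly from the standard metric. The function names and the sample RD points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the interpolation range")


def _mean_log_rate(psnr, log_rate, lo, hi, steps=1000):
    """Average log-rate over the PSNR interval [lo, hi] (trapezoidal rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = min(lo + i * h, hi)  # guard against floating-point overshoot
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp(x, psnr, log_rate)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Approximate BD-rate in percent.

    anchor, test: lists of (bits_per_pixel, psnr_db) pairs.
    A negative result means the test codec needs less rate than the
    anchor for the same PSNR, averaged over the shared PSNR range.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_psnr, a_log = [q for _, q in a], [math.log(r) for r, _ in a]
    t_psnr, t_log = [q for _, q in t], [math.log(r) for r, _ in t]

    # Integrate only over the PSNR range covered by both curves.
    lo = max(a_psnr[0], t_psnr[0])
    hi = min(a_psnr[-1], t_psnr[-1])
    if hi <= lo:
        raise ValueError("the two RD curves do not overlap in PSNR")

    diff = _mean_log_rate(t_psnr, t_log, lo, hi) - _mean_log_rate(a_psnr, a_log, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical RD points: the test codec halves the rate at every PSNR.
anchor_curve = [(0.1, 30.0), (0.2, 33.0), (0.4, 36.0), (0.8, 39.0)]
test_curve = [(0.05, 30.0), (0.1, 33.0), (0.2, 36.0), (0.4, 39.0)]
print(f"BD-rate: {bd_rate(anchor_curve, test_curve):.1f}%")
```

Read this way, a 73% BD-rate saving corresponds to a return value near -73: averaged over the shared quality range, the finetuned model would need roughly 27% of the anchor's bitrate for the same PSNR.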

Paper Structure (6 sections, 2 figures)

This paper contains 6 sections, 2 figures.

Figures (2)

  • Figure 1: RD performance comparison of the SSF model before and after finetuning. The proposed approach achieves a significant BD-rate reduction of 73%, demonstrating the efficacy of positive-incentive noise in training video compression models.
  • Figure 2: Visual comparison of video frames reconstructed by H.264 and by the SSF model before and after finetuning, at 0.2 bits per pixel (bpp). The proposed approach based on positive-incentive noise raises PSNR from 38.70 dB to 46.27 dB at nearly identical data rates.
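The two quantities used in the figure captions, PSNR and bits per pixel, follow from simple formulas. The sketch below shows them under stated assumptions: 8-bit samples (peak value 255), frames given as flat lists of pixel values, and hypothetical helper names. It illustrates the definitions only and is not taken from the paper.

```python
import math


def psnr(reference, reconstruction, peak=255.0):
    """PSNR in dB between two equally sized flat lists of pixel values."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def bits_per_pixel(num_bytes, width, height, num_frames=1):
    """Average number of coded bits spent on each pixel."""
    return 8.0 * num_bytes / (width * height * num_frames)


def mse_ratio(psnr_before_db, psnr_after_db):
    """Factor by which MSE shrinks when PSNR rises (same peak value)."""
    return 10.0 ** ((psnr_after_db - psnr_before_db) / 10.0)


# The PSNR values below are the ones reported in the Figure 2 caption.
print(f"MSE reduction factor: {mse_ratio(38.70, 46.27):.2f}")
```

Because PSNR is logarithmic, the 7.57 dB gain reported for Figure 2 corresponds to a mean squared error roughly 5.7 times lower at about the same rate.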