SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information

Feng Wang, Haihang Ruan, Zhihuang Xie, Ronggang Wang, Xiangyu Yue

TL;DR

This paper proposes Single Stream Neural Video Compression (SSNVC), which implicitly uses temporal information to remove temporal redundancy in video sequences and greatly simplifies both the training and the compression process of NVC.

Abstract

Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codecs. However, most existing NVC methods rely heavily on transmitting Motion Vectors (MV) to generate accurate contextual features, which has two drawbacks. (1) Compressing and transmitting MV requires a specialized MV encoder and decoder, which makes the modules redundant. (2) The MV Encoder-Decoder makes the training strategy complex. In this paper, we present a novel Single Stream NVC framework (SSNVC), which removes the complex MV Encoder-Decoder structure and uses a one-stage training strategy. SSNVC implicitly uses temporal information by adding the previous entropy model feature to the current entropy model and by using the previous two frames to generate predicted motion information at the decoder side. In addition, we enhance the frame generator to produce higher-quality reconstructed frames. Experiments demonstrate that SSNVC achieves state-of-the-art performance on multiple benchmarks and greatly simplifies both the compression process and the training process.
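The idea of predicting motion at the decoder side from the previous two frames can be illustrated with a toy sketch. This is not the SSNVC network: it assumes a single global integer translation and uses exhaustive search where the paper uses learned modules. The function names `estimate_global_shift` and `predict_current_frame` are hypothetical. The point it shows is that both frames are already available at the decoder, so no motion information has to be transmitted.

```python
import numpy as np


def estimate_global_shift(prev2, prev1, max_shift=3):
    """Find the integer (dy, dx) that best maps prev2 onto prev1.

    Toy stand-in for a learned motion predictor: exhaustive search over
    circular shifts, scored by mean squared error.
    """
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(prev2, shift=(dy, dx), axis=(0, 1))
            err = np.mean((candidate - prev1) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def predict_current_frame(prev2, prev1, max_shift=3):
    """Extrapolate the current frame, assuming the motion stays constant."""
    dy, dx = estimate_global_shift(prev2, prev1, max_shift)
    predicted = np.roll(prev1, shift=(dy, dx), axis=(0, 1))
    return predicted, (dy, dx)


# Synthetic sequence: each frame is the previous one shifted by (1, 2).
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
frame1 = np.roll(frame0, shift=(1, 2), axis=(0, 1))
frame2 = np.roll(frame1, shift=(1, 2), axis=(0, 1))

# The decoder only sees the two previously reconstructed frames.
predicted, motion = predict_current_frame(frame0, frame1)
print(motion)                          # (1, 2)
print(np.allclose(predicted, frame2))  # True
```

In a real codec the prediction would be imperfect, and the remaining error would be handled by the rest of the compression pipeline. The sketch only covers the motion-extrapolation step.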

Paper Structure (11 sections, 2 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Training strategy of different neural video compression models. Each color block represents a training stage [tcmvct].
  • Figure 2: Overview of our proposed video compression scheme. The red solid lines are used only at the encoder side. The blue solid lines are used only at the decoder side.
  • Figure 3: Left: Structure of the Dense U-Net module. ⓒ stands for concatenation. Right: Visualization of the features before the last convolutional layer of the frame generator. Pictures are from HEVC Class C.
  • Figure 4: Rate-distortion performance of SSNVC on the HEVC Class C, D and E datasets.