
Conditional Neural Video Coding with Spatial-Temporal Super-Resolution

Henan Wang, Xiaohan Pan, Runsen Feng, Zongyu Guo, Zhibo Chen

Abstract

This document is an expanded version of a one-page abstract originally presented at the 2024 Data Compression Conference. It describes our proposed method for the video track of the Challenge on Learned Image Compression (CLIC) 2024. Our scheme follows the typical hybrid coding framework and adds several novel techniques. First, we adopt the Spynet network to produce accurate motion vectors for motion estimation. Second, we introduce a context mining scheme with conditional frame coding to fully exploit spatial-temporal information. Third, to cope with the low target bitrates set by CLIC, we integrate spatial-temporal super-resolution modules that improve rate-distortion performance. Our team name is IMCLVC.
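The spatial-temporal down/upsampling path described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Frames are small grayscale 2D lists, and the factor is 2 in both space and time. Spatial downsampling is average pooling. The learned spatial and temporal super-resolution modules are replaced by hypothetical non-learned stand-ins: nearest-neighbour upsampling and linear frame interpolation. The neural codec itself is omitted, and all function names are illustrative.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame, indexed [row][col]


def spatial_downsample(frame: Frame, factor: int = 2) -> Frame:
    """Average-pool non-overlapping factor x factor blocks.
    Assumes both frame dimensions are divisible by `factor`."""
    h, w = len(frame), len(frame[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [frame[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def spatial_upsample(frame: Frame, factor: int = 2) -> Frame:
    """Nearest-neighbour upsampling; hypothetical placeholder for a learned spatial SR module."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))   # repeat each row vertically
    return out


def temporal_downsample(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Keep frames 0, factor, 2*factor, ... and drop the rest."""
    return frames[::factor]


def temporal_upsample(kept: List[Frame], factor: int, num_frames: int) -> List[Frame]:
    """Rebuild `num_frames` frames by linear interpolation between neighbouring
    kept frames; hypothetical placeholder for a learned temporal SR module.
    Frames after the last kept frame just repeat it."""
    out = []
    for idx in range(num_frames):
        k, r = divmod(idx, factor)
        a = kept[min(k, len(kept) - 1)]
        b = kept[min(k + 1, len(kept) - 1)]
        alpha = r / factor
        out.append([[(1 - alpha) * x + alpha * y for x, y in zip(ra, rb)]
                    for ra, rb in zip(a, b)])
    return out


def resample_roundtrip(frames: List[Frame], s: int = 2, t: int = 2) -> List[Frame]:
    """Encoder side: downsample in time, then in space. Decoder side: upsample back."""
    low = [spatial_downsample(f, s) for f in temporal_downsample(frames, t)]
    # A real system would encode and decode `low` with the neural codec here.
    restored = [spatial_upsample(f, s) for f in low]
    return temporal_upsample(restored, t, len(frames))
```

With a factor of 2 in each dimension, the codec sees a quarter of the pixels per frame and half of the frames. That is 1/8 of the original samples. This reduction is why resampling can help at very low bitrates, and why the quality of the upsampling stage matters so much.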

Paper Structure

This paper contains 7 sections, 1 equation, and 2 figures.

Figures (2)

  • Figure 1: Proposed spatial-temporal super-resolution scheme. Videos are downsampled spatially and temporally before being coded, and are upsampled back to the original resolution at the decoder side.
  • Figure 2: Our P-frame model architecture. The model is composed of three parts: motion coding, context mining, and frame coding. For motion estimation, we adopt Spynet [ranjan2017optical]. For temporal context mining, we adopt the TCM module from DCVC-TCM [sheng2022temporal].
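The P-frame data flow in Figure 2 has three steps: estimate motion, warp the reference frame, and pass the result to the frame coder as a temporal context. The code below is a hedged, simplified sketch of that flow, not the authors' model. Exhaustive block matching stands in for Spynet, and motion compensation is integer block warping in the pixel domain. Motion vectors are not entropy-coded. The learned, feature-domain TCM module and the conditional frame coder are not reproduced. All names and parameters (`block`, `search`) are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]                  # grayscale frame, indexed [row][col]
MotionField = List[List[Tuple[int, int]]]  # one (dy, dx) vector per block


def block_matching(cur: Frame, ref: Frame, block: int = 2, search: int = 1) -> MotionField:
    """Stand-in for Spynet. For each block of `cur`, search integer offsets
    (dy, dx) in [-search, search] and keep the one whose reference block has the
    smallest sum of absolute differences. Assumes dims divisible by `block`."""
    h, w = len(cur), len(cur[0])
    field: MotionField = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            best, best_cost = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = by + dy, bx + dx
                    # Skip candidate blocks that fall outside the reference frame.
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue
                    cost = sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
                               for i in range(block) for j in range(block))
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            row.append(best)
        field.append(row)
    return field


def motion_compensate(ref: Frame, field: MotionField, block: int = 2) -> Frame:
    """Backward-warp `ref`: each block copies the reference block its vector points to."""
    h, w = len(ref), len(ref[0])
    pred = [[0.0] * w for _ in range(h)]
    for bi, row in enumerate(field):
        for bj, (dy, dx) in enumerate(row):
            for i in range(block):
                for j in range(block):
                    y, x = bi * block + i, bj * block + j
                    pred[y][x] = ref[y + dy][x + dx]
    return pred


def p_frame_context(cur: Frame, ref: Frame, block: int = 2, search: int = 1):
    """Motion estimation followed by motion compensation. The returned prediction
    plays the role of the temporal context that a conditional frame coder would
    take as an extra input, instead of subtracting it as residual coding does."""
    field = block_matching(cur, ref, block, search)
    context = motion_compensate(ref, field, block)
    return field, context
```

A residual coder would transmit `cur - context`. Conditional coding, as in the DCVC family, instead gives the context to the encoder and decoder as an additional input. This lets the network decide how to exploit the context rather than fixing the exploitation to subtraction.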