Exploring Long- and Short-Range Temporal Information for Learned Video Compression

Huairui Wang; Zhenzhong Chen

TL;DR

This work addresses a limitation of existing learned video compression methods, many of which rely mainly on short-range temporal information, by exploiting both long- and short-range temporal information. It introduces a continuously updated temporal prior to capture long-range information and Progressive Guided Motion Compensation (PGMC) to exploit short-range temporal cues robustly, integrating both into a conditional coding framework with hyperprior-based entropy models. The proposed LSTVC and LSTVC+ models show rate-distortion (RD) improvements over state-of-the-art learned methods and competitive performance against traditional codecs on multiple datasets, while keeping inference efficient and parallelizable. The approach offers a practical path toward high-efficiency learned video compression with dynamic handling of temporal information.

Abstract

Learned video compression methods have attracted considerable interest in the video coding community because they have matched or even exceeded the rate-distortion (RD) performance of traditional video codecs. However, many current learning-based methods use only short-range temporal information, which limits their performance. In this paper, we focus on exploiting the unique characteristics of video content and further exploring temporal information to enhance compression performance. Specifically, to exploit long-range temporal information, we propose a temporal prior that is updated continuously within the group of pictures (GOP) during inference. As a result, the temporal prior contains valuable temporal information from all decoded images within the current GOP. For short-range temporal information, we propose progressive guided motion compensation to achieve robust and effective compensation. In detail, we design a hierarchical structure to achieve multi-scale compensation. More importantly, we use optical flow guidance to generate pixel offsets between feature maps at each scale, and the compensation result at each scale is used to guide the compensation at the following scale. Extensive experimental results demonstrate that our method obtains better RD performance than state-of-the-art video compression approaches. The code is publicly available at https://github.com/Huairui/LSTVC.
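The idea of a prior that accumulates information from every decoded frame in a GOP can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature extractor, the integer-shift warping, the blending weight `keep`, and the choice to initialize the prior from the first decoded frame are all hypothetical stand-ins for the learned components in the paper.

```python
import numpy as np

def extract_features(frame):
    # Hypothetical stand-in for a learned feature extractor: a float copy of the frame.
    return frame.astype(np.float64)

def warp(prior, motion):
    # Hypothetical stand-in for motion-based alignment: an integer (dy, dx)
    # shift with wrap-around instead of learned warping.
    dy, dx = motion
    return np.roll(prior, shift=(dy, dx), axis=(0, 1))

def update_prior(prior, decoded_frame, motion, keep=0.5):
    # Align the running prior with the decoded motion, then blend in
    # the information extracted from the newly decoded frame.
    aligned = warp(prior, motion)
    return keep * aligned + (1.0 - keep) * extract_features(decoded_frame)

def run_gop(decoded_frames, motions, keep=0.5):
    # Assumption: the prior starts from the first decoded frame of the GOP
    # and is rebuilt from scratch for every new GOP.
    prior = extract_features(decoded_frames[0])
    history = [prior]
    for frame, motion in zip(decoded_frames[1:], motions):
        prior = update_prior(prior, frame, motion, keep)
        history.append(prior)
    return history
```

Because each update blends the aligned previous prior with the new frame, the prior available at frame t depends on every frame decoded earlier in the same GOP, not only on frame t-1. Changing the first frame of the GOP changes the final prior, which is the long-range dependence the abstract describes.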

Paper Structure (28 sections, 7 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Learned Image Compression
Learned Video Compression
Proposed Method
Framework Description
Temporal Prior
Initialization and Updating
Prior for Motion Compression
Prior for Contextual Compression
Discussion
Progressive Guided Motion Compensation
Loss Function
Experiments
Datasets and Implementation Details
...and 13 more sections

Figures (6)

Figure 1: Overview of our proposed video compression framework LSTVC. We extract the temporal information from each decoded frame and supplement the temporal prior with the extracted information. Besides, the temporal prior will be updated explicitly with the decoded motion vectors during compression and should be encoded into the latent representation before participating in the entropy model. The latent representation from motion and contextual compression will be compressed into/decompressed from a bitstream by an arithmetic encoder/decoder. For contextual compression, we use long- and short-range temporal information simultaneously to facilitate distribution parameter prediction.
Figure 2: Visualization of the refine context prior and temporal prior.
Figure 3: Visualization of the motion vector, offset residual, warped feature and compensation results from PGMC.
Figure 4: Illustration of the progressive guided motion compensation. The number of feature channels decreases from scale to scale to reduce computational complexity.
Figure 5: The rate-distortion performance of our approach compared with H.265 (x265 LDP placebo) and the recent learned video compression approaches on the HEVC, UVG and MCL-JCV sequences.
...and 1 more figures
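The multi-scale compensation described in the abstract and in the Figure 4 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's PGMC module: it uses a single-channel feature map, assumes a coarse-to-fine order, replaces the learned offset residual with zeros, uses nearest-neighbour sampling, and fuses scales with a fixed hypothetical weight `guide_weight`.

```python
import numpy as np

def downsample(x, factor):
    # Average-pool a (H, W) map by an integer factor; H and W must be divisible by it.
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(x, factor):
    # Nearest-neighbour upsampling by an integer factor.
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def warp_nearest(feat, offset_y, offset_x):
    # Backward warp: out[y, x] = feat[y + offset_y, x + offset_x],
    # with rounded coordinates clamped to the border.
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + offset_y).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + offset_x).astype(int), 0, w - 1)
    return feat[src_y, src_x]

def progressive_compensation(ref_feat, flow_y, flow_x,
                             factors=(4, 2, 1), guide_weight=0.5):
    result = None
    prev_factor = None
    for f in factors:  # assumed order: coarse to fine
        feat = downsample(ref_feat, f)
        # Flow guidance: resize the flow to this scale and rescale its magnitude.
        fy = downsample(flow_y, f) / f
        fx = downsample(flow_x, f) / f
        # A learned module would predict an offset residual here; this sketch uses zeros.
        residual_y = np.zeros_like(fy)
        residual_x = np.zeros_like(fx)
        warped = warp_nearest(feat, fy + residual_y, fx + residual_x)
        if result is None:
            result = warped
        else:
            # The previous scale's compensation result guides the current scale.
            guide = upsample(result, prev_factor // f)
            result = guide_weight * guide + (1.0 - guide_weight) * warped
        prev_factor = f
    return result
```

The point of the sketch is the data flow: the optical flow supplies the base offset at every scale, a residual refines it, and each scale's output is passed on as guidance to the next. The per-scale reduction in channel count mentioned in the Figure 4 caption is not modelled here.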
