Uni-LVC: A Unified Method for Intra- and Inter-Mode Learned Video Compression

Yichi Zhang, Ruoyu Yang, Fengqing Zhu

TL;DR

Uni-LVC is a unified LVC method that supports intra coding as well as low-delay and random-access inter coding in a single model. A reliability-aware classifier selectively scales the temporal cues, so that Uni-LVC behaves closer to intra coding when references are unreliable.
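
The TL;DR does not specify the classifier's inputs or architecture. As a minimal sketch, assuming reliability can be judged from the similarity between pooled features of the current frame and its reference, the hypothetical helper `reliability_alpha` below maps cosine similarity through a logistic function to a weight $\alpha_t \in (0, 1)$, and `scale_temporal_cue` multiplies the temporal feature by that weight. The `weight` and `bias` values are illustrative constants, not parameters learned in the paper.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reliability_alpha(cur_feat: List[float], ref_feat: List[float],
                      weight: float = 8.0, bias: float = -4.0) -> float:
    """Hypothetical classifier: logistic score of feature similarity, in (0, 1).

    weight and bias are illustrative constants, not trained values.
    """
    s = cosine_similarity(cur_feat, ref_feat)
    return 1.0 / (1.0 + math.exp(-(weight * s + bias)))


def scale_temporal_cue(temporal_feat: List[float], alpha: float) -> List[float]:
    """Scale every element of the temporal feature by the reliability weight."""
    return [alpha * v for v in temporal_feat]


# Usage: a reference similar to the current frame versus a scene-cut-like one.
cur = [1.0, 0.5, -0.2]
similar_ref = [0.9, 0.55, -0.1]
cut_ref = [-0.3, 0.1, 0.9]
alpha_same = reliability_alpha(cur, similar_ref)
alpha_cut = reliability_alpha(cur, cut_ref)
print(round(alpha_same, 2), round(alpha_cut, 3))
```

A weight near 1 keeps the temporal cue intact, while a weight near 0 (such as the $\alpha_t \approx 0.1$ reported at the scene change in Figure 1) pushes the model toward intra-dominant coding.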

Abstract

Recent advances in learned video compression (LVC) have led to significant performance gains, with codecs such as DCVC-RT surpassing the H.266/VVC low-delay mode in compression efficiency. However, existing LVC methods still have two key limitations: they often require separate models for intra and inter coding, and their performance degrades when temporal references are unreliable. To address these limitations, we introduce Uni-LVC, a unified LVC method that supports intra coding as well as low-delay and random-access inter coding in a single model. Building on a strong intra codec, Uni-LVC formulates inter coding as intra coding conditioned on temporal information extracted from reference frames. We design an efficient cross-attention adaptation module that integrates temporal cues, enabling seamless support for both unidirectional (low-delay) and bidirectional (random-access) prediction. A reliability-aware classifier selectively scales the temporal cues, making Uni-LVC behave closer to intra coding when references are unreliable. We further propose a multistage training strategy to facilitate adaptive learning across coding modes. Extensive experiments demonstrate that Uni-LVC achieves superior rate-distortion performance in both intra and inter configurations while maintaining comparable computational efficiency.
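
To make "intra coding conditioned on temporal information" concrete, the sketch below assumes a single-head scaled dot-product cross-attention in which current-frame tokens act as queries and reference-frame tokens act as keys and values, with the attended result added back as a residual scaled by a reliability weight `alpha`. Learned projections, multiple heads, and normalization are omitted, and `condition_on_references` is a hypothetical name; the paper's actual adaptation module may differ. Passing one reference list mimics unidirectional (low-delay) prediction, passing two mimics bidirectional (random-access) prediction, and `alpha = 0` leaves the intra features untouched.

```python
import math
from typing import List

Tokens = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Single-head scaled dot-product attention; learned projections omitted."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def condition_on_references(cur_tokens: Tokens, ref_token_sets: List[Tokens],
                            alpha: float) -> Tokens:
    """Hypothetical adaptation step.

    ref_token_sets holds one token list (low-delay) or two (random-access);
    their tokens are concatenated so one attention layer serves both modes.
    The attended temporal cue is scaled by alpha and added as a residual.
    """
    ref_tokens = [tok for refs in ref_token_sets for tok in refs]
    attended = cross_attention(cur_tokens, ref_tokens, ref_tokens)
    return [[x + alpha * a for x, a in zip(q, att)]
            for q, att in zip(cur_tokens, attended)]


# Usage with toy two-dimensional tokens.
cur = [[1.0, 0.0], [0.0, 1.0]]
ld_refs = [[[0.5, 0.5]]]                  # one reference frame, one token
ra_refs = [[[0.5, 0.5]], [[1.0, -1.0]]]   # past and future reference frames
print(condition_on_references(cur, ld_refs, alpha=1.0))  # [[1.5, 0.5], [0.5, 1.5]]
print(condition_on_references(cur, ra_refs, alpha=0.0) == cur)  # True
```

Concatenating the tokens of both references is one simple way to let a single attention layer handle both prediction modes without changing its weights; it is an assumption of this sketch, not a detail stated in the abstract.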

Paper Structure (30 sections, 24 equations, 10 figures, 6 tables)

Figures (10)

  • Figure 1: PSNR and BPP versus frame index. The results are obtained on the videoSRC21 sequence from MCL-JCV, which contains a scene change at frame 48. At this frame, DCVC-RT continues to rely on previous temporal references, causing a sharp drop in PSNR. In contrast, Uni-LVC automatically suppresses the unreliable temporal features and switches to intra-dominant coding ($\alpha_t \approx 0.1$), keeping PSNR stable.
  • Figure 2: Overview of the proposed Uni-LVC. "Enc." and "Dec." denote the encoder and decoder, respectively, and "Tem." denotes the temporal modeling module. Inter coding in the low-delay (LD) and random-access (RA) configurations is formulated as intra coding conditioned on auxiliary temporal features extracted from the temporal references stored in the buffer.
  • Figure 3: Architecture of the proposed intra codec. "Cross Attn" and "DC Block" denote the cross-attention module and the enhanced depthwise convolution block, and "Q" indicates quantization. A hybrid buffer $f_t$, formed from $f_t^d$ and $f_t^r$, stores the temporal references. For inter coding, the temporal feature $f_{t-1}$ (or its bidirectional counterpart) is fed through Cross Attn; for pure intra coding, $f_{t-1}$ is replaced by a quality-specific learnable vector $f_p$ to preserve compatibility and learn global priors. HPCM denotes the hierarchical progressive context model.
  • Figure 4: Enhanced depthwise convolution (DC) block.
  • Figure 5: Hierarchical progressive context model.
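
The Figure 3 caption implies a per-mode choice of what the cross-attention layer attends to. The sketch below encodes that choice under stated assumptions: the buffer is a plain dictionary with hypothetical keys `"prev"` and `"next"`, and the learnable prior $f_p$ is represented as one fixed vector per quality level. How the hybrid buffer is built from $f_t^d$ and $f_t^r$ is not described in the caption, so the sketch takes the buffered features as given.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def select_conditioning(mode: str,
                        buffer: Dict[str, Vector],
                        prior_by_quality: Sequence[Vector],
                        quality: int) -> List[Vector]:
    """Hypothetical helper: return the features the cross-attention attends to."""
    if mode == "intra":
        # Pure intra coding: use the quality-specific learnable prior f_p in
        # place of the temporal feature, so the same module still runs.
        return [list(prior_by_quality[quality])]
    if mode == "ld":
        # Low-delay: unidirectional, only the previous reference f_{t-1}.
        return [list(buffer["prev"])]
    if mode == "ra":
        # Random-access: bidirectional, past and future references.
        return [list(buffer["prev"]), list(buffer["next"])]
    raise ValueError(f"unknown coding mode: {mode!r}")


# Usage example with toy two-dimensional features.
priors = [[0.1, 0.2], [0.3, 0.4]]  # one prior vector per quality level
buf = {"prev": [1.0, 1.0], "next": [2.0, 2.0]}
print(select_conditioning("intra", buf, priors, quality=1))  # [[0.3, 0.4]]
print(select_conditioning("ra", buf, priors, quality=0))     # [[1.0, 1.0], [2.0, 2.0]]
```

Keeping the intra path routed through the same attention input, rather than skipping the module, mirrors the caption's point that $f_p$ preserves compatibility between modes.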