LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression

Wei Jiang, Junru Li, Kai Zhang, Li Zhang

TL;DR

This work tackles the limitation of local-only motion estimation in learned video compression by introducing joint Local and Global Motion Compensation (LGMC). The method fuses flow-based local compensation with cross-attention-based global compensation, employing a linear-time attention approximation to capture global redundancies without extra bits. Integrated with DCVC-TCM to form LVC-LGMC, it yields consistent RD gains across benchmarks, including notable BD-rate reductions on MCL-JCV, while maintaining manageable model size and decoding complexity. The approach provides a practical path to better motion modeling in learned video coding and is readily adaptable to other conditional coding frameworks.

Abstract

Existing learned video compression models employ flow networks or deformable convolutional networks (DCN) to estimate motion information. However, the limited receptive fields of flow networks and DCN inherently restrict their attention to local contexts. Global contexts, such as large-scale motions and global correlations among frames, are ignored, which presents a significant bottleneck for capturing accurate motion. To address this issue, we propose a joint local and global motion compensation module (LGMC) for learned video coding. More specifically, we adopt a flow network for local motion compensation. To capture global context, we employ cross attention in the feature domain for motion compensation. In addition, to avoid the quadratic complexity of vanilla cross attention, we divide the softmax operation in attention into two independent softmax operations, leading to linear complexity. To validate the effectiveness of the proposed LGMC, we integrate it with DCVC-TCM and obtain learned video compression with joint local and global motion compensation (LVC-LGMC). Extensive experiments demonstrate that LVC-LGMC achieves significant rate-distortion performance improvements over the baseline DCVC-TCM.
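
To make the complexity argument in the abstract concrete, the following is a minimal dependency-free sketch of cross attention with one joint softmax versus two independent softmax operations. The abstract does not state along which axes the two softmax operations are taken, so this sketch assumes the commonly used "efficient attention" factorisation: one softmax over the feature axis of the queries and one over the token axis of the keys. The function names, the toy shapes, and the choice of current-frame features as queries and reference-frame features as keys and values are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def vanilla_cross_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V. Builds an n_q x n_k score matrix,
    so cost grows with the product of the two token counts."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def linear_cross_attention(q, k, v):
    """softmax_feat(Q) (softmax_tok(K)^T V), an assumed factorisation.
    The n_q x n_k matrix is never formed; the intermediate is d x d_v."""
    # Softmax 1: each query row is normalised over its d features.
    q_norm = [softmax(row) for row in q]
    # Softmax 2: each feature column of K is normalised over the n_k tokens.
    # Stored transposed, so k_norm_t has shape d x n_k.
    k_norm_t = [softmax(col) for col in transpose(k)]
    # Global context summary, shape d x d_v, computed once for all queries.
    context = matmul(k_norm_t, v)
    return matmul(q_norm, context)


if __name__ == "__main__":
    # Toy example: 3 query tokens (assumed: current frame), 4 key/value
    # tokens (assumed: reference frame), feature dimension 2.
    q = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.5]]
    k = [[0.5, -0.1], [0.2, 0.7], [-0.3, 0.4], [0.9, 0.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    print(linear_cross_attention(q, k, v))
```

Because both normalised factors are non-negative and the key-side softmax sums to one over tokens, every output row in this sketch is still a convex combination of the value rows, as in vanilla attention. The two variants generally produce different weights, so the factorised form is a substitute for, not a numerically identical rewrite of, the joint softmax.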

Paper Structure (12 sections, 6 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Overall framework of the proposed LVC-LGMC. $\boldsymbol{E}_C$ is the contextual encoder, and $\boldsymbol{D}_C$ is the contextual decoder. $\boldsymbol{E}_M$ is the MV encoder, and $\boldsymbol{D}_M$ is the MV decoder. LGMC is the proposed joint local and global motion compensation module.
  • Figure 2: Illustration of the joint local and global motion compensation module (LGMC) at the encoder side.
  • Figure 3: Illustration of the joint local and global motion compensation module (LGMC) at the decoder side.
  • Figure 4: Rate-distortion performance of the proposed LVC-LGMC, DCVC-TCM [sheng2022temporal], DCVC [li2021deep], DVCPro [lu2020end], HM-16.20, and the x265 codec. The distortion metric is PSNR. Please zoom in for a better view.
  • Figure 5: Rate-distortion performance of the proposed LVC-LGMC, DCVC-TCM [sheng2022temporal], DCVC [li2021deep], DVCPro [lu2020end], HM-16.20, and the x265 codec. The distortion metric is MS-SSIM [wang2003multiscale]. Please zoom in for a better view.
  • ...and 2 more figures