Neural B-frame Video Compression with Bi-directional Reference Harmonization

Yuxi Liu, Dengchao Jin, Shuai Huo, Jiawen Gu, Chao Zhou, Huihui Bai, Ming Lu, Zhan Ma

TL;DR

This work identifies unbalanced reference contribution (URC) as a critical bottleneck in neural B-frame video compression and proposes BRHVC, a Bi-directional Reference Harmonization framework. BRHVC introduces Bi-directional Motion Converge (BMC) to fuse multi-scale optical flows and Bi-directional Contextual Fusion (BCF) to adaptively weight bi-directional reference contexts during coding. Empirical results show BRHVC achieving substantial bitrate savings, outperforming prior NBVC methods and surpassing VTM-RA on HEVC datasets, with notable gains on large frame spans. The approach advances NBVC by explicitly harmonizing bi-directional references, enabling more accurate motion compensation and context modeling, thus improving compression efficiency in random-access B-frame coding.

Abstract

Neural video compression (NVC) has made significant progress in recent years, but neural B-frame video compression (NBVC) remains underexplored compared to P-frame compression. NBVC can use bi-directional reference frames for better compression performance. However, its hierarchical coding may complicate continuous temporal prediction, especially at hierarchical levels with a large frame span, which can leave the contributions of the two reference frames unbalanced. To make better use of the reference information, we propose a novel NBVC method, termed Bi-directional Reference Harmonization Video Compression (BRHVC), built on two proposed modules: Bi-directional Motion Converge (BMC) and Bi-directional Contextual Fusion (BCF). BMC converges multiple optical flows in motion compression, leading to more accurate motion compensation on a larger scale. BCF then explicitly models the weights of the reference contexts under the guidance of motion compensation accuracy. With more efficient motions and contexts, BRHVC can effectively harmonize bi-directional references. Experimental results indicate that BRHVC outperforms previous state-of-the-art NVC methods and even surpasses the traditional codec VTM under the random access configuration (VTM-RA) on the HEVC datasets. The source code is released at https://github.com/kwai/NVC.
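To make the idea of weighting reference contexts by motion compensation accuracy concrete, here is a minimal illustrative sketch. It is not the BCF module from the paper. It assumes, purely for illustration, that each reference comes with a per-position warping error and that the weights are a softmax over the negated errors. The names `fusion_weights`, `fuse`, and `temperature` are hypothetical.

```python
import math


def fusion_weights(err_fwd, err_bwd, temperature=1.0):
    """Return (w_fwd, w_bwd), a softmax over negated errors.

    Assumption for illustration: a lower motion compensation error
    means the reference is more trustworthy and gets a larger weight.
    """
    a = math.exp(-err_fwd / temperature)
    b = math.exp(-err_bwd / temperature)
    return a / (a + b), b / (a + b)


def fuse(ctx_fwd, ctx_bwd, err_fwd, err_bwd, temperature=1.0):
    """Blend two context sequences position by position.

    ctx_* are flat lists of context values; err_* are the matching
    non-negative error values for each position (hypothetical inputs).
    """
    fused = []
    for cf, cb, ef, eb in zip(ctx_fwd, ctx_bwd, err_fwd, err_bwd):
        w_f, w_b = fusion_weights(ef, eb, temperature)
        fused.append(w_f * cf + w_b * cb)
    return fused
```

With equal errors the two references contribute equally. When one reference is compensated much more accurately, the fused context leans toward it, which is the kind of imbalance the abstract describes for large frame spans.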

Paper Structure

This paper contains 26 sections, 5 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Two main coding schemes with Intra Period 4 as an example. Each square represents a frame, and the frames are temporally ordered from left to right. Arrows represent the reference pipeline. The serial numbers ① ② ③ indicate the coding order.
  • Figure 2: The unbalanced contribution issue between reference frames in B-frame coding. The right reference is notably more significant than the left for the compression of the number plate.
  • Figure 3: Quantitative experiment on unbalanced reference contribution. The results of the "gap of references" indicate the average contribution difference between two reference frames across the frame spans of {32, 16, 8, 4, 2}. Refer to Section \ref{sec:rethink} for more explanations.
  • Figure 4: The overall architecture of BRHVC. AE and AD denote arithmetic encoding and arithmetic decoding. MC Encoder and MD Decoder denote Motion Converge Encoder and Motion Diverge Decoder. $C_f^{1,2,3}$ and $C_b^{1,2,3}$ denote $\{C^1_{f},C^2_{f},C^3_{f}\}$ and $\{C^1_{b},C^2_{b},C^3_{b}\}$, respectively. We omit the entropy model and some outputs from the decoded buffer for brevity.
  • Figure 5: The framework of Bi-directional Motion Converge. MF Adapter denotes Motion Feature Adapter. $q_v^{enc}$ and $q_v^{dec}$ are learnable quantization vectors for variable bitrates \cite{li2023neural}.
  • ...and 8 more figures
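
The hierarchical B-frame coding order and the frame spans mentioned in the captions of Figures 1 and 3 can be sketched as follows. This is a minimal illustration, assuming a power-of-two intra period, I-frames at both ends of the group, and level-by-level coding of the midpoints. The function name `hierarchical_b_order` is hypothetical, and the exact order used in the paper's figures may differ.

```python
def hierarchical_b_order(intra_period):
    """Return (frame, left_ref, right_ref) tuples in coding order.

    Assumptions: frames 0 and intra_period are coded first as I-frames
    (no references), then each level codes the midpoint of every pair
    of already-coded neighbours, halving the frame span each time.
    """
    if intra_period < 2 or intra_period & (intra_period - 1):
        raise ValueError("intra_period must be a power of two >= 2")

    order = [(0, None, None), (intra_period, None, None)]
    span = intra_period  # distance between the two reference frames
    while span >= 2:
        for left in range(0, intra_period, span):
            right = left + span
            order.append((left + span // 2, left, right))
        span //= 2
    return order


if __name__ == "__main__":
    for frame, left, right in hierarchical_b_order(4):
        kind = "I" if left is None else f"B (refs {left}, {right})"
        print(f"frame {frame}: {kind}")
```

For an intra period of 4, the B-frames come out in the order 2, 1, 3. For an intra period of 32, the spans between the two references are 32, 16, 8, 4, and 2, matching the set of frame spans listed in the Figure 3 caption.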