Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian, Li Li, Dong Liu
TL;DR
This work addresses how to exploit temporal redundancy more effectively in neural video compression. It introduces context modulation, which combines an oriented temporal context derived from the reference frame with the propagated context. The approach has two steps: flow orientation extracts inter-frame correlation, and context compensation fuses the oriented and propagated contexts through a global-local synergy mechanism supervised by a decoupling loss. The result is a richer temporal context with less irrelevant information carried along the prediction chain. Empirically, the method achieves average bitrate savings of $22.7\%$ over H.266/VVC and $10.1\%$ over the previous SOTA DCVC-FM, while operating within a conditional coding framework and maintaining competitive complexity. These gains indicate the practical potential of refined temporal context modeling for neural video codecs; future work targets learnable warps and more explicit motion priors to further improve temporal alignment and compression efficiency.
Abstract
Efficient video coding depends heavily on exploiting temporal redundancy, which emerging conditional coding-based neural video codecs (NVCs) usually achieve by extracting and leveraging temporal context. Although the latest NVCs have made remarkable progress in compression performance, their inherent temporal context propagation mechanism cannot sufficiently leverage the reference information, which limits further improvement. In this paper, we address this limitation by modulating the temporal context with the reference frame in two steps. First, we propose flow orientation, which mines the inter-correlation between the reference frame and the prediction frame to generate an additional oriented temporal context. Second, we introduce context compensation, which uses the oriented context to modulate the propagated temporal context generated from the propagated reference feature. Through the synergy mechanism and decoupling loss supervision, irrelevant propagated information can be effectively eliminated, ensuring better context modeling. Experimental results demonstrate that our codec achieves an average 22.7% bitrate reduction over the advanced traditional video codec H.266/VVC and an average 10.1% bitrate saving over the previous state-of-the-art NVC, DCVC-FM. The code is available at https://github.com/Austin4USTC/DCMVC.
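To make the two steps in the abstract concrete, the following is a toy NumPy sketch and not the authors' implementation. It assumes nearest-neighbour backward warping as a stand-in for flow orientation and a per-pixel convex blend as a stand-in for context compensation. The function names `warp_nearest` and `modulate_context`, and the `gate` parameter, are hypothetical. In the actual codec these operations are learned networks acting on feature maps and trained with the decoupling loss.

```python
import numpy as np


def warp_nearest(frame, flow):
    """Backward-warp a (H, W) frame with a (H, W, 2) flow of (dy, dx) offsets.

    Toy stand-in for producing an oriented context from the reference frame:
    nearest-neighbour sampling with border clamping (an assumption).
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]


def modulate_context(oriented, propagated, gate):
    """Blend oriented and propagated contexts with a per-pixel gate in [0, 1].

    Hypothetical fusion rule: gate = 1 keeps only the oriented context,
    gate = 0 keeps only the propagated context.
    """
    gate = np.clip(gate, 0.0, 1.0)
    return gate * oriented + (1.0 - gate) * propagated


if __name__ == "__main__":
    reference = np.arange(16, dtype=float).reshape(4, 4)
    flow = np.zeros((4, 4, 2))
    flow[..., 1] = 1.0  # every pixel samples one column to its right
    oriented = warp_nearest(reference, flow)
    propagated = np.zeros((4, 4))  # placeholder propagated context
    fused = modulate_context(oriented, propagated, np.full((4, 4), 0.5))
    print(fused)
```

The fixed gate here only shows the data flow. A learned fusion would predict the weighting from both contexts, which is how irrelevant propagated information could be suppressed.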
