Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence

Bolin Chen; Hanwei Zhu; Shanzhi Yin; Lingyu Zhu; Jie Chen; Ru-Ling Liao; Shiqi Wang; Yan Ye

TL;DR

This work tackles the challenge of achieving wide bitrate coverage in generative face video compression (GFVC). It introduces Pleno-Generation (PGen), a scalable representation and layered reconstruction (SRLR) framework that pairs a GFVC-compatible base layer with an enhancement layer carrying multi-granularity signals, and uses attention-guided generation to enable bandwidth-aware, high-fidelity reconstruction. Key contributions include a multi-granularity feature representation, entropy-based feature compression, and an attention-guided signal generator within the scalable SRLR architecture, together with extensive experiments showing rate-distortion (RD) and perceptual gains over VVC and existing GFVC baselines. The results indicate that PGen extends the usable bitrate range, improves perceptual quality, and provides a universal, plug-and-play enhancement for existing GFVC algorithms, with practical implications for bandwidth-efficient, high-fidelity face video communication.

Abstract

Generative-model-based compact video compression typically operates within a relatively narrow range of bitrates, often with an emphasis on ultra-low-rate applications. There is growing consensus in the video communication industry that generative coding should enable full bitrate coverage. However, this is an extremely difficult task, largely because generation and compression, although related, have distinct goals and trade-offs. The proposed Pleno-Generation (PGen) framework addresses this by using bandwidth intelligence to exploit a wider range of bandwidth for generation, which makes video coding more robust. In particular, we begin our study of PGen with face video coding, where PGen offers a paradigm shift that prioritizes high-fidelity reconstruction over a compact bitstream. The PGen framework leverages scalable representation and layered reconstruction for Generative Face Video Compression (GFVC), in an attempt to imbue the bitstream with intelligence at different granularities. Experimental results show that the proposed PGen framework helps existing GFVC algorithms deliver high-fidelity, faithful face videos. In addition, the framework allows greater flexibility for coding applications and shows superior rate-distortion (RD) performance over a much wider bitrate range across various quality measures. Moreover, compared with the latest Versatile Video Coding (VVC) codec, the proposed scheme achieves competitive Bjøntegaard-delta-rate savings in perceptual-level evaluations.
