Generative Visual Compression: A Review
Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye
TL;DR
Generative visual compression leverages deep generative models to achieve high-quality reconstructions at ultra-low bitrates and to enable novel machine-vision analytics. The paper surveys approaches for human vision (end-to-end latent representation, cross-modal, conceptual, temporal, and omni-dimensional coding) and paradigms for machine vision (pixel- and feature-domain coding, with architectures ranging from single to scalable). It highlights methods that exploit learned priors, structured representations, and temporal dynamics to improve rate-distortion performance and analytic capability. It also notes open challenges in evaluation metrics, robustness, and standardization. The work underscores the potential of generative compression to transform both content delivery and automated analysis in the post-AIGC era. It calls for standardized evaluation, universal coding schemes, and hardware-software co-design to enable broad adoption.
Abstract
Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution in the acquisition of digital content. It is also driving visual compression toward competitive performance gains and more diverse functionalities than traditional codecs offer. This paper provides a thorough review of recent advances in generative visual compression. It illustrates the great potential and promising applications of these techniques in ultra-low bitrate communication, user-specified reconstruction/filtering, and intelligent machine analysis. In particular, we review visual data compression methodologies based on deep generative models, and summarize how compact representation and high-fidelity reconstruction can be realized via generative techniques. In addition, we extend the review to related generative compression technologies for machine vision and intelligent analytics. Finally, we discuss the fundamental challenges of generative visual compression and envision future research directions.
