Hierarchical Semantic Compression for Consistent Image Semantic Restoration
Shengxi Li, Zifu Zhang, Mai Xu, Lai Jiang, Yufan Liu, Ce Zhu
TL;DR
The paper tackles semantic compression by addressing the inefficiencies of using predefined or high-dimensional semantics. It introduces Hierarchical Semantic Compression (HSC), which operates entirely in intrinsic GAN semantics using a General Inversion Encoder (GIE) to recover middle-level semantic features and core semantics, compressed by a Feature Compression Network (FCN) and a Semantic Compression Network (SCN) with a shared, channel-wise autoregressive entropy model. This hierarchical approach reduces bitrate while preserving semantic fidelity and enables editing directly from compressed bitstreams, achieving state-of-the-art results in both subjective quality and machine-vision tasks, especially at ultra-low bitrates. The work demonstrates a new direction where compression aligns with human visual semantics and machine understanding, potentially impacting future image/video coding paradigms.
Abstract
Semantic compression has recently attracted increasing research effort, as it can achieve high-fidelity restoration even at extremely low bitrates. However, existing semantic compression methods typically combine standard pipelines with either pre-defined or high-dimensional semantics, and therefore compress inefficiently. To address this issue, we propose a novel hierarchical semantic compression (HSC) framework that operates purely within the intrinsic semantic spaces of generative models, and is thus able to achieve efficient compression with consistent semantic restoration. More specifically, we first analyse the entropy models for semantic compression, which motivates us to employ a hierarchical architecture based on a newly developed general inversion encoder. Then, we propose the feature compression network (FCN) and the semantic compression network (SCN), such that the middle-level semantic feature and the core semantics are hierarchically compressed to restore both the accuracy and the consistency of image semantics, via an entropy model that is progressively shared through channel-wise context. Experimental results demonstrate that the proposed HSC framework achieves state-of-the-art performance on subjective quality and consistency for human vision, together with superior performance on machine vision tasks given compressed bitstreams. This essentially coincides with how the human visual system understands images, thus providing a new framework for future image/video compression paradigms. Our code will be released upon acceptance.
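To give a rough intuition for the channel-wise context mentioned in the abstract, the following minimal sketch estimates the bitrate of a quantized latent when each channel is coded conditioned on the channels already decoded. It is an illustration under stated assumptions and not the HSC entropy model: `predict_params` is a hypothetical stand-in for a learned context network, the fixed scales 4.0 and 1.0 are arbitrary, and the toy latent is made up.

```python
import math

def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian N(mean, scale^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def symbol_bits(q, mean, scale):
    """Bits needed to code integer symbol q under a Gaussian discretized to unit bins."""
    p = gaussian_cdf(q + 0.5, mean, scale) - gaussian_cdf(q - 0.5, mean, scale)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log2(0)

def predict_params(decoded_channels, position):
    """Hypothetical stand-in for a learned context network (assumption).

    Predicts (mean, scale) for one position from previously decoded channels.
    """
    if not decoded_channels:
        return 0.0, 4.0  # broad prior when no context is available
    values = [ch[position] for ch in decoded_channels]
    return sum(values) / len(values), 1.0  # sharper prediction given context

def estimate_bits(latent, use_context=True):
    """Estimate total bits for a latent given as a list of channels (lists of floats)."""
    decoded = []  # quantized channels available to the decoder so far
    total = 0.0
    for channel in latent:
        quantized = [round(v) for v in channel]
        context = decoded if use_context else []
        for i, q in enumerate(quantized):
            mean, scale = predict_params(context, i)
            total += symbol_bits(q, mean, scale)
        decoded.append(quantized)  # later channels may condition on this one
    return total

# Toy latent with strongly correlated channels (made-up values).
latent = [[10.2, -3.1, 0.4], [9.8, -2.9, 0.1], [10.1, -3.3, -0.2], [9.9, -3.0, 0.3]]
print("with channel context:", estimate_bits(latent, use_context=True))
print("without context:     ", estimate_bits(latent, use_context=False))
```

When channels are correlated, as in this toy latent, conditioning on earlier channels concentrates the predicted distribution around the true symbol, so the estimated bit cost drops. A real model would learn both the predictor and the scales from data.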
