Hierarchical Semantic Compression for Consistent Image Semantic Restoration

Shengxi Li, Zifu Zhang, Mai Xu, Lai Jiang, Yufan Liu, Ce Zhu

TL;DR

The paper tackles semantic compression by addressing the inefficiencies of using predefined or high-dimensional semantics. It introduces Hierarchical Semantic Compression (HSC), which operates entirely in intrinsic GAN semantics using a General Inversion Encoder (GIE) to recover middle-level semantic features and core semantics, compressed by a Feature Compression Network (FCN) and a Semantic Compression Network (SCN) with a shared, channel-wise autoregressive entropy model. This hierarchical approach reduces bitrate while preserving semantic fidelity and enables editing directly from compressed bitstreams, achieving state-of-the-art results in both subjective quality and machine-vision tasks, especially at ultra-low bitrates. The work demonstrates a new direction where compression aligns with human visual semantics and machine understanding, potentially impacting future image/video coding paradigms.

Abstract

Semantic compression has recently attracted increasing research effort, as it can achieve high-fidelity restoration even at extremely low bitrates. However, existing semantic compression methods typically combine standard pipelines with either pre-defined or high-dimensional semantics, and thus compress inefficiently. To address this issue, we propose a novel hierarchical semantic compression (HSC) framework that operates purely within the intrinsic semantic spaces of generative models, achieving efficient compression with consistent semantic restoration. More specifically, we first analyse the entropy models for semantic compression, which motivates a hierarchical architecture based on a newly developed general inversion encoder. We then propose the feature compression network (FCN) and semantic compression network (SCN), so that the middle-level semantic feature and the core semantics are hierarchically compressed to restore both the accuracy and the consistency of image semantics, via an entropy model progressively shared through channel-wise context. Experimental results demonstrate that the proposed HSC framework achieves state-of-the-art performance in subjective quality and consistency for human vision, together with superior performance on machine vision tasks given compressed bitstreams. This is consistent with how the human visual system understands images, and thus provides a new framework for future image/video compression paradigms. Our code shall be released upon acceptance.

Paper Structure

This paper contains 18 sections, 21 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Illustration of restoring image semantic consistency instead of pixel-wise accuracy, by representative methods including VVC for the latest standard codec, HiFiC for the subjective-oriented compression and our HSC for the semantic compression. Note that our HSC method achieves continuous bitrates starting from 0 bit per pixel (bpp), by compressing intrinsic semantics.
  • Figure 2: Main categories of existing image compression paradigms. (a) denotes image compression that aims to optimise pixel-wise accuracy, including VVC [bross2021overview] and many learned methods [balle2017end, balle2018variational, he2022elic]. (b) represents image compression assisted by generative models to improve perceptual quality, including HiFiC [mentzer2020high] and MS-ILLM [muckley2023improving]. (c) depicts existing semantic compression methods, almost all of which are built upon pre-defined semantics, such as texts [lei2023text+sketch] and structures [chang2022conceptual]. (d) represents the proposed HSC method, which is purely based on the intrinsic semantics from advanced unconditional generative models.
  • Figure 3: The overall architecture of the proposed HSC method, which includes the general inversion encoder (GIE), feature compression network (FCN), semantic compression network (SCN), and pre-trained StyleGAN generator. Besides compressing the core semantics from the style codes, we split the StyleGAN generator at the proposed intermediate semantic feature. In this way, the proposed HSC compresses the semantics from the GIE network hierarchically, exploiting the mutual semantics between SCN and FCN during compression, which guarantees both semantic consistency and high fidelity.
  • Figure 4: The proposed semantic context model to autoregressively compress semantic features $\bm{y}$ across channel slices $\{\bm{y_1}, \bm{y_2}, \dots, \bm{y_k}\}$, in a coarse-to-fine manner. Our semantic context encoder network aims to model the probability density function (pdf) of each slice, by predicting the mean $\bm{\mu}$ and diagonal variance $\bm{\sigma}$ given all the previous semantic information. The core semantics from SCN are employed by the first slice as the prior.
  • Figure 5: The proposed HSC method to edit the compressed images using our compressed middle-level semantic feature $\mathbf{\hat{f}}$ and core semantics $\bm{\mathcal{\hat{S}}}$. We illustrate the editing with smiling, eyeglasses and makeup as examples, which achieve accurate and realistic editing.
  • ...and 7 more figures
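
The channel-wise context model described in the Figure 4 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the learned context network is replaced by a hypothetical random linear map (`make_toy_weights`), the features are flattened to a single vector of channels, and the function names, dimensions, and the softplus scale parameterisation are all assumptions made for illustration. It only shows the causal structure: each slice is coded under a Gaussian whose mean and scale are predicted from the core semantics plus all previously decoded slices, with the core semantics alone serving as the prior for the first slice.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without needing SciPy


def gaussian_bits(y_hat, mu, sigma):
    """Estimated bits for y_hat under N(mu, sigma^2) integrated over unit-width bins."""
    scale = sigma * math.sqrt(2.0)
    upper = 0.5 * (1.0 + _erf((y_hat + 0.5 - mu) / scale))
    lower = 0.5 * (1.0 + _erf((y_hat - 0.5 - mu) / scale))
    p = np.clip(upper - lower, 1e-9, 1.0)
    return float(-np.log2(p).sum())


def make_toy_weights(core_dim, num_channels, num_slices, rng):
    """Hypothetical stand-in for the learned context network: one random linear map per slice."""
    sizes = [len(s) for s in np.array_split(np.arange(num_channels), num_slices)]
    weights, ctx_dim = [], core_dim
    for size in sizes:
        # Output holds `size` means followed by `size` pre-activation scales.
        weights.append(rng.standard_normal((ctx_dim, 2 * size)) / math.sqrt(ctx_dim))
        ctx_dim += size  # the next slice also sees this slice once it is decoded
    return weights


def code_slices(y, core_semantics, weights):
    """Quantise and rate each channel slice given the core semantics and earlier slices."""
    slices = np.array_split(y, len(weights))
    context = core_semantics  # prior for the first slice
    decoded, bits = [], []
    for y_i, w in zip(slices, weights):
        h = context @ w
        mu = h[: y_i.size]
        sigma = np.log1p(np.exp(h[y_i.size:])) + 1e-3  # softplus keeps the scale positive
        y_hat = np.round(y_i - mu) + mu  # quantise the residual around the predicted mean
        bits.append(gaussian_bits(y_hat, mu, sigma))
        decoded.append(y_hat)
        context = np.concatenate([context, y_hat])  # grow the context causally
    return np.concatenate(decoded), bits


rng = np.random.default_rng(0)
core = rng.standard_normal(8)      # toy "core semantics" vector
y = 3.0 * rng.standard_normal(12)  # toy semantic feature with 12 channels
weights = make_toy_weights(core.size, y.size, 3, rng)
y_hat, bits = code_slices(y, core, weights)
print("estimated bits per slice:", [round(b, 2) for b in bits])
```

Because the context only ever contains already-decoded values, a decoder holding the same core semantics and weights could recompute every mean and scale in the same order, which is the property an arithmetic coder would need.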