EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation

Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller

TL;DR

EGIC tackles the problem of high perceptual quality at low bit-rates in neural image compression by enabling traversal of the distortion-perception ($D$-$P$) curve from a single model. It introduces OASIS-C, a semantic segmentation-guided discriminator, and ORP, a lightweight residual-prediction module, to steer GAN-based reconstructions toward the data distribution while enabling interpolation via an adjustable parameter $\alpha$. Through a thorough comparison of GAN-based discriminators and conditioning schemes, EGIC demonstrates superior perceptual performance at low bit-rates and competitive distortion relative to diffusion-based methods, with much smaller model sizes and a single decoding pass. These findings suggest that semantic conditioning and simple residual interpolation can rival more complex diffusion-based approaches for practical, bandwidth-constrained image compression.

Abstract

We introduce EGIC, an enhanced generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. EGIC is based on two novel building blocks: i) OASIS-C, a conditional pre-trained semantic segmentation-guided discriminator, which provides both spatially and semantically-aware gradient feedback to the generator, conditioned on the latent image distribution, and ii) Output Residual Prediction (ORP), a retrofit solution for multi-realism image compression that allows control over the synthesis process by adjusting the impact of the residual between an MSE-optimized and GAN-optimized decoder output on the GAN-based reconstruction. Together, EGIC forms a powerful codec, outperforming state-of-the-art diffusion and GAN-based methods (e.g., HiFiC, MS-ILLM, and DIRAC-100), while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight, and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.
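
The ORP idea described above, scaling the residual between the GAN-optimized and MSE-optimized decoder outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `orp_interpolate`, the array-based inputs, and in particular the convention that $\alpha = 0$ yields the GAN-based reconstruction and $\alpha = 1$ the MSE-like one are assumptions made for illustration. In the actual method the residual is predicted by a lightweight module rather than computed from two full decoder outputs.

```python
import numpy as np


def orp_interpolate(x_gan: np.ndarray, residual: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical residual interpolation between two reconstructions.

    x_gan    : GAN-optimized reconstruction, shape (H, W, C), values in [0, 1]
    residual : estimate of (x_gan - x_mse), same shape as x_gan
    alpha    : assumed convention: 0.0 keeps the GAN output,
               1.0 removes the full residual (MSE-like output)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Subtract a scaled share of the residual from the GAN-based reconstruction.
    return x_gan - alpha * residual


# Toy data standing in for decoder outputs (random values, not real images).
rng = np.random.default_rng(0)
x_mse = rng.uniform(0.2, 0.8, size=(4, 4, 3))
x_gan = x_mse + rng.normal(0.0, 0.05, size=(4, 4, 3))
residual = x_gan - x_mse  # in this sketch the residual is known exactly

x_perceptual = orp_interpolate(x_gan, residual, alpha=0.0)
x_midpoint = orp_interpolate(x_gan, residual, alpha=0.5)
x_distortion = orp_interpolate(x_gan, residual, alpha=1.0)
```

Under these assumptions, a single decoded GAN output plus one residual is enough to produce any point between the two endpoints at decode time, which is what makes this kind of interpolation cheap compared with running separate models per operating point.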

Paper Structure (30 sections, 10 equations, 29 figures, 8 tables)

Figures (29)

  • Figure 2: Schematic comparison between PatchGAN (l.) and OASIS-C (r.)
  • Figure 3: DIRAC-$n$ [noor_2023] vs. Beta Conditioning (MRIC) [Agustsson_2023_CVPR] vs. ORP
  • Figure 4: Comparing various purely adversarially optimized generative image compression methods at low bit-rate (HiFiC-lo). Rows one and three show examples of reconstructed cropped images ($256\times256$), while rows two and four show the corresponding spectra of the images. Best viewed electronically.
  • Figure 5: Comparison to the state-of-the-art on CLIC 2020
  • Figure 6: Visual comparison of EGIC ($\alpha \in \{0.0, 1.0\}$) with state-of-the-art distortion-oriented (l.) and perception-oriented (r.) codecs. Please visit the supplementary material for more impressions. Best viewed electronically.
  • ...and 24 more figures