SQ-GAN: Semantic Image Communications Using Masked Vector Quantization

Francesco Pezone, Sergio Barbarossa, Giuseppe Caire

TL;DR

The paper tackles semantic, task-oriented image compression by jointly encoding an image and its semantic segmentation map. It introduces SQ-GAN, a semantic-conditioned vector-quantized GAN whose semantic-conditioned adaptive mask module (SAMM) selectively preserves semantically relevant latent vectors, aided by data augmentation and a semantic-aware discriminator. On the Cityscapes dataset, the approach achieves higher semantic fidelity at very low bitrates than traditional and learned codecs, improving the rate-distortion trade-off for semantics-focused tasks. Because it acts only on source coding, it suits networks where legacy protocols must be maintained, with potential extensions to video and other domains.

Abstract

This work introduces the Semantically Masked Vector Quantized Generative Adversarial Network (SQ-GAN), a novel approach integrating semantically driven image coding and vector quantization to optimize image compression for semantic/task-oriented communications. The method acts only on source coding and is fully compliant with legacy systems. The semantics are extracted from the image by computing its semantic segmentation map (SSM) using off-the-shelf software. A new, specifically developed semantic-conditioned adaptive mask module (SAMM) selectively encodes semantically relevant features of the image. The relevance of the different semantic classes is task-specific, and it is incorporated in the training phase by introducing appropriate weights in the loss function. SQ-GAN outperforms state-of-the-art image compression schemes such as JPEG2000, BPG, and deep-learning-based methods across multiple metrics, including perceptual quality and semantic segmentation accuracy on the reconstructed image, at extremely low compression rates.
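The idea of weighting semantic classes in the loss can be illustrated with a minimal sketch. The abstract does not give the actual loss, so the function below is only an assumption for illustration: a per-pixel squared error in which each pixel is scaled by a weight looked up from its class in the segmentation map. The names `class_weighted_mse`, `class_weights`, and `default_weight` are hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]


def class_weighted_mse(
    original: Grid,
    reconstructed: Grid,
    seg_map: List[List[int]],
    class_weights: Dict[int, float],
    default_weight: float = 1.0,
) -> float:
    """Squared error averaged with per-pixel weights chosen by semantic class.

    Illustrative assumption only: not the loss used in the paper.
    """
    total = 0.0
    weight_sum = 0.0
    for row_x, row_y, row_s in zip(original, reconstructed, seg_map):
        for x, y, s in zip(row_x, row_y, row_s):
            # Classes missing from class_weights fall back to default_weight.
            w = class_weights.get(s, default_weight)
            total += w * (x - y) ** 2
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy 2x2 grayscale image; class 1 is treated as three times more relevant.
original = [[0.0, 1.0], [1.0, 0.0]]
reconstructed = [[0.0, 0.0], [1.0, 1.0]]
seg_map = [[0, 1], [0, 1]]

weighted = class_weighted_mse(original, reconstructed, seg_map, {0: 1.0, 1: 3.0})
unweighted = class_weighted_mse(original, reconstructed, seg_map, {})
print(weighted, unweighted)
```

In this toy case both errors fall on class-1 pixels, so the weighted value (0.75) exceeds the unweighted one (0.5): errors on classes deemed relevant for the task cost more during training.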

Paper Structure

This paper contains 20 sections, 11 equations, 12 figures.

Figures (12)

  • Figure 1: Schematic representation of the overall coding/decoding scheme.
  • Figure 2: Encoder and decoder detailed structure of the proposed SQ-GAN scheme. The "channel" here may represent transmission or storage, depending on the application.
  • Figure 3: Architectural diagram of the AMM as in Huang2023MaskedVQ-VAE (left), and the proposed SAMM employing the SPADE layer to introduce the SSM conditioning (right).
  • Figure 4: Schematic representation of the semantic generator network $G_\mathbf{s}$ training pipeline.
  • Figure 5: Schematic representation of the image generator network $G_\mathbf{x}$ training pipeline.
  • ...and 7 more figures
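
The masking-plus-quantization step that Figure 3 refers to can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SAMM itself: the relevance scores are passed in as plain numbers (in the paper they would come from a learned module conditioned on the segmentation map), a fixed fraction of latent vectors is kept, and each kept vector is replaced by the index of its nearest codebook entry. The names `nearest_code`, `masked_quantize`, and `keep_ratio` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def masked_quantize(
    latents: Sequence[Vector],
    scores: Sequence[float],
    codebook: Sequence[Vector],
    keep_ratio: float,
) -> List[Tuple[int, int]]:
    """Keep the highest-scoring latents and quantize only those.

    Returns (position, codebook_index) pairs in positional order.
    """
    n_keep = max(1, int(keep_ratio * len(latents)))
    ranked = sorted(range(len(latents)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [(i, nearest_code(latents[i], codebook)) for i in kept]


# Four 2-D latent vectors, a two-entry codebook, and assumed relevance scores.
latents = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1], [1.0, 0.8]]
scores = [0.2, 0.9, 0.1, 0.7]
codebook = [[0.0, 0.0], [1.0, 1.0]]

encoded = masked_quantize(latents, scores, codebook, keep_ratio=0.5)
print(encoded)  # [(1, 1), (3, 1)]
```

Only the kept positions and their codebook indices need to be sent, which is how masking lowers the bit rate in a scheme of this kind; the decoder then has to fill in the dropped positions.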