SQ-GAN: Semantic Image Communications Using Masked Vector Quantization
Francesco Pezone, Sergio Barbarossa, Giuseppe Caire
TL;DR
The paper tackles semantic, task-oriented image compression by jointly encoding an image and its semantic segmentation map. It introduces SQ-GAN, a semantic-conditioned vector-quantized GAN with a novel semantic-conditioned adaptive mask module (SAMM) that selectively preserves semantically relevant latent vectors, aided by data augmentation and a semantic-aware discriminator. On the Cityscapes dataset, the approach achieves higher semantic fidelity at very low bitrates than traditional and learned codecs, improving the rate-distortion trade-off for semantics-focused tasks. Because the method acts only on source coding, it suits networks where legacy protocols must be maintained, with potential extensions to video and other domains.
Abstract
This work introduces the Semantically Masked Vector Quantized Generative Adversarial Network (SQ-GAN), a novel approach that integrates semantically driven image coding and vector quantization to optimize image compression for semantic/task-oriented communications. The method acts only on source coding and is fully compliant with legacy systems. The semantics are extracted from the image by computing its semantic segmentation map using off-the-shelf software. A purpose-built semantic-conditioned adaptive mask module (SAMM) selectively encodes semantically relevant features of the image. The relevance of the different semantic classes is task-specific and is incorporated in the training phase through appropriate weights in the loss function. SQ-GAN outperforms state-of-the-art image compression schemes such as JPEG2000, BPG, and deep-learning-based methods across multiple metrics, including perceptual quality and semantic segmentation accuracy on the reconstructed image, at extremely low bit rates.
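The idea of weighting semantic classes in the loss can be illustrated with a minimal sketch. This is not the SQ-GAN training loss, which the abstract does not spell out. It is a plain per-pixel squared error scaled by a class weight, and the function name, class labels, and weight values below are hypothetical and chosen only for illustration.

```python
def class_weighted_mse(original, reconstructed, seg_map, class_weights):
    """Squared error averaged over pixels, each weighted by its semantic class.

    original, reconstructed: 2-D lists of pixel intensities with the same shape.
    seg_map: 2-D list of integer class labels, one per pixel.
    class_weights: dict mapping class label -> non-negative weight
                   (hypothetical, task-specific values).
    """
    weighted_error = 0.0
    total_weight = 0.0
    for row_x, row_y, row_s in zip(original, reconstructed, seg_map):
        for x, y, label in zip(row_x, row_y, row_s):
            w = class_weights[label]
            weighted_error += w * (x - y) ** 2
            total_weight += w
    if total_weight == 0:
        raise ValueError("all class weights are zero")
    # Normalize by the total weight so the loss scale is comparable across images.
    return weighted_error / total_weight


# Hypothetical labels: 0 = road, 1 = pedestrian (assumed more task-relevant).
weights = {0: 1.0, 1: 4.0}
image = [[0.0, 1.0], [1.0, 0.0]]
labels = [[0, 1], [0, 0]]

# Same-sized error, placed on the pedestrian pixel vs. on a road pixel.
recon_err_on_pedestrian = [[0.0, 0.5], [1.0, 0.0]]
recon_err_on_road = [[0.5, 1.0], [1.0, 0.0]]

loss_pedestrian = class_weighted_mse(image, recon_err_on_pedestrian, labels, weights)
loss_road = class_weighted_mse(image, recon_err_on_road, labels, weights)
```

With these assumed weights, an error of the same magnitude costs four times more on the pedestrian pixel than on a road pixel, so a model trained against such a loss would be pushed to spend its limited bit budget on the classes the task cares about.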
