Generative Semantic Communication for Joint Image Transmission and Segmentation
Weiwen Yuan, Jinke Ren, Chongjie Wang, Ruichen Zhang, Jun Wei, Dong In Kim, Shuguang Cui
TL;DR
This paper tackles the challenge of efficient, multi-task image transmission by introducing a generative semantic communication framework with semantic knowledge bases (KBs) at both ends. The transmitter employs a Swin-Transformer-based source KB to extract hierarchical features, while the receiver's source KB generates task-specific features; task KBs map natural language task requests to discrete instructions via semantic similarity. A unified, ResNet-based joint source-channel coding (JSCC) encoder processes the input together with semantic features, and two task-specific decoders operate in parallel: a diffusion-model-based decoder for image reconstruction and a ResNet-based decoder for image segmentation. Experimental results on DIV2K and PASCAL VOC show higher peak signal-to-noise ratio (PSNR) and intersection over union (IoU) than baselines, with reduced overhead and improved generalization for multi-task semantic transmission.
Abstract
Semantic communication has emerged as a promising technology for enhancing communication efficiency. However, most existing research emphasizes single-task reconstruction, neglecting model adaptability and generalization across multi-task systems. In this paper, we propose a novel generative semantic communication system that supports both image reconstruction and segmentation tasks. Our approach builds upon semantic knowledge bases (KBs) at both the transmitter and receiver, with each semantic KB comprising a source KB and a task KB. The source KB at the transmitter leverages a hierarchical Swin-Transformer, a generative AI scheme, to extract multi-level features from the input image. Concurrently, the counterpart source KB at the receiver utilizes hierarchical residual blocks to generate task-specific knowledge. Furthermore, the task KBs adopt a semantic similarity model to map different task requirements into pre-defined task instructions, thereby facilitating the feature selection of the source KBs. Additionally, we develop a unified residual block-based joint source-channel coding (JSCC) encoder and two task-specific JSCC decoders to perform the two image tasks. In particular, a generative diffusion model is adopted to construct the JSCC decoder for the image reconstruction task. Experimental results show that our multi-task generative semantic communication system outperforms previous single-task communication systems in terms of peak signal-to-noise ratio and segmentation accuracy.
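The task-KB step described above, mapping a free-form task requirement to one of a few pre-defined task instructions by semantic similarity, can be illustrated with a minimal sketch. This is not the paper's model: the bag-of-words embedding below is a toy stand-in for whatever learned semantic similarity model the task KBs use, and the names TASK_INSTRUCTIONS, embed, cosine, and map_request, as well as the instruction descriptions, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical pre-defined task instructions (index -> short description).
# The indices and wording are assumptions made for this illustration only.
TASK_INSTRUCTIONS = {
    0: "reconstruct restore recover the original image",
    1: "segment the image into object regions and label each pixel",
}


def embed(text):
    """Toy embedding: lowercase bag-of-words counts.

    Assumption: stands in for a learned sentence-embedding model.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_request(request):
    """Return the index of the instruction most similar to the request."""
    query = embed(request)
    scores = {
        task_id: cosine(query, embed(description))
        for task_id, description in TASK_INSTRUCTIONS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(map_request("Please recover the original image"))
    print(map_request("Label each pixel with its object class"))
```

In a system of the kind the abstract describes, the selected index would then steer which features the source KBs provide to the reconstruction or segmentation decoder; how that selection is wired is not specified here and is left out of the sketch.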
