Generative Semantic Communication for Joint Image Transmission and Segmentation

Weiwen Yuan, Jinke Ren, Chongjie Wang, Ruichen Zhang, Jun Wei, Dong In Kim, Shuguang Cui

TL;DR

This paper tackles the challenge of efficient, multi-task image transmission by introducing a generative semantic communication framework with semantic knowledge bases (KBs) at both ends. The transmitter employs a Swin-Transformer-based source KB to extract hierarchical features, while the receiver's source KB generates task-specific features; task KBs map natural language task requests to discrete instructions via semantic similarity. A unified JSCC encoder (ResNet-based) processes the input together with semantic features, and two task-specific decoders operate in parallel: a diffusion-model-based decoder for image reconstruction and a ResNet-based decoder for image segmentation. Experimental results on DIV2K and PASCAL VOC demonstrate superior PSNR and IoU over baselines, with reduced overhead and improved generalization for multi-task semantic transmission.
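The two metrics named above follow standard definitions, sketched here for reference. This is a minimal illustration and not the paper's evaluation code: the function names `psnr` and `iou`, the flat-list image representation, and the 8-bit peak value of 255 are all assumptions made for the example.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def iou(pred_mask, true_mask):
    """Intersection over union of two flat binary masks (truthy = foreground)."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0
```

Higher values are better for both metrics: PSNR grows as the reconstruction error shrinks, and IoU approaches 1 as the predicted mask overlaps the ground truth.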

Abstract

Semantic communication has emerged as a promising technology for enhancing communication efficiency. However, most existing research emphasizes single-task reconstruction, neglecting model adaptability and generalization across multi-task systems. In this paper, we propose a novel generative semantic communication system that supports both image reconstruction and segmentation tasks. Our approach builds upon semantic knowledge bases (KBs) at both the transmitter and receiver, with each semantic KB comprising a source KB and a task KB. The source KB at the transmitter leverages a hierarchical Swin-Transformer, a generative AI scheme, to extract multi-level features from the input image. Concurrently, the counterpart source KB at the receiver utilizes hierarchical residual blocks to generate task-specific knowledge. Furthermore, the task KBs adopt a semantic similarity model to map different task requirements into pre-defined task instructions, thereby facilitating the feature selection of the source KBs. Additionally, we develop a unified residual block-based joint source-channel coding (JSCC) encoder and two task-specific JSCC decoders to perform the two image tasks. In particular, a generative diffusion model is adopted to construct the JSCC decoder for the image reconstruction task. Experimental results show that our multi-task generative semantic communication system outperforms previous single-task communication systems in terms of peak signal-to-noise ratio and segmentation accuracy.
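The task KB's mapping from a free-form task requirement to a pre-defined instruction can be pictured as a nearest-neighbor lookup under a similarity score. The sketch below is only an illustration of that idea: it substitutes a bag-of-words cosine similarity for the semantic similarity model, which this summary does not specify, and the instruction names and descriptions in `TASK_INSTRUCTIONS` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical pre-defined task instructions (illustrative, not from the paper).
TASK_INSTRUCTIONS = {
    "reconstruction": "reconstruct recover restore the original image",
    "segmentation": "segment the image into object regions masks",
}


def embed(text):
    """Stand-in embedding: word counts. A learned sentence encoder would go here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_request(request):
    """Return the instruction whose description is most similar to the request."""
    query = embed(request)
    return max(
        TASK_INSTRUCTIONS,
        key=lambda name: cosine(query, embed(TASK_INSTRUCTIONS[name])),
    )
```

For example, `map_request("Please segment the objects in this image")` selects `"segmentation"`. In the described system, the selected instruction would then guide which features the source KBs provide.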

Paper Structure

This paper contains 16 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Generative semantic communication system for joint image reconstruction and segmentation.
  • Figure 2: An illustration of the KB at the transmitter.
  • Figure 3: An illustration of the KB at the receiver.
  • Figure 4: PSNR vs. SNR in (a) low-resolution image case; (b) high-resolution image case.
  • Figure 5: Visualization of reconstructed images (SNR = 18 dB).
  • ...and 1 more figures