Trustworthy Image Semantic Communication with GenAI: Explainability, Controllability, and Efficiency

Xijun Wang, Dongshan Ye, Chenyuan Feng, Howard H. Yang, Xiang Chen, Tony Q. S. Quek

TL;DR

This work addresses the interpretability, operability, and compatibility gaps in image semantic communication (ISC) by proposing a trustworthy ISC framework that decouples the transmit and receive processes and represents images as explainable semantics: image semantic text and segmentation maps. The receiver performs GenAI-based reconstruction and multitask processing, guided by a semantic-level multi-rate transmission protocol. A correlation-driven feedback loop, a policy controller, and a shared vector database adapt the transmitted data to the task requirements. Experimental results on COCO show substantial improvements in image captioning quality, competitive or superior reconstruction fidelity, and significant transmission efficiency gains, including up to 90% data reduction in semantic transmission. The framework demonstrates strong potential for flexible, task-aware, and efficient ISC in future 6G scenarios, and the work outlines open issues for real-world deployment such as device constraints, privacy, and personalized transmission.

Abstract

Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. However, existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. To address these limitations, we propose a novel trustworthy ISC framework. This approach leverages text extraction and segmentation mapping techniques to convert images into explainable semantics, while employing Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks. We also introduce a multi-rate ISC transmission protocol that dynamically adapts to both the received explainable semantic content and specific task requirements at the receiver. Simulation results demonstrate that our framework achieves explainable learning, decoupled training, and compatible transmission in various application scenarios. Finally, some intriguing research directions and application scenarios are identified.
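
The multi-rate idea can be illustrated with a minimal sketch. It is not the authors' protocol: the level names, the cosine-similarity score, the 0.8 threshold, and the function names `cosine_similarity` and `choose_level` are all assumptions made for illustration. The sketch only shows the general pattern of a receiver-side controller that requests richer semantics when the received content correlates poorly with the task requirement.

```python
import math
from typing import Sequence

# Hypothetical rate levels, ordered from least to most transmitted data.
# The source mentions semantic text and segmentation maps; this exact
# ordering and granularity are assumptions.
LEVELS = ("text only", "text + segmentation map")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_level(
    current_level: int,
    received_vec: Sequence[float],
    task_vec: Sequence[float],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> int:
    """Return the rate level to request for the next transmission round.

    If the received semantics already correlate well with the task
    requirement, keep the current level; otherwise step up one level,
    capped at the richest level available.
    """
    score = cosine_similarity(received_vec, task_vec)
    if score >= threshold:
        return current_level
    return min(current_level + 1, len(LEVELS) - 1)


if __name__ == "__main__":
    # Received semantics match the task: stay at the cheapest level.
    print(LEVELS[choose_level(0, [1.0, 0.0], [1.0, 0.0])])
    # Received semantics do not match the task: request more semantics.
    print(LEVELS[choose_level(0, [1.0, 0.0], [0.0, 1.0])])
```

In this sketch the feature vectors stand in for entries that transmitter and receiver might look up in a shared vector database; how such vectors are produced is left open here.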

Paper Structure (22 sections, 4 figures, 2 tables)

This paper contains 22 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: An end-to-end trustworthy ISC framework based on system-compatible explainable semantics.
  • Figure 2: Details of key components in the trustworthy ISC framework, where A-seg and B-seg denote the segmentation maps based on semantic segmentation and Segment Anything, respectively, $T_N$ and $I_N$ denote the generated text and image feature vector, $Z_N$ denotes the additional noise for generative model training, NER stands for Named Entity Recognition, and DFS-block stands for Deep text-image-segmentation Fusion Block.
  • Figure 3: Visualization of image reconstruction quality for various schemes. The first column contains the original images; the remaining columns show the images reconstructed by JPEG-q1 (the JPEG compression algorithm at its lowest quality), SPADE (the semantic image synthesis algorithm with spatially-adaptive normalization), GALIP (the combination of CLIP and generative adversarial networks), ES-IRM-A (an invariant algorithm that integrates ES-IRM with image semantic text and uses only A-seg maps for reconstruction), and ES-IRM-B (a similar approach that uses B-seg maps instead of A-seg maps), respectively.
  • Figure 4: Schematic diagram for single-user, single-task, and multi-rate communication.