SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Dapeng Oliver Wu

TL;DR

This work tackles the limitations of single-modality sensing and decoupled sensing-communication by proposing SIMAC, a semantic-driven framework for integrated multimodal sensing and communication. It combines a Multimodal Semantic Fusion network, an LLM-based semantic encoder, and a task-oriented semantic decoder to jointly sense and transmit meaning, rather than raw data, over wireless channels. A multi-task learning objective enables diversified sensing services (distance, angle, velocity, and image reconstruction) while maintaining low communication overhead. Experimental results on VIRAT-derived data demonstrate improved sensing accuracy and robust multimodal reconstruction across varying SNRs, highlighting SIMAC's potential for real-time, bandwidth-constrained sensing applications.

Abstract

Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SIMAC) framework. This framework leverages a joint source-channel coding architecture to achieve simultaneous sensing decoding and transmission of sensing results. Specifically, SIMAC first introduces a multimodal semantic fusion (MSF) network, which employs two extractors to extract semantic information from radar signals and images, respectively. MSF then applies cross-attention mechanisms to fuse these unimodal features and generate multimodal semantic representations. Secondly, we present a large language model (LLM)-based semantic encoder (LSE), where relevant communication parameters and multimodal semantics are mapped into a unified latent space and input to the LLM, enabling channel-adaptive semantic encoding. Thirdly, a task-oriented sensing semantic decoder (SSD) is proposed, in which different decoding heads are designed according to the specific needs of each task. In addition, a multi-task learning strategy is introduced to train the SIMAC framework, achieving diverse sensing services. Finally, experimental simulations demonstrate that the proposed framework achieves diverse sensing services and higher accuracy.
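The cross-attention fusion step in the MSF network can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the random projection weights, the bidirectional attention, and the mean-pool-and-concatenate step are all assumptions made for illustration, and the names `cross_attention` and `fuse` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention where one modality queries the other."""
    q = queries @ w_q                     # (n_q, d_head)
    k = context @ w_k                     # (n_c, d_head)
    v = context @ w_v                     # (n_c, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ v, weights

def fuse(radar_feats, image_feats, params):
    # Assumption: attend in both directions, then pool and concatenate.
    r2i, _ = cross_attention(radar_feats, image_feats, *params["radar_to_image"])
    i2r, _ = cross_attention(image_feats, radar_feats, *params["image_to_radar"])
    return np.concatenate([r2i.mean(axis=0), i2r.mean(axis=0)])

rng = np.random.default_rng(0)
d_model, d_head = 16, 8                           # hypothetical sizes
radar_feats = rng.normal(size=(4, d_model))       # stand-in radar tokens
image_feats = rng.normal(size=(6, d_model))       # stand-in image tokens
params = {
    name: tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
    for name in ("radar_to_image", "image_to_radar")
}

fused = fuse(radar_feats, image_feats, params)
_, attn = cross_attention(radar_feats, image_feats, *params["radar_to_image"])
print(fused.shape, attn.shape)
```

In a trained model the projection matrices would be learned rather than sampled, and the unimodal features would come from the radar and image extractors the abstract mentions.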

Paper Structure

This paper contains 41 sections, 57 equations, 13 figures, 1 table, 5 algorithms.

Figures (13)

  • Figure 1: The illustration of the integrated SC and multimodal sensing system model.
  • Figure 2: The illustration of the motion parameters of the ST.
  • Figure 3: The network design of the proposed SIMAC framework.
  • Figure 4: Visualization of the SIMAC framework's running process.
  • Figure 5: Comparison results of raw sensing images and reconstructed images.
  • ...and 8 more figures