M4SC: An MLLM-based Multi-modal, Multi-task and Multi-user Semantic Communication System

Feibo Jiang, Siwei Tu, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan

TL;DR

This work addresses the inefficiencies of traditional bit-centric transmission by introducing M4SC, an MLLM-based framework that unifies multi-modal data into a shared semantic space. The approach combines a Kolmogorov-Arnold Network (KAN) for cross-modal alignment, task-instruction following for robust multi-task performance, and a semantic sharing mechanism to optimize multi-user transmissions, all under a joint KAN-LLM-channel coding scheme. Experimental validation across multi-modal, multi-task, and multi-user settings demonstrates improved semantic accuracy, reduced data transmission, and resilience to channel variations. The results suggest that MLLM-driven semantic communications can substantially enhance efficiency and scalability for next-generation wireless systems.

Abstract

Multi-modal Large Language Models (MLLMs) are capable of precisely extracting high-level semantic information from multi-modal data, enabling multi-task understanding and generation. This capability facilitates more efficient and intelligent data transmission in semantic communications. In this paper, we design a tailored MLLM for semantic communication and propose an MLLM-based Multi-modal, Multi-task and Multi-user Semantic Communication (M4SC) system. First, we utilize the Kolmogorov-Arnold Network (KAN) to achieve multi-modal alignment in MLLMs, thereby enhancing the accuracy of semantic representation in the semantic space across different modalities. Next, we introduce a multi-task fine-tuning approach based on task instruction following, which leverages a unified task instruction template to describe various semantic communication tasks, improving the MLLM's ability to follow instructions across multiple tasks. Additionally, by designing a semantic sharing mechanism, we transmit the public and private semantic information of multiple users separately, thus increasing the efficiency of semantic communication. Finally, we employ a joint KAN-LLM-channel coding strategy to comprehensively enhance the performance of the semantic communication system in complex communication environments. Experimental results validate the effectiveness and robustness of the proposed M4SC in multi-modal, multi-task, and multi-user scenarios.
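
The semantic sharing idea in the abstract (sending public and private semantic information separately) can be illustrated with a toy sketch. This is not the paper's implementation: it assumes, purely for illustration, that each user's semantics can be represented as a set of discrete units, whereas M4SC operates on learned representations. The function names `split_semantics` and `transmission_cost` and the example data are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_semantics(
    user_semantics: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split per-user semantic units into one public part and per-user private parts.

    Assumption: "public" means units requested by every user; the paper may
    define shared semantics differently.
    """
    if not user_semantics:
        return set(), {}
    public = set.intersection(*user_semantics.values())
    private = {user: units - public for user, units in user_semantics.items()}
    return public, private


def transmission_cost(user_semantics: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (units sent without sharing, units sent with sharing)."""
    public, private = split_semantics(user_semantics)
    # Without sharing, every user receives a full copy of its own units.
    separate = sum(len(units) for units in user_semantics.values())
    # With sharing, the public part is sent once and the private parts per user.
    shared = len(public) + sum(len(units) for units in private.values())
    return separate, shared


# Hypothetical example: three users interested in the same scene.
users = {
    "user_a": {"street", "car", "red", "moving"},
    "user_b": {"street", "car", "pedestrian"},
    "user_c": {"street", "car", "traffic_light", "green"},
}
public, private = split_semantics(users)
separate, shared = transmission_cost(users)
print(sorted(public), separate, shared)
```

In this toy setting the two units common to all users are sent once rather than three times, so the total drops from 11 units to 7. The saving grows with the overlap between users and vanishes when their semantics are disjoint.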

Paper Structure

This paper contains 45 sections and 6 figures.

Figures (6)

  • Figure 1: The structure of three different semantic communications.
  • Figure 2: The structure of the proposed MLLM.
  • Figure 3: The structure of the proposed M4SC.
  • Figure 4: The three-stage training process of M4SC.
  • Figure 5: Comparison of multi-modal and multi-task performance.
  • ...and 1 more figure