M4SC: An MLLM-based Multi-modal, Multi-task and Multi-user Semantic Communication System
Feibo Jiang, Siwei Tu, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan
TL;DR
This work addresses the inefficiencies of traditional bit-centric transmission by introducing M4SC, an MLLM-based framework that unifies multi-modal data into a shared semantic space. The approach combines a Kolmogorov-Arnold Network (KAN) for cross-modal alignment, task-instruction following for robust multi-task performance, and a semantic sharing mechanism to optimize multi-user transmissions, all under a joint KAN-LLM-channel coding scheme. Experimental validation across multi-modal, multi-task, and multi-user settings demonstrates improved semantic accuracy, reduced data transmission, and resilience to channel variations. The results suggest that MLLM-driven semantic communications can substantially enhance efficiency and scalability for next-generation wireless systems.
Abstract
Multi-modal Large Language Models (MLLMs) are capable of precisely extracting high-level semantic information from multi-modal data, enabling multi-task understanding and generation. This capability facilitates more efficient and intelligent data transmission in semantic communications. In this paper, we design a tailored MLLM for semantic communication and propose an MLLM-based Multi-modal, Multi-task and Multi-user Semantic Communication (M4SC) system. First, we utilize the Kolmogorov-Arnold Network (KAN) to achieve multi-modal alignment in MLLMs, thereby enhancing the accuracy of semantic representation in the semantic space across different modalities. Next, we introduce a multi-task fine-tuning approach based on task instruction following, which leverages a unified task instruction template to describe various semantic communication tasks, improving the MLLM's ability to follow instructions across multiple tasks. Additionally, by designing a semantic sharing mechanism, we transmit the public and private semantic information of multiple users separately, thus increasing the efficiency of semantic communication. Finally, we employ a joint KAN-LLM-channel coding strategy to comprehensively enhance the performance of the semantic communication system in complex communication environments. Experimental results validate the effectiveness and robustness of the proposed M4SC in multi-modal, multi-task, and multi-user scenarios.
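
The semantic sharing idea described in the abstract can be illustrated with a toy sketch. The abstract does not say how public and private semantics are determined, so everything below is an assumption made for illustration: semantic information is modeled as discrete labels rather than MLLM embeddings, "public" is taken to mean the units needed by every user (set intersection), and the names split_public_private and transmitted_units are hypothetical. It is not the paper's implementation.

```python
from typing import Dict, Set, Tuple


def split_public_private(
    user_semantics: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split per-user semantic units into one shared set and per-user remainders.

    Assumption: 'public' means needed by all users; the paper may define it differently.
    """
    if not user_semantics:
        return set(), {}
    # Public semantics: units every user needs, so they can be sent once.
    public = set.intersection(*user_semantics.values())
    # Private semantics: what each user still needs beyond the public part.
    private = {user: units - public for user, units in user_semantics.items()}
    return public, private


def transmitted_units(public: Set[str], private: Dict[str, Set[str]]) -> int:
    """Count units sent when the public part is transmitted only once."""
    return len(public) + sum(len(units) for units in private.values())


# Hypothetical example data: three users interested in the same scene.
users = {
    "user_a": {"street", "car", "pedestrian", "red_light"},
    "user_b": {"street", "car", "bicycle"},
    "user_c": {"street", "car", "pedestrian"},
}

public, private = split_public_private(users)
naive = sum(len(units) for units in users.values())  # each user served separately
shared = transmitted_units(public, private)          # public part sent once

print(f"public: {sorted(public)}")
print(f"units sent without sharing: {naive}, with sharing: {shared}")
```

In this toy case the two labels common to all three users are sent once instead of three times, which is the kind of saving the separate transmission of public and private information is meant to provide.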
