Generative AI-Enhanced Multi-Modal Semantic Communication in Internet of Vehicles: System Design and Methodologies
Jiayi Lu, Wanting Yang, Zehui Xiong, Chengwen Xing, Rahim Tafazolli, Tony Q. S. Quek, Merouane Debbah
TL;DR
The paper tackles the challenge of efficient, reliable multi-modal semantic communication (SemCom) in highly dynamic Internet of Vehicles (IoV) networks. It introduces G-MSC, a Generative AI (GAI)-enhanced SemCom framework that optimizes semantic encoding, channel modeling and estimation, and semantic decoding while supporting both analog and digital transmission modes. A case study on predictive bird's-eye-view (BEV) tasks shows that diffusion-model refinement significantly improves intersection over union (IoU) and visual quality, and enables BEV predictions for multiple future frames. The findings illustrate the potential of integrating GAI into SemCom to reduce data transmission while preserving semantic integrity, and point to future work on hybrid transmission, semantic scheduling, and cross-task coordination.
Abstract
Vehicle-to-everything (V2X) communication supports numerous tasks, from driving safety to entertainment services. To achieve a holistic view of their surroundings, vehicles are typically equipped with multiple sensors that compensate for one another's blind spots. However, processing large volumes of multi-modal data increases the transmission load, while the dynamic nature of vehicular networks makes transmission less stable. To address these challenges, we propose a novel framework, Generative Artificial Intelligence (GAI)-enhanced multi-modal semantic communication (SemCom), referred to as G-MSC, which is designed to handle various vehicular network tasks by employing analog or digital transmission as appropriate. GAI presents a promising opportunity to transform the SemCom framework in three ways: enhancing semantic encoding so that multi-modal information is integrated more effectively, improving channel robustness, and fortifying semantic decoding against noise interference. To validate the effectiveness of the G-MSC framework, we conduct a case study of its performance on predictive tasks in vehicular communication networks. The experimental results show that the design achieves reliable and efficient communication in V2X networks. Finally, we present future research directions for G-MSC.
