Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception
Guangyuan Liu, Yinqiu Liu, Ruichen Zhang, Hongyang Du, Dusit Niyato, Zehui Xiong, Sumei Sun, Abbas Jamalipour
TL;DR
RAMSemCom addresses the challenge of exchanging semantically rich multimodal information in bandwidth-limited wireless multi-agent systems. It combines retrieval-augmented perception with a semantic communication framework and a DRL-based scheduler to adaptively select and transmit only the most relevant multimodal content. The approach employs iterative semantic refinement, top-$k$ patch retrieval, and centralized scheduling to balance semantic fidelity against bandwidth usage. A case study in multi-agent autonomous driving demonstrates faster task completion and reduced communication overhead compared with baselines, underscoring practical value for real-time cooperative AI.
Abstract
The rapid development of multimodal AI and Large Language Models (LLMs) has greatly enhanced real-time interaction, decision-making, and collaborative tasks. However, in wireless multi-agent scenarios, limited bandwidth poses significant challenges to exchanging semantically rich multimodal information efficiently. Traditional semantic communication methods, though effective, struggle with redundancy and loss of crucial details. To overcome these challenges, we propose a Retrieval-Augmented Multimodal Semantic Communication (RAMSemCom) framework. RAMSemCom incorporates iterative, retrieval-driven semantic refinement tailored for distributed multi-agent environments, enabling efficient exchange of critical multimodal elements through local caching and selective transmission. Our approach dynamically optimizes retrieval using deep reinforcement learning (DRL) to balance semantic fidelity with bandwidth constraints. A comprehensive case study on multi-agent autonomous driving demonstrates that our DRL-based retrieval strategy significantly improves task completion efficiency and reduces communication overhead compared to baseline methods.
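As a rough illustration of the selective-transmission idea, the sketch below ranks cached patches by cosine similarity to a query embedding and keeps only the top $k$. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy two-dimensional embeddings, and the use of cosine similarity are all hypothetical, and `k` is a fixed argument here, whereas in RAMSemCom the retrieval is optimized dynamically by the DRL-based scheduler.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_patches(query, patches, k):
    """Return the ids of the k patches most similar to the query.

    `patches` is a list of (patch_id, embedding) pairs. In this sketch k is
    given directly; choosing it adaptively is the scheduler's job and is
    not modeled here.
    """
    ranked = sorted(patches, key=lambda p: cosine(query, p[1]), reverse=True)
    return [patch_id for patch_id, _ in ranked[:k]]


# Toy example: hypothetical 2-D embeddings for four image patches.
query = [1.0, 0.0]
patches = [
    ("sky", [0.0, 1.0]),
    ("pedestrian", [0.9, 0.1]),
    ("car", [0.6, 0.8]),
    ("road", [-1.0, 0.0]),
]
selected = top_k_patches(query, patches, k=2)
print(selected)  # ['pedestrian', 'car']
```

A smaller `k` sends fewer patches and saves bandwidth at the cost of semantic detail, which is the trade-off the abstract describes.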
