When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment
Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han, Dong In Kim, Khaled B. Letaief
TL;DR
The paper addresses the challenge of running multimodal LLM agents on resource-constrained mobile devices within 6G networks while meeting latency and privacy needs. It introduces a split-learning architecture that distributes perception, grounding, and alignment modules between mobile devices and edge servers to enable collaborative, long-horizon interactions, paired with a novel model caching algorithm to improve context utilization. Key contributions include the architecture design, mapping of inter-module communications to 6G network functions (e.g., integrated sensing and communication, digital twins, task-oriented communications), and the caching strategy to reduce network costs. This approach promises democratized access to AI assistants on mobile devices, reducing delay and privacy concerns, and scaling AI-enabled services in 6G-enabled environments.
Abstract
AI agents based on multimodal large language models (LLMs) are expected to revolutionize human-computer interaction and offer more personalized assistant services across domains such as healthcare, education, manufacturing, and entertainment. Deploying LLM agents in 6G networks democratizes access to previously expensive AI assistant services via mobile devices, while reducing interaction latency and better preserving user privacy. Nevertheless, the limited capacity of mobile devices constrains how effectively local LLMs can be deployed and executed, which makes it necessary to offload complex tasks to global LLMs running on edge servers during long-horizon interactions. In this article, we propose a split learning system for LLM agents in 6G networks that leverages collaboration between mobile devices and edge servers: multiple LLMs with different roles are distributed across mobile devices and edge servers to perform user-agent interactive tasks collaboratively. In the proposed system, LLM agents are split into perception, grounding, and alignment modules, and the communications between these modules are supported by 6G network functions, including integrated sensing and communication, digital twins, and task-oriented communications, to meet extended user requirements. Furthermore, we introduce a novel model caching algorithm for LLMs within the proposed system that improves model utilization in context, thus reducing the network costs of the collaborative mobile and edge LLM agents.
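To make the offloading and caching ideas above concrete, the following is a minimal sketch under stated assumptions, not the algorithm proposed in the article. It assumes a hypothetical scalar task complexity score with a fixed threshold for deciding between the local and the global LLM, and it uses plain least-recently-used (LRU) eviction as a stand-in caching policy. The names `EdgeModelCache`, `route`, the model labels, and all sizes are illustrative only.

```python
from collections import OrderedDict


class EdgeModelCache:
    """Toy cache of models loaded on an edge server.

    Assumption: LRU eviction is a stand-in policy, not the article's algorithm.
    """

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.loaded = OrderedDict()  # model name -> size in GB, oldest first
        self.loads = 0               # number of (costly) model loads so far

    def request(self, name, size_gb):
        """Return True on a cache hit, False if the model had to be loaded."""
        if size_gb > self.capacity_gb:
            raise ValueError("model does not fit on this edge server")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return True
        # Evict least recently used models until the new one fits.
        while sum(self.loaded.values()) + size_gb > self.capacity_gb:
            self.loaded.popitem(last=False)
        self.loaded[name] = size_gb
        self.loads += 1
        return False


def route(task_complexity, threshold=0.5):
    """Hypothetical rule: simple tasks stay on the device, complex ones go to the edge."""
    return "device" if task_complexity <= threshold else "edge"


cache = EdgeModelCache(capacity_gb=10)
hits = [
    cache.request("grounding", 6),
    cache.request("alignment", 4),
    cache.request("grounding", 6),   # already loaded, so this is a hit
    cache.request("perception", 4),  # forces eviction of "alignment"
]
```

In this toy setting, each cache miss stands for a model load that would consume network and memory resources, so a policy that keeps the right models resident lowers the cost of serving offloaded tasks. A real system would also need to account for model context, request patterns, and per-module placement, which this sketch leaves out.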
