Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions
Mingyi Luo, Ruichen Zhang, Xiangwang Hou, Jun Du, Chunxiao Jiang, Yong Ren, Dusit Niyato, Shiwen Mao
TL;DR
The paper addresses the challenge of deploying agentic LLM reasoning at the mobile edge by proposing a joint optimization framework that combines adaptive Chain-of-Thought (CoT) prompting with a distributed Mixture of Experts (MoE) architecture. It models reasoning depth as a dynamic resource and jointly optimizes token routing, transmission power, and reasoning depth with a DPPO-based approach, enabling real-time, energy-efficient edge inference with high accuracy. In local deployments and system-level simulations, the approach delivers substantial energy savings and a high latency satisfaction rate, achieving near real-time performance in resource-constrained MEGI environments. The work advances practical MEGI by providing a principled, scalable architecture for edge reasoning and by outlining directions for security, multimodal reasoning, and decentralized collaboration.
Abstract
The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. The integration of agentic AI with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we systematically review reasoning enhancement methods to identify mechanisms suitable for edge adaptation. We then present a distributed framework that combines reasoning enhancement via adaptive Chain-of-Thought (CoT) prompting with scalable deployment through a distributed Mixture of Experts (MoE) architecture. A key innovation of this approach is to model reasoning depth as a dynamic network resource variable that is optimized jointly with expert activation and transmission power. This mechanism allows the system to dynamically adjust expert activation and reasoning complexity according to task requirements and device capabilities. Experimental evaluations in mobile edge environments demonstrate that the proposed framework effectively balances reasoning quality and resource efficiency. The results show that, with less than one second of additional inference time, both accuracy and latency satisfaction rate can reach 90%, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI systems.
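To make the idea of treating reasoning depth as a resource variable concrete, the following is a minimal sketch, not the paper's method. It replaces the DPPO-based optimizer with an exhaustive grid search, and every model in it is a hypothetical assumption chosen for illustration: the accuracy curve, the 40 tokens per reasoning step, the per-token compute time, the per-expert power draw, and the channel parameters. It only shows how depth, the number of active experts, and transmission power can be chosen together to minimize energy under a latency budget and an accuracy target.

```python
import itertools
import math


def transmission_time(bits, power_w, bandwidth_hz=1e6, gain=1e-6, noise_w=1e-9):
    """Upload time at the Shannon rate; channel parameters are assumed values."""
    rate = bandwidth_hz * math.log2(1.0 + power_w * gain / noise_w)
    return bits / rate


def evaluate(depth, experts, power_w):
    """Return (accuracy, latency_s, energy_j) under toy, hypothetical models."""
    # Assumed: accuracy saturates with depth and improves with more experts.
    accuracy = 1.0 - 0.5 * math.exp(-0.4 * depth) / math.sqrt(experts)
    tokens = 40 * depth                      # assumed tokens per reasoning step
    t_tx = transmission_time(bits=8_000 * experts, power_w=power_w)
    t_cmp = 0.004 * tokens                   # assumed: experts run in parallel
    latency = t_tx + t_cmp
    energy = power_w * t_tx + 0.5 * experts * t_cmp  # assumed 0.5 W per expert
    return accuracy, latency, energy


def best_configuration(latency_budget_s, accuracy_target):
    """Grid search for the lowest-energy feasible (depth, experts, power)."""
    depths = range(1, 9)
    expert_counts = (1, 2, 4)
    powers_w = (0.05, 0.1, 0.2, 0.5)
    best = None
    for d, k, p in itertools.product(depths, expert_counts, powers_w):
        acc, lat, en = evaluate(d, k, p)
        feasible = acc >= accuracy_target and lat <= latency_budget_s
        if feasible and (best is None or en < best["energy"]):
            best = {"depth": d, "experts": k, "power_w": p,
                    "accuracy": acc, "latency": lat, "energy": en}
    return best  # None when no configuration meets both constraints


if __name__ == "__main__":
    print(best_configuration(latency_budget_s=1.0, accuracy_target=0.9))
```

A grid search is used here only because the toy search space is tiny. With many devices, time-varying channels, and per-token routing decisions, the joint space grows quickly, which is the kind of setting where a learned policy such as the DPPO-based approach described above becomes attractive.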
