Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions

Mingyi Luo, Ruichen Zhang, Xiangwang Hou, Jun Du, Chunxiao Jiang, Yong Ren, Dusit Niyato, Shiwen Mao

TL;DR

The paper tackles the challenge of deploying agentic LLM reasoning at the mobile edge by proposing a joint optimization framework that blends adaptive Chain-of-Thought prompting with a distributed Mixture of Experts architecture. It models reasoning depth as a dynamic resource and optimizes token routing, transmission power, and reasoning depth using a DPPO-based approach, enabling real-time, energy-efficient edge inference with high accuracy. Through local deployments and system-level simulations, the approach demonstrates substantial energy savings and latency satisfaction, achieving near real-time performance in resource-constrained MEGI environments. The work advances practical MEGI by providing a principled, scalable architecture for edge reasoning and outlining directions for security, multimodal reasoning, and decentralized collaboration.

Abstract

The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. The integration of agentic AI with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we systematically review reasoning enhancement methods to identify mechanisms suitable for edge adaptation. Subsequently, we present a distributed framework that combines reasoning enhancement via adaptive Chain-of-Thought (CoT) prompting with scalable deployment through a distributed Mixture of Experts (MoE) architecture. A key innovation of this approach is modeling reasoning depth as a dynamic network resource variable, which is optimized jointly with expert activation and transmission power. This mechanism allows the system to dynamically regulate expert networks and reasoning complexity according to task requirements and device capabilities. Experimental evaluations in mobile edge environments demonstrate that the proposed framework effectively balances reasoning quality and resource efficiency. The results show that with less than one second of additional inference time, both accuracy and latency satisfaction rate can reach 90%, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI systems.

Paper Structure

This paper contains 18 sections, 4 figures.

Figures (4)

  • Figure 1: Comparison of methods for enhancing LLM reasoning capabilities across different phases of model development [xu2025towards]. The diagram categorizes approaches into three phases: Pre-Training (e.g., Model Scaling and MoE), Fine-Tuning (e.g., SFT and RLHF), and Inference (e.g., CoT Prompting and Self-Consistency). For each method, the corresponding workflow, advantages, and limitations are illustrated.
  • Figure 2: An overview of the proposed joint optimization framework for LLM reasoning for MEGI. The architecture consists of three main components: (A) BS Control Unit, (B) distributed edge devices hosting MoE-based expert networks, and (C) integrated CoT reasoning modules. It illustrates the end-to-end inference workflow, including expert selection, token assignment, parallel edge inference, and result aggregation.
  • Figure 3: Summary of output correctness and inference time for Qwen3-0.6B and Qwen3-0.6B-Base with and without CoT prompts on a mobile edge device.
  • Figure 4: Experimental results comparing energy consumption, accuracy, and latency satisfaction across four inference schemes.