Towards Edge General Intelligence: Knowledge Distillation for Mobile Agentic AI

Yuxuan Wu, Linghan Ma, Ruichen Zhang, Yinqiu Liu, Dusit Niyato, Shunpu Tang, Zehui Xiong, Zhu Han, Zhaohui Yang, Kaibin Huang, Zhaoyang Zhang, Kai-Kit Wong

TL;DR

This paper defines Edge General Intelligence (EGI) as cloud-like general cognition at the wireless edge and identifies the deployment chasm imposed by resource constraints. It advocates Knowledge Distillation (KD) as a central strategy to transfer the reasoning and control capabilities of large teachers to compact edge models, enabling on-device perception, planning, action, and memory. The authors systematically categorize KD techniques (response-, feature-, and relation-based) and map them to agentic AI components, while discussing architectures like Mamba and RWKV and cross-architecture distillation to close the performance gap. They survey KD-enabled wireless tasks (e.g., channel estimation, CSI feedback, modulation classification) and domain applications (UAVs, autonomous vehicles, robotics, IoT), and highlight challenges around benchmarking, robustness, and ethics with future directions toward safety-aware, modality-agnostic, collaborative KD at the edge. The work provides a comprehensive reference for advancing KD-driven mobile agentic AI toward practical, edge-resident general intelligence.

Abstract

Edge General Intelligence (EGI) represents a paradigm shift in mobile edge computing, where intelligent agents operate autonomously in dynamic, resource-constrained environments. However, the deployment of advanced agentic AI models on mobile and edge devices faces significant challenges due to limited computation, energy, and storage resources. To address these constraints, this survey investigates the integration of Knowledge Distillation (KD) into EGI, positioning KD as a key enabler for efficient, communication-aware, and scalable intelligence at the wireless edge. In particular, we emphasize KD techniques specifically designed for wireless communication and mobile networking, such as channel-aware self-distillation, cross-model Channel State Information (CSI) feedback distillation, and robust modulation classification distillation. Furthermore, we review novel architectures natively suited for KD and edge deployment, such as Mamba and RWKV (Receptance, Weight, Key, Value), together with cross-architecture distillation, which enhance generalization capabilities. Subsequently, we examine diverse applications in which KD-driven architectures enable EGI across vision, speech, and multimodal tasks. Finally, we highlight the key challenges and future directions for KD in EGI. This survey aims to provide a comprehensive reference for researchers exploring KD-driven frameworks for mobile agentic AI in the era of EGI.
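
As a rough illustration of what transferring a large teacher's behavior to a compact edge model can look like, the sketch below computes a response-based distillation loss from teacher and student logits. It assumes the classic temperature-softened formulation (KL divergence between softened output distributions, scaled by the squared temperature), which is not necessarily the formulation used in the survey. The function names `log_softmax` and `response_kd_loss` and the default temperature are hypothetical choices made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def response_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature**2.

    Assumption: this is the classic soft-target loss, shown only as an
    example of response-based KD; it is not taken from the survey itself.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # hypothetical logits from a large teacher
    student = [2.5, 1.5, 0.2]  # hypothetical logits from a compact student
    print(response_kd_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. In practice it is usually combined with an ordinary task loss on ground-truth labels; that weighting is left out here to keep the sketch short.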

Paper Structure

This paper contains 48 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The structure of this survey
  • Figure 2: An overview of the workflow of Agentic AI. Perception gathers and interprets multimodal information. Planning devises a sequence of actions to achieve a high-level goal. Action interacts with and affects the agent's environment. Memory enables an agent to retain information over time.
  • Figure 3: Knowledge Distillation compresses large models into lightweight ones for deployment, enabling Mobile Agentic AI to accumulate experiences and adapt strategies. These capabilities collectively foster Edge General Intelligence, which empowers IoT edge systems such as UAVs, autonomous vehicles, and smart healthcare.
  • Figure 4: Strategic recommendations for Policy Distillation, Interpretable Strategy Teaching, and CoT-Based KD. Policy Distillation suits stable environments with well-defined tasks [PKD]; Interpretable Strategy Teaching suits high-stakes domains requiring human-computer interaction [int]; CoT-Based KD suits scenarios requiring the compression of general-purpose agents [CoTKD].
  • Figure 5: An overview of KD techniques across different modalities. (A) Language: distillation from large language models such as BERT into compact models (e.g., DistilBERT). (B) Vision: KD mechanisms including distillation tokens, multi-granularity distillation, and cross-KD applied to light classifiers, detectors, and segmentors. (C) Auditory: specific KD strategies such as sequence-level KD, label-free KD, grouped KD, multi-representation KD, and temporal KD for tasks like ASR, speaker identification, and audio classification. (D) Multimodal: integration of KD across modalities with multi-teacher settings, robust handling of missing modalities, and cross-modal knowledge transfer for multimodal student models.
  • ...and 1 more figure