Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
Xubin Wang, Qing Li, Weijia Jia
TL;DR
This paper addresses the challenge of deploying reasoning-capable large language models and autonomous AI agents on resource-constrained edge devices. It proposes a unified Cognitive Edge Computing framework that spans data, model, and system optimization, together with cross-layer collaboration between cloud and edge resources, to preserve cognitive fidelity under tight resource budgets. The authors synthesize advances in efficient Transformer design, small language models, and agent tool use, and offer a standardized evaluation protocol for latency, energy, accuracy, robustness, privacy, and sustainability. The survey also presents practical deployment blueprints, notes the emergence of edge-native architectures, and outlines future research directions toward real-time, privacy-preserving, and scalable cognition across diverse domains. Overall, it argues that principled cross-layer co-design and standardized benchmarking are essential for making pervasive cognitive edge computing reliable and sustainable.
Abstract
This article surveys Cognitive Edge Computing as a practical and methodical pathway for deploying reasoning-capable Large Language Models (LLMs) and autonomous AI agents on resource-constrained devices at the network edge. We present a unified, cognition-preserving framework spanning: (1) model optimization (quantization, sparsity, low-rank adaptation, distillation) aimed at retaining multi-step reasoning under tight memory/compute budgets; (2) system architecture (on-device inference, elastic offloading, cloud-edge collaboration) that trades off latency, energy, privacy, and capacity; and (3) adaptive intelligence (context compression, dynamic routing, federated personalization) that tailors computation to task difficulty and device constraints. We synthesize advances in efficient Transformer design, multimodal integration, hardware-aware compilation, privacy-preserving learning, and agentic tool use, and map them to edge-specific operating envelopes. We further outline a standardized evaluation protocol covering latency, throughput, energy per token, accuracy, robustness, privacy, and sustainability, with explicit measurement assumptions to enhance comparability. Remaining challenges include modality-aware reasoning benchmarks, transparent and reproducible energy reporting, edge-oriented safety/alignment evaluation, and multi-agent testbeds. We conclude with practitioner guidelines for cross-layer co-design of algorithms, runtime, and hardware to deliver reliable, efficient, and privacy-preserving cognitive capabilities on edge devices.
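As a concrete illustration of the evaluation metrics named above, the following minimal sketch derives latency, throughput, and energy per token from a single inference run. It is not the survey's protocol: the `RunMeasurement` fields, the `summarize` helper, and the choice to subtract idle power are all assumptions made here to show what an "explicit measurement assumption" might look like.

```python
from dataclasses import dataclass


@dataclass
class RunMeasurement:
    """One on-device inference run (hypothetical field names)."""
    generated_tokens: int       # tokens produced during the run
    wall_time_s: float          # end-to-end latency in seconds
    avg_power_w: float          # mean device power during the run, in watts
    idle_power_w: float = 0.0   # baseline power to subtract (assumption)


def summarize(run: RunMeasurement) -> dict:
    """Derive latency, throughput, and energy per token for one run."""
    if run.generated_tokens <= 0 or run.wall_time_s <= 0:
        raise ValueError("need a positive token count and duration")
    # Energy attributed to inference: (active power - idle power) * time.
    energy_j = (run.avg_power_w - run.idle_power_w) * run.wall_time_s
    return {
        "latency_s": run.wall_time_s,
        "throughput_tok_per_s": run.generated_tokens / run.wall_time_s,
        "energy_per_token_j": energy_j / run.generated_tokens,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any device.
    print(summarize(RunMeasurement(100, 5.0, 6.0, idle_power_w=1.0)))
```

Whether idle power is subtracted, and over which time window power is averaged, changes the reported energy per token, so results are only comparable when such choices are stated alongside the numbers.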
