Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment

Xubin Wang, Qing Li, Weijia Jia

TL;DR

This paper addresses the challenge of deploying reasoning-capable large language models and autonomous AI agents on resource-constrained edge devices. It proposes a unified Cognitive Edge Computing framework that spans data, model, and system optimization, plus cross-layer collaboration with cloud and edge resources to preserve cognitive fidelity under tight budgets. The authors synthesize advances in efficient Transformer design, small language models, and agent tool use, and offer a standardized evaluation protocol for latency, energy, accuracy, robustness, privacy, and sustainability. The work highlights practical deployment blueprints, the emergence of edge-native architectures, and future research directions, illustrating how edge intelligence can achieve real-time, privacy-preserving, and scalable cognition across diverse domains. Overall, the survey emphasizes principled cross-layer co-design and standardized benchmarking as essential for making pervasive cognitive edge computing a reliable, sustainable reality.

Abstract

This article surveys Cognitive Edge Computing as a practical and methodical pathway for deploying reasoning-capable Large Language Models (LLMs) and autonomous AI agents on resource-constrained devices at the network edge. We present a unified, cognition-preserving framework spanning: (1) model optimization (quantization, sparsity, low-rank adaptation, distillation) aimed at retaining multi-step reasoning under tight memory/compute budgets; (2) system architecture (on-device inference, elastic offloading, cloud-edge collaboration) that trades off latency, energy, privacy, and capacity; and (3) adaptive intelligence (context compression, dynamic routing, federated personalization) that tailors computation to task difficulty and device constraints. We synthesize advances in efficient Transformer design, multimodal integration, hardware-aware compilation, privacy-preserving learning, and agentic tool use, and map them to edge-specific operating envelopes. We further outline a standardized evaluation protocol covering latency, throughput, energy per token, accuracy, robustness, privacy, and sustainability, with explicit measurement assumptions to enhance comparability. Remaining challenges include modality-aware reasoning benchmarks, transparent and reproducible energy reporting, edge-oriented safety/alignment evaluation, and multi-agent testbeds. We conclude with practitioner guidelines for cross-layer co-design of algorithms, runtime, and hardware to deliver reliable, efficient, and privacy-preserving cognitive capabilities on edge devices.
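
The evaluation protocol above lists latency, throughput, and energy per token among its metrics. As a minimal sketch of how these three quantities relate, assume a single inference run for which wall-clock time, output token count, and average power draw have been measured. The names `RunMeasurement` and `summarize` and the sample numbers are illustrative assumptions, not taken from the paper, and the paper's own measurement assumptions may differ.

```python
from dataclasses import dataclass


@dataclass
class RunMeasurement:
    """One inference run; all field names are hypothetical, for illustration."""
    wall_time_s: float      # end-to-end generation time in seconds
    tokens_generated: int   # number of output tokens produced
    avg_power_w: float      # mean device power draw during the run, in watts


def summarize(run: RunMeasurement) -> dict:
    """Derive throughput, per-token latency, and energy per token."""
    if run.tokens_generated <= 0 or run.wall_time_s <= 0:
        raise ValueError("run must produce tokens over a positive duration")
    # Energy in joules is average power (watts) times duration (seconds).
    energy_j = run.avg_power_w * run.wall_time_s
    return {
        "throughput_tok_per_s": run.tokens_generated / run.wall_time_s,
        "latency_ms_per_token": 1000.0 * run.wall_time_s / run.tokens_generated,
        "energy_j_per_token": energy_j / run.tokens_generated,
    }


if __name__ == "__main__":
    # Made-up numbers for demonstration, not results from the survey.
    run = RunMeasurement(wall_time_s=4.0, tokens_generated=100, avg_power_w=5.0)
    print(summarize(run))
```

Reporting the raw inputs (time, tokens, power) alongside the derived values is one way to make the measurement assumptions explicit, which is the comparability concern the abstract raises.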

Paper Structure (42 sections, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Cognitive Edge Computing Framework: Integrated approach for deploying reasoning-capable LLMs and autonomous Agents on resource-constrained edge devices. The framework consists of four interconnected components: (1) Cognitive Challenges (red) addressing reasoning preservation and context management under resource constraints; (2) Cognitive Framework (blue) implementing context compression and agent optimization techniques; (3) Cognitive Applications (purple) enabling conversational AI and autonomous reasoning capabilities; and (4) Cognitive Evaluation (teal) assessing reasoning quality and agent autonomy. The closed-loop design ensures continuous improvement through feedback mechanisms where evaluation results inform challenge identification and application validation drives framework refinement.
  • Figure 2: Illustrative resource comparison (cloud accelerator cluster vs. edge server vs. mobile SoC). Values approximate public peak specs; actual usable throughput depends on workload, precision, and batching [nvidia-h100-spec, jetson-agx-orin, apple-a17-pro].
  • Figure 3: Cognitive Edge AI Optimization Framework for LLMs and Agents
  • Figure 4: Cognitive Edge Computing Pipeline: Multi-modal input processing through edge-based reasoning and autonomous agent action generation with cognitive security and context management.
  • Figure 5: Distributed Cognitive Computing Architecture: Cognitive Agents collaborate through distributed reasoning, knowledge sharing, and cloud-assisted complex cognitive tasks.
  • ...and 3 more figures