GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices

Mozhgan Navardi, Romina Aalishah, Yuzhe Fu, Yueqian Lin, Hai Li, Yiran Chen, Tinoosh Mohsenin

TL;DR

Deploying GenAI on resource-constrained edge devices can dramatically reduce latency and protect privacy, but requires advances in software optimization, hardware acceleration, and deployment frameworks. The paper categorizes approaches into three pillars—model compression and NAS for software; accelerators and optimized attention for hardware; and compiler/engine frameworks for on-device inference—and details concrete techniques, open-source edge models, and NAS-driven architectures. Key contributions include a consolidated view of PTQ/QAT, pruning, distillation, diffusion quantization, CIM-based accelerators, FlashAttention variants, and edge-oriented frameworks, forming a practical roadmap for real-world edge GenAI systems. The work highlights challenges in personalization and security across distributed edge nodes and argues for integrated hardware-software co-design to realize private, real-time GenAI at the edge with broad application potential in drones, wearables, and autonomous systems.
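The TL;DR lists post-training quantization (PTQ) among the software techniques. Below is a minimal sketch of symmetric per-tensor INT8 quantization, which is one of several PTQ schemes and not necessarily one the survey singles out. One scale is derived from the largest weight magnitude. Weights are rounded to 8-bit integers, then multiplied back by that scale at inference time. The function names `quantize_int8` and `dequantize_int8` are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor PTQ to int8 in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; guard against all-zero tensors.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Python's round() uses round-half-to-even; clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Hypothetical helper: recover approximate float weights from int8 codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 0.9]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    print("int8 codes:", q)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Storing the int8 codes instead of 32-bit floats cuts weight memory by roughly 4×. The cost is a per-weight rounding error bounded by half the scale. Practical PTQ pipelines often refine this with per-channel scales and calibration data.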

Abstract

Generative Artificial Intelligence (GenAI) applies models and algorithms such as Large Language Models (LLMs) and Foundation Models (FMs) to generate new data. GenAI enables advanced capabilities in many applications, including text generation and image processing. In current practice, GenAI algorithms run mainly on cloud servers, which leads to high latency and raises security concerns. These challenges motivate deploying GenAI algorithms directly on edge devices. However, the large size of such models and their significant computational resource requirements pose obstacles to deployment on resource-constrained systems. This survey provides a comprehensive overview of recently proposed techniques that optimize GenAI for efficient deployment on resource-constrained edge devices. To this end, it highlights three main categories for bringing GenAI to the edge: software optimization, hardware optimization, and frameworks. The main takeaway for readers is a clear roadmap to design, implement, and refine GenAI systems for real-world deployment on edge devices.
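The abstract attributes part of the deployment obstacle to model size. A rough estimate shows why lower-precision storage matters on memory-limited devices. It assumes a hypothetical 7-billion-parameter model and counts weights only, ignoring activations, KV cache, and runtime overhead. The helper `weight_memory_gb` is illustrative and does not come from the survey.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Illustrative only: counts weights, not activations or KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size, for illustration only
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, 16-bit weights need about 14 GB and 4-bit weights about 3.5 GB. This gap is why compression techniques such as quantization are central to fitting GenAI models on edge hardware.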

Paper Structure

This paper contains 10 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Illustration of the flow of GenAI at the edge