Diffusion Models on the Edge: Challenges, Optimizations, and Applications
Dongqi Zheng
TL;DR
Diffusion models offer high-fidelity generative capabilities, but their heavy computational cost hinders deployment on edge devices. The paper surveys foundational diffusion concepts alongside edge-specific constraints and a broad set of optimization and hardware-software co-design strategies, including sampling acceleration, latent-space diffusion, quantization, pruning, distillation, and operator fusion. It catalogs platform-specific optimization case studies, runtime frameworks, profiling tools, and standardized benchmarks, highlighting practical pipelines for on-device deployment. By mapping techniques to concrete edge scenarios and metrics, the work outlines a path toward private, low-energy, real-time diffusion on smartphones, wearables, and IoT devices, with broad societal and economic impact.
Abstract
Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge devices a significant challenge. This survey explores the foundational concepts of diffusion models, identifies key constraints of edge platforms, and synthesizes recent advancements in model compression, sampling efficiency, and hardware-software co-design to make diffusion models viable on edge devices. We also review promising applications and suggest future research directions.
