Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion

Weiye Chen; Qingen Zhu; Qian Long

TL;DR

Muon-AD addresses the bottleneck of real-time visual synthesis on edge devices by co-designing gradient optimization, architecture, and training. It integrates the Muon optimizer with attention distillation and entropy-driven dynamic pruning, guided by a three-phase curriculum, to decouple style and content and reduce computation while preserving quality. The total loss $$L_{tot} = L_{distill} + \lambda L_{content}$$ combines a distillation term with a $\lambda$-weighted content term, and orthogonal gradient updates further reduce conflicts in high-dimensional latent spaces, enabling robust real-time synthesis on devices like Jetson platforms. The approach yields significant gains in convergence speed, memory efficiency, and inference frame rates, achieving Pareto-optimal trade-offs across diffusion-based generation tasks and cross-domain applications (2D style transfer, 3D texture synthesis, and VR rendering). Its edge-friendly deployment pipeline and demonstrated scalability suggest substantial practical impact for mobile AR, industrial digital twins, and immersive VR on resource-constrained hardware, with open-source resources planned.

Abstract

Recent advances in visual synthesis have leveraged diffusion models and attention mechanisms to achieve high-fidelity artistic style transfer and photorealistic text-to-image generation. However, real-time deployment on edge devices remains challenging due to computational and memory constraints. We propose Muon-AD, a co-designed framework that integrates the Muon optimizer with attention distillation for real-time edge synthesis. By eliminating gradient conflicts through orthogonal parameter updates and dynamic pruning, Muon-AD converges 3.2× faster than Stable Diffusion-TensorRT while maintaining synthesis quality (15% lower FID, 4% higher SSIM). Our framework reduces peak memory to 7 GB on Jetson Orin and enables 24 FPS real-time generation through mixed-precision quantization and curriculum learning. Extensive experiments on COCO-Stuff and ImageNet-Texture demonstrate Muon-AD's Pareto-optimal efficiency-quality trade-offs. We also show a 65% reduction in communication overhead during distributed training and real-time 10 s/image generation on edge GPUs. These advancements pave the way for democratizing high-quality visual synthesis in resource-constrained environments.
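To make the "orthogonal parameter updates" and the combined loss more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the gradient of a weight matrix is replaced by an approximately orthogonal matrix before the step, computed here with a plain cubic Newton-Schulz iteration. The Muon optimizer itself is generally described as using a tuned polynomial variant together with momentum, both omitted here. The function names `total_loss`, `orthogonalize`, and `orthogonal_step`, and the learning rate, are hypothetical choices for illustration.

```python
import numpy as np


def total_loss(l_distill: float, l_content: float, lam: float) -> float:
    """Combined objective L_tot = L_distill + lambda * L_content."""
    return l_distill + lam * l_content


def orthogonalize(grad: np.ndarray, steps: int = 20) -> np.ndarray:
    """Approximate the orthogonal polar factor of `grad`.

    Assumption: a simple cubic Newton-Schulz iteration,
    X <- 1.5 X - 0.5 X X^T X, stands in for whatever scheme the paper uses.
    Each singular value s is mapped to 1.5 s - 0.5 s^3, which converges
    to 1 for any s in (0, sqrt(3)).
    """
    # Dividing by the Frobenius norm keeps every singular value at or below 1.
    x = grad / (np.linalg.norm(grad) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


def orthogonal_step(weight: np.ndarray, grad: np.ndarray, lr: float = 0.02) -> np.ndarray:
    """One descent step along the orthogonalized gradient (no momentum)."""
    return weight - lr * orthogonalize(grad)


if __name__ == "__main__":
    g = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy 3x2 gradient
    o = orthogonalize(g)
    # Columns of the orthogonalized update should be orthonormal.
    print(np.round(o.T @ o, 6))
    print(total_loss(l_distill=1.0, l_content=2.0, lam=0.5))
```

Because every singular value of the update is pushed toward 1, no single direction in the gradient dominates the step, which is one plausible reading of how orthogonal updates could reduce conflicts between the distillation and content terms.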
