Semantic-Aware Caching for Efficient Image Generation in Edge Computing
Hanshuai Cui, Zhiqing Tang, Zhi Yao, Weijia Jia, Wei Zhao
TL;DR
This work tackles the latency challenges of diffusion-based image generation on edge devices by introducing CacheGenius, a semantic-aware caching system that fuses text-to-image and image-to-image workflows using cached references. It deploys a semantic storage classifier, a request scheduler, and a novel Least Correlation Used (LCU) cache maintenance policy to keep caches relevant and semantically aligned across distributed edge nodes. The approach reduces generation latency by 41% and computational cost by 48% relative to baseline diffusion methods while preserving competitive image quality and similarity metrics. Together, these components enable efficient, scalable image synthesis in resource-constrained environments, with practical implications for mobile and immersive applications.
Abstract
Text-to-image generation with diffusion models has become popular because it produces high-quality images that adhere to textual prompts. However, integrating diffusion models into resource-constrained mobile and edge environments faces critical challenges, because generation requires multiple denoising steps starting from random noise. A practical way to speed up denoising is to initialize the process with a noised reference image that is similar to the target: because both images share similar layouts, structures, and details, fewer denoising steps are needed. Based on this idea, we present CacheGenius, a hybrid image generation system in edge computing that accelerates generation by combining text-to-image and image-to-image workflows. It generates images from user text prompts using cached reference images. CacheGenius introduces a semantic-aware classified storage scheme and a request-scheduling algorithm that ensure semantic alignment between references and targets. To sustain performance, it employs a cache maintenance policy that proactively evicts obsolete entries via correlation analysis. Evaluated in a distributed edge computing system, CacheGenius reduces generation latency by 41% and computational costs by 48% relative to baselines, while maintaining competitive evaluation metrics.
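The core idea of the abstract, reusing a semantically similar cached image so that fewer denoising steps are needed, can be illustrated with a minimal sketch. This is not the CacheGenius implementation: the `CacheEntry` type, the `plan_generation` function, the use of cosine similarity over prompt embeddings, and the `threshold` and `strength` values are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    prompt: str
    embedding: List[float]  # hypothetical prompt embedding
    image_id: str           # handle to the cached reference image


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def plan_generation(query_emb: List[float],
                    cache: List[CacheEntry],
                    total_steps: int = 50,
                    threshold: float = 0.8,   # assumed similarity cutoff
                    strength: float = 0.6) -> dict:  # assumed noise level
    """Pick image-to-image with a cached reference when one is similar
    enough to the request; otherwise fall back to full text-to-image."""
    best: Optional[CacheEntry] = max(
        cache, key=lambda e: cosine(query_emb, e.embedding), default=None
    )
    if best is None or cosine(query_emb, best.embedding) < threshold:
        return {"mode": "text-to-image", "steps": total_steps, "reference": None}
    # Starting from a noised reference, only a fraction of the schedule runs.
    steps = max(1, round(total_steps * strength))
    return {"mode": "image-to-image", "steps": steps, "reference": best.image_id}


cache = [
    CacheEntry("a cat on a sofa", [1.0, 0.0], "img-001"),
    CacheEntry("a dog in a park", [0.0, 1.0], "img-002"),
]
hit = plan_generation([0.9, 0.1], cache)   # close to the first entry
miss = plan_generation([0.5, 0.5], cache)  # not close enough to either
```

In this sketch a cache hit runs a fraction of the denoising schedule set by `strength`, which follows the common image-to-image convention of scaling the step count by a strength parameter. A miss falls back to the full schedule from random noise. The step savings here are illustrative only and are unrelated to the 41% and 48% figures reported above.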
