SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
Weilong Chai, DanDan Zheng, Jiajiong Cao, Zhiquan Chen, Changbao Wang, Chenguang Ma
TL;DR
SpeedUpNet (SUN) is a universal plug-in adapter for the cross-attention layers of diffusion models that accelerates text-to-image generation while preserving content fidelity and negative-prompt controllability. It learns the offset that a negative prompt induces relative to the positive prompt and applies Attention Normalization, so that a single forward pass approximates the output of classifier-free guidance (CFG). A Multi-Step Consistency (MSC) distillation loss keeps outputs stable under multi-step acceleration. Trained once on base Stable Diffusion v1.5, SUN can be plugged into various fine-tuned SD models without further training. It brings inference down to the equivalent of 4 steps, a speedup of more than 10×, and achieves competitive or state-of-the-art FID/CLIP scores on the LAION-Aesthetic-6+ dataset. SUN also integrates with Inpainting, Image-to-Image, and ControlNet, offering a practical, training-free path to accelerating stylized diffusion models with stable, controllable outputs.
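For context, standard classifier-free guidance combines two noise predictions at every denoising step, one conditioned on the positive prompt and one on the negative (or empty) prompt. This is the two-pass computation that SUN aims to approximate in a single pass. The following is a minimal NumPy sketch of the standard CFG combination only, not of SUN's adapter; the function name `cfg_combine` and the toy arrays are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cfg_combine(eps_pos, eps_neg, guidance_scale):
    """Standard classifier-free guidance.

    eps_pos: noise prediction conditioned on the positive prompt
    eps_neg: noise prediction conditioned on the negative (or empty) prompt
    Each prediction normally costs one U-Net forward pass, so CFG
    needs two passes per denoising step.
    """
    # Start from the negative prediction and extrapolate toward the positive one.
    return eps_neg + guidance_scale * (eps_pos - eps_neg)

# Toy stand-ins for the two U-Net outputs (hypothetical values).
eps_pos = np.array([1.0, 2.0])
eps_neg = np.array([0.5, 1.0])

guided = cfg_combine(eps_pos, eps_neg, guidance_scale=7.5)
```

With `guidance_scale=1.0` the result reduces to `eps_pos`, which is why larger scales are described as pushing the sample away from the negative prompt.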
Abstract
Text-to-image diffusion models such as Stable Diffusion (SD) have advanced significantly but require extensive computational resources. Existing acceleration methods usually require extensive training and are not universally applicable. LCM-LoRA, which is trained once and then applied to diverse models, offers universality but does little to ensure that generated content stays consistent before and after acceleration. This paper proposes SpeedUpNet (SUN), an acceleration module that addresses both universality and consistency. Exploiting the role of the cross-attention layers in the SD U-Net, we introduce an adapter designed specifically for these layers that quantifies the offset in image generation caused by negative prompts relative to positive prompts. The learned offset is stable across a range of models, which underpins SUN's universality. To improve output consistency, we propose a Multi-Step Consistency (MSC) loss, which stabilizes the offset and ensures fidelity in accelerated content. Experiments on SD v1.5 show that SUN yields an overall speedup of more than 10 times over the baseline 25-step DPM-solver++, and offers two further advantages: (1) training-free integration into various fine-tuned Stable Diffusion models and (2) state-of-the-art FIDs between the image sets generated before and after acceleration, under guidance from random combinations of positive and negative prompts. Code is available at https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io.
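As a rough sanity check on the reported speedup, one can count U-Net forward passes. The sketch below assumes that the 25-step baseline uses classifier-free guidance with two passes per step, that SUN runs 4 steps with one pass each (the step count given in the TL;DR), and that adapter overhead and non-U-Net costs are negligible. These are simplifying assumptions for illustration; the paper's figure is a measured speedup, not this count, and `unet_evals` is a hypothetical helper.

```python
def unet_evals(steps, passes_per_step):
    """Number of U-Net forward passes for one generated image."""
    return steps * passes_per_step

# Assumed baseline: 25-step DPM-solver++ with CFG (positive + negative pass per step).
baseline_evals = unet_evals(steps=25, passes_per_step=2)

# Assumed SUN setting: 4 steps, single pass per step (no separate negative pass).
sun_evals = unet_evals(steps=4, passes_per_step=1)

# Ratio of forward passes under these assumptions.
ratio = baseline_evals / sun_evals
print(f"{baseline_evals} vs {sun_evals} U-Net passes, ratio {ratio:.1f}x")
```

Under these assumptions the pass count drops from 50 to 4, a ratio of 12.5, which is in line with the stated speedup of more than 10 times.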
