SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Trung Dao, Thuan Hoang Nguyen, Thanh Le, Duc Vu, Khoi Nguyen, Cuong Pham, Anh Tran
TL;DR
SwiftBrush v2 aims to make a one-step diffusion model surpass its multi-step teacher. It analyzes the quality-diversity trade-off between one-step models, initializes the student with SD Turbo weights, and adds a clamped CLIP loss, together with dataset scaling and resource-efficient training strategies (LoRA, TinyVAE, ScaleCrafter). By fusing two training schemes through simple weight interpolation and applying post-training image regularization, the method achieves a state-of-the-art one-step FID of 8.14 on COCO-2014 while maintaining near real-time inference. It shows strong image quality, text alignment, and diversity, outperforming GAN-based and prior one-step methods, and the claims are supported by ablations. The work also offers insights into robustness, compositional improvements, and scalable data utilization, pointing to a practical path toward accessible, high-quality on-device text-to-image synthesis.
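The weight-interpolation step mentioned above can be illustrated with a minimal sketch. It assumes both checkpoints share the same parameter names and shapes (for a LoRA-trained model, that the low-rank updates have already been merged into the base weights), and it uses plain Python lists in place of tensors. The function name `interpolate_weights` and the mixing coefficient `alpha` are hypothetical and not taken from the paper.

```python
def interpolate_weights(state_a, state_b, alpha=0.5):
    """Linearly blend two checkpoints: (1 - alpha) * A + alpha * B.

    state_a, state_b: dicts mapping parameter names to flat lists of floats
    (an assumed stand-in for real model state dicts).
    alpha: hypothetical mixing coefficient in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must have identical parameter names")

    merged = {}
    for name, weights_a in state_a.items():
        weights_b = state_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        merged[name] = [
            (1.0 - alpha) * wa + alpha * wb
            for wa, wb in zip(weights_a, weights_b)
        ]
    return merged


# Toy example: two "models" with a single parameter each.
full_model = {"w": [0.0, 2.0]}
lora_model = {"w": [1.0, 4.0]}
fused = interpolate_weights(full_model, lora_model, alpha=0.5)
```

With `alpha = 0` the result equals the first checkpoint and with `alpha = 1` the second; intermediate values trade off the behavior of the two training schemes.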
Abstract
In this paper, we aim to enhance the performance of SwiftBrush, a prominent one-step text-to-image diffusion model, to be competitive with its multi-step Stable Diffusion counterpart. First, we explore the quality-diversity trade-off between SwiftBrush and SD Turbo: the former excels in image diversity, while the latter excels in image quality. This observation motivates our proposed modifications to the training methodology, including better weight initialization and efficient LoRA training. Moreover, we introduce a novel clamped CLIP loss that enhances image-text alignment and improves image quality. Remarkably, by combining the weights of models trained with efficient LoRA and with full training, we obtain a new state-of-the-art one-step diffusion model with an FID of 8.14, surpassing all GAN-based and multi-step Stable Diffusion models. The project page is available at https://swiftbrushv2.github.io.
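The abstract does not spell out the clamped CLIP loss, so the following is only a sketch of one plausible form: a hinge on the cosine similarity between image and text embeddings, which stops contributing once the similarity exceeds a threshold. The function names, the threshold `tau`, and its default value are assumptions for illustration, and the embeddings are plain Python lists rather than outputs of a real CLIP encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clamped_clip_loss(image_emb, text_emb, tau=0.3):
    """Hinge-style alignment loss: max(0, tau - cos(image, text)).

    tau is a hypothetical threshold; once the similarity reaches it,
    the loss is clamped to zero and no longer pushes alignment further.
    """
    return max(0.0, tau - cosine_similarity(image_emb, text_emb))


# Orthogonal embeddings are penalized; aligned embeddings are not.
loss_unaligned = clamped_clip_loss([1.0, 0.0], [0.0, 1.0], tau=0.3)
loss_aligned = clamped_clip_loss([1.0, 2.0], [1.0, 2.0], tau=0.3)
```

Under this assumed form, the clamp keeps the alignment term from dominating training once image-text similarity is already high.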
