Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
Quan Dao, Hao Phung, Trung Dao, Dimitris Metaxas, Anh Tran
TL;DR
Self-Corrected Flow Distillation addresses sampling inefficiency in flow matching by integrating consistency distillation with adversarial training in latent space. The approach introduces a truncated consistency loss, a GAN-based one-step refinement, a reflow loss to align one-step and few-step trajectories, and a bidirectional consistency objective to stabilize cross-step generation. Empirical results on CelebA-HQ and zero-shot COCO demonstrate superior one-step and few-step FID scores and competitive CLIP metrics, with fast inference times. The work provides a practical pathway to real-time, consistent text-to-image and unconditional generation with public code release.
Abstract
Flow matching has emerged as a promising framework for training generative models, demonstrating impressive empirical performance while offering relative ease of training compared to diffusion-based models. However, sampling from these models still requires numerous function evaluations. To address this limitation, we introduce a self-corrected flow distillation method that effectively integrates consistency models and adversarial training within the flow-matching framework. This work is a pioneer in achieving consistent generation quality in both few-step and one-step sampling. Our extensive experiments validate the effectiveness of our method, yielding superior results both quantitatively and qualitatively on CelebA-HQ and zero-shot benchmarks on the COCO dataset. Our implementation is released at https://github.com/VinAIResearch/SCFlow
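To make the phrase "number of function evaluations" concrete, the following is a minimal toy sketch of fixed-step Euler sampling for a flow ODE dx/dt = v(x, t), integrated from t = 0 to t = 1. It is not the paper's method or code: the names `euler_sample`, `toy_velocity`, and `TARGET` are hypothetical, and the hand-written velocity field stands in for a trained network. Each Euler step costs one call to the velocity function, so a one-step sampler uses a single evaluation.

```python
from typing import Callable, List, Tuple

# Hypothetical type alias: a velocity field maps (x, t) to dx/dt.
Velocity = Callable[[List[float], float], List[float]]


def euler_sample(velocity: Velocity, x0: List[float], num_steps: int) -> Tuple[List[float], int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final sample and the number of function evaluations (NFE).
    """
    x = list(x0)
    dt = 1.0 / num_steps
    nfe = 0
    for i in range(num_steps):
        t = i * dt                 # t stays strictly below 1
        v = velocity(x, t)         # one evaluation of the (here: toy) model
        nfe += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, nfe


# Assumption for illustration only: every trajectory is a straight line
# ending at a single fixed point TARGET.
TARGET = [1.0, -2.0]


def toy_velocity(x: List[float], t: float) -> List[float]:
    # Straight-line flow toward TARGET: v(x, t) = (TARGET - x) / (1 - t)
    return [(m - xi) / (1.0 - t) for m, xi in zip(TARGET, x)]


if __name__ == "__main__":
    for steps in (1, 4):
        sample, nfe = euler_sample(toy_velocity, [0.0, 0.0], steps)
        print(f"steps={steps} nfe={nfe} sample={sample}")
```

In this toy field the trajectories are perfectly straight, so one step and four steps land on the same point. A learned velocity field generally has curved trajectories, which is why results from one-step and few-step sampling can differ and why consistency across step counts is a goal in its own right.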
