Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
Dohyun Kim, Sehwan Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo
TL;DR
The paper addresses data efficiency in distilling conditional diffusion models by introducing Random Conditioning, which pairs noised inputs with randomly chosen text prompts to expand the conditioning space without generating a large image-text dataset. Combining losses on noise prediction and intermediate features, the method distills a teacher diffusion model into a smaller student in an image-free setting and lets the student generate concepts not covered by the training images. Experiments on LAION-derived and MS-COCO data show substantial gains in FID, IS, and CLIP scores over naïve baselines; with random conditioning and data augmentation, the student reaches performance comparable to the teacher and can generate unseen concepts without real images. The result is data-efficient, resource-friendly diffusion model compression that performs well with both block- and channel-based architectures and opens avenues for deploying diffusion models in data-constrained environments and across diverse modalities.
Abstract
Diffusion models generate high-quality images through progressive denoising but are computationally intensive due to large model sizes and repeated sampling. Knowledge distillation, which transfers knowledge from a complex teacher to a simpler student model, has been widely studied in recognition tasks, particularly for transferring concepts unseen during student training. However, its application to diffusion models remains underexplored, especially in enabling student models to generate concepts not covered by the training images. In this work, we propose Random Conditioning, a novel approach that pairs noised images with randomly selected text conditions to enable efficient, image-free knowledge distillation. By leveraging this technique, we show that the student can generate concepts unseen in the training images. When applied to conditional diffusion model distillation, our method allows the student to explore the condition space without generating condition-specific images, resulting in notable improvements in both generation quality and efficiency. This promotes resource-efficient deployment of generative diffusion models, broadening their accessibility for both research and real-world applications. Code, models, and datasets are available at https://dohyun-as.github.io/Random-Conditioning .
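The pairing step and the two loss terms described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names (`random_conditioning_batch`, `add_noise`, `distill_loss`, `make_toy_model`), the fixed replacement probability `p_random`, the toy noise schedule, and the list-based stand-in models are all assumptions made for illustration.

```python
import random


def random_conditioning_batch(latents, captions, prompt_pool, p_random, rng):
    """Pair each latent with its own caption or, with probability
    p_random (an assumed fixed value), a random prompt from the pool."""
    conds = []
    for caption in captions:
        if rng.random() < p_random:
            conds.append(rng.choice(prompt_pool))  # random text condition
        else:
            conds.append(caption)  # original paired caption
    return list(zip(latents, conds))


def add_noise(x, t, rng):
    """Toy forward process (assumption): mix signal and Gaussian noise, t in [0, 1]."""
    alpha = 1.0 - t
    return [(alpha ** 0.5) * xi + ((1.0 - alpha) ** 0.5) * rng.gauss(0.0, 1.0)
            for xi in x]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill_loss(teacher, student, x_t, t, cond, feat_weight=1.0):
    """Output-matching loss plus a weighted intermediate-feature loss."""
    t_out, t_feat = teacher(x_t, t, cond)
    s_out, s_feat = student(x_t, t, cond)
    return mse(s_out, t_out) + feat_weight * mse(s_feat, t_feat)


def make_toy_model(scale):
    """Hypothetical stand-in for a denoiser: returns (noise prediction, feature)."""
    def model(x_t, t, cond):
        shift = scale * (len(cond) % 7) / 7.0  # crude dependence on the text
        feat = [scale * xi + shift for xi in x_t]
        out = [f * (1.0 - t) for f in feat]
        return out, feat
    return model


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [[0.1, 0.2], [0.3, 0.4]]
    captions = ["a cat", "a dog"]
    pool = ["a zebra on a beach", "a red teapot"]
    teacher, student = make_toy_model(1.0), make_toy_model(0.5)
    for x, cond in random_conditioning_batch(latents, captions, pool, 0.5, rng):
        t = rng.random()
        print(cond, distill_loss(teacher, student, add_noise(x, t, rng), t, cond))
```

Because the teacher is queried on whatever text is paired with the noised input, the student receives a supervision signal for prompts that have no corresponding image, which is the property the abstract relies on for image-free distillation.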
