Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation

Yara Bahram; Melodie Desbos; Mohammadhadi Shateri; Eric Granger

TL;DR

Uni-DAD is a single-stage framework that jointly distills and adapts diffusion models for fast, few-shot image generation in novel domains. By coupling a dual-domain distribution-matching objective with a multi-head GAN, and optionally leveraging a target teacher, it preserves source-domain diversity while sharpening target realism. Results on few-shot image generation and subject-driven personalization show higher quality than state-of-the-art adaptation methods, and better quality and diversity than two-stage pipelines, with as few as 3 denoising steps. This offers a practical path to real-time personalized diffusion-based generation. The method is checkpoint-agnostic: it can distill adapted models or adapt distilled ones without changing the training loop.

Abstract

Diffusion models (DMs) produce high-quality images, yet their sampling remains costly when adapted to new domains. Distilled DMs are faster but typically remain confined within their teacher's domain. Fast, high-quality generation for novel domains therefore currently relies on two-stage training pipelines: Adapt-then-Distill or Distill-then-Adapt. However, both add design complexity and suffer from degraded quality or diversity. We introduce Uni-DAD, a single-stage pipeline that unifies distillation and adaptation of DMs. It couples two signals during training: (i) a dual-domain distribution-matching distillation objective that guides the student toward the distributions of the source teacher and a target teacher, and (ii) a multi-head generative adversarial network (GAN) loss that encourages target realism across multiple feature scales. The source-domain distillation preserves diverse source knowledge, while the multi-head GAN stabilizes training and reduces overfitting, especially in few-shot regimes. The inclusion of a target teacher facilitates adaptation to more structurally distant domains. We evaluate on a variety of datasets for few-shot image generation (FSIG) and subject-driven personalization (SDP). Uni-DAD delivers higher quality than state-of-the-art (SoTA) adaptation methods even with fewer than 4 sampling steps, and outperforms two-stage training pipelines in both quality and diversity.
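
The abstract does not state the training objective in closed form. As a rough reading of the two coupled signals, the student loss could be sketched as a weighted sum of a source-teacher distribution-matching term, a target-teacher distribution-matching term, and a GAN term summed over discriminator heads. All symbols below (the weights \lambda, the head count K, and the loss names) are illustrative assumptions, not notation taken from the paper:

```latex
% Illustrative sketch only: symbols and weighting are assumptions,
% not the paper's stated formulation.
\mathcal{L}_{\text{student}}
  = \lambda_{\text{src}}\,\mathcal{L}_{\text{DMD}}^{\text{src}}
  + \lambda_{\text{tgt}}\,\mathcal{L}_{\text{DMD}}^{\text{tgt}}
  + \lambda_{\text{GAN}} \sum_{k=1}^{K} \mathcal{L}_{\text{GAN}}^{(k)}
```

Under this reading, the first term would carry the diversity-preserving source knowledge, the sum over K heads would carry target realism at multiple feature scales, and setting \lambda_{\text{tgt}} = 0 would correspond to training without the optional target teacher.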
