Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model

Jincheng Zhong, Xiangcheng Zhang, Jianmin Wang, Mingsheng Long

TL;DR

Domain Guidance (DoG) is a simple transfer mechanism for pre-trained diffusion models that treats domain transfer as domain-conditioned generation. It keeps the original pre-trained model as the unconditional guide while fine-tuning a domain-specific conditional branch, so sampling can be steered toward the target domain without training a separate unconditional model for that domain. Empirical and theoretical analyses show that DoG leverages pre-trained knowledge to improve domain alignment and reduce out-of-domain sampling. It outperforms standard CFG-based fine-tuning across seven benchmarks and integrates with existing CFG-finetuned or LoRA-enhanced models. The approach improves generation quality (FID and FD$_\text{DINOv2}$) and remains computationally efficient during sampling, which makes it useful for rapid, robust domain adaptation of diffusion models.

Abstract

Recent advancements in diffusion models have revolutionized generative modeling. However, the impressive and vivid outputs they produce often come at the cost of significant model scaling and increased computational demands. Consequently, building personalized diffusion models based on off-the-shelf models has emerged as an appealing alternative. In this paper, we introduce a novel perspective on conditional generation for transferring a pre-trained model. From this viewpoint, we propose *Domain Guidance*, a straightforward transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain. Domain Guidance shares a formulation similar to advanced classifier-free guidance, facilitating better domain alignment and higher-quality generations. We provide both empirical and theoretical analyses of the mechanisms behind Domain Guidance. Our experimental results demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD$_\text{DINOv2}$ compared to standard fine-tuning. Notably, existing fine-tuned models can seamlessly integrate Domain Guidance to leverage these benefits, without additional training.
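The relationship between classifier-free guidance (CFG) and Domain Guidance described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the common noise-prediction form of guidance, `eps_guide + w * (eps_cond - eps_guide)`, where `w = 1` recovers the plain conditional prediction. The names `guided_eps`, `finetuned_model`, `pretrained_model`, `TARGET_MEAN`, and `PRETRAIN_MEAN` are hypothetical, and the two "models" are toy linear functions standing in for real networks. The only difference between the two variants is where the guiding (unconditional) prediction comes from.

```python
import numpy as np

# Hypothetical toy constants: centers of the target domain and of the
# pre-training distribution. Real models are neural networks, not these.
TARGET_MEAN = np.array([2.0, 2.0])
PRETRAIN_MEAN = np.array([0.0, 0.0])


def guided_eps(eps_cond, eps_guide, w):
    """Extrapolate from the guiding prediction toward the conditional one.

    Assumed convention: w = 1 returns eps_cond unchanged, w > 1 amplifies
    the direction (eps_cond - eps_guide).
    """
    return eps_guide + w * (eps_cond - eps_guide)


def finetuned_model(x, t, c=None):
    """Toy stand-in for a fine-tuned model; c=None means unconditional."""
    center = TARGET_MEAN if c is None else c
    return x - center


def pretrained_model(x, t):
    """Toy stand-in for the frozen pre-trained (unconditional) model."""
    return x - PRETRAIN_MEAN


x = np.array([1.0, 3.0])   # an intermediate noisy sample
t = 0.5                    # timestep (unused by the toy models)
c = np.array([2.5, 1.5])   # toy "class condition" inside the target domain
w = 2.0                    # guidance weight

eps_cond = finetuned_model(x, t, c)

# Standard CFG: the fine-tuned model supplies both predictions.
eps_cfg = guided_eps(eps_cond, finetuned_model(x, t, None), w)

# Domain Guidance (as sketched here): the pre-trained model is the guide.
eps_dog = guided_eps(eps_cond, pretrained_model(x, t), w)

print("CFG:", eps_cfg)
print("DoG:", eps_dog)
```

Because both variants share the same conditional prediction, a model already fine-tuned with CFG can, under these assumptions, switch to the second form at sampling time by swapping the guiding prediction, which is consistent with the abstract's claim that no additional training is needed.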

Paper Structure

This paper contains 37 sections, 3 theorems, 19 equations, 9 figures, 11 tables.

Key Result

Proposition 1

Figures (9)

  • Figure 1: Conceptual comparisons between Domain Guidance and standard classifier-free guidance. (a) shows standard CFG modeling both conditional density and unconditional guiding signals for the target domain simultaneously. (b) illustrates the proposed Domain Guidance, which focuses on building conditional density and guides the sampling process from the pre-trained model to the target domain. (c) to (e) depict conceptual examples of the mechanism differences between CFG and DoG, highlighting how DoG leverages pre-trained knowledge to enhance generation for the target domain.
  • Figure 2: A mixture-of-Gaussians synthetic dataset in which differently colored dots represent the modes of different classes. In (a), the target domain is defined by the orange area, while the pre-training distribution forms the blue background. Green and red dots represent two classes, with filled dots indicating in-domain real data. Sampling results from these classes after model fine-tuning are denoted by circles of the corresponding color. (b) illustrates how CFG leads to out-of-domain samples by disregarding pre-trained knowledge, while (c) demonstrates how DoG maintains domain consistency by effectively utilizing pre-trained data. (d) contrasts the directional guidance provided by DoG (red arrows) with that of CFG (blue arrows) for intermediate samples $\mathbf{x}_{\text{mid}}$, showing how DoG steers samples towards the domain-specific regions, whereas CFG may lead samples towards outliers.
  • Figure 3: Results of CFG and DoG with varying numbers of sampling steps, measured by FID ($\downarrow$).
  • Figure 3: Component analysis of DoG. (a) illustrates that a separately fine-tuned unconditional guiding model degrades generation performance as training steps increase. (b) shows the sensitivity of FID to guidance parameters in DoG.
  • Figure 4: Qualitative showcases for DoG across downstream tasks. Best viewed zoomed in. Each nine-grid case compares CFG (left column) and DoG (right column), with the middle column blending the two. Rows use increasing guidance weights from $\{2, 3, 4\}$.
  • ...and 4 more figures

Theorems & Definitions (5)

  • Proposition 1
  • Theorem 1
  • Theorem 2: Full version of Proposition 1
  • proof
  • proof: Proof of Theorem \ref{thm:convergence}