Coffee: Controllable Diffusion Fine-tuning

Ziyao Zeng, Jingcheng Ni, Ruyi Liu, Alex Wong

TL;DR

The paper tackles controllable fine-tuning of text-to-image diffusion models, that is, preventing the model from learning undesired concepts during adaptation. It introduces Coffee, a regularization that requires no additional training and uses natural language to specify undesired concepts. Coffee constrains the cosine similarity between the user-prompt embedding and the undesired-concept embedding so that it stays stable during fine-tuning. The objective is $L = L_{\text{diffusion}} + \lambda L_{\text{reg}}$, with $L_{\text{reg}} = \left| \frac{v_i \cdot v_m}{\|v_i\| \, \|v_m\|} - \frac{v' \cdot v_m}{\|v'\| \, \|v_m\|} \right|$. Coffee requires no architectural changes and generalizes across diffusion backbones; undesired concepts can be updated dynamically by re-encoding text, without retraining. Empirical results on eight concept pairs show significant reductions in undesired concept leakage (MCS) while maintaining image quality (IS), outperforming direct fine-tuning and IMMA. The approach has practical implications for bias mitigation, privacy, and robust customization of diffusion models in real-world deployments.
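
The combined objective above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes that $v_i$ is the user-prompt embedding under the model being fine-tuned, $v'$ is the embedding of the same prompt before fine-tuning, and $v_m$ is the embedding of the undesired concept. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def coffee_reg(v_i, v_ref, v_m):
    """L_reg = |cos(v_i, v_m) - cos(v', v_m)|.

    v_ref plays the role of v' (assumed: the prompt embedding
    before fine-tuning); v_m is the undesired-concept embedding.
    """
    return abs(cosine_similarity(v_i, v_m) - cosine_similarity(v_ref, v_m))


def total_loss(l_diffusion, v_i, v_ref, v_m, lam):
    """L = L_diffusion + lambda * L_reg."""
    return l_diffusion + lam * coffee_reg(v_i, v_ref, v_m)


# Toy 2-D example (hypothetical values, for illustration only).
v_m = [1.0, 0.0]    # undesired concept
v_ref = [0.0, 1.0]  # prompt embedding before fine-tuning: orthogonal to v_m
v_i = [1.0, 1.0]    # prompt embedding after drifting toward v_m

reg = coffee_reg(v_i, v_ref, v_m)               # about 0.707
loss = total_loss(0.5, v_i, v_ref, v_m, lam=0.1)  # about 0.571
```

Because the undesired concept enters only through `v_m`, changing it amounts to encoding a different text description and passing the new vector; nothing else in the objective needs to change.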

Abstract

Text-to-image diffusion models can generate diverse content with flexible prompts, which makes them well-suited for customization through fine-tuning with a small amount of user-provided data. However, controllable fine-tuning that prevents models from learning undesired concepts present in the fine-tuning data, and from entangling those concepts with user prompts, remains an open challenge. Such control is crucial for downstream tasks like bias mitigation, preventing malicious adaptation, attribute disentanglement, and generalizable fine-tuning of diffusion policy. We propose Coffee, which uses language to specify undesired concepts and regularize the adaptation process. The crux of our method lies in keeping the embeddings of the user prompt from aligning with undesired concepts. Crucially, Coffee requires no additional training and enables flexible modification of undesired concepts by modifying textual descriptions. We evaluate Coffee by fine-tuning on images associated with user prompts paired with undesired concepts. Experimental results demonstrate that Coffee can prevent text-to-image models from learning specified undesired concepts during fine-tuning and outperforms existing methods. Code will be released upon acceptance.

Paper Structure

This paper contains 9 sections, 6 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Preventing learning undesired concepts during adaptation. By simply specifying undesired concepts in natural language, Coffee controls the fine-tuning of the text-to-image diffusion models to prevent the model from learning undesired concepts, and from entangling those concepts with user prompts, when models are fine-tuned with images containing such concepts.
  • Figure 2: Prompt steering alone cannot prevent the model from learning undesired concepts ("Sunglasses"). Negative Prompting is implemented through classifier-free guidance (Ho & Salimans, 2022). Undesired Concept Removal removes undesired concepts from the input prompt at inference using the format "<user prompt> without <undesired concept>". Details can be found in Table \ref{tab:prompt_steering}.
  • Figure 3: Pipeline of Coffee. An undesired concept, $\textbf{c}_m$, is pre-specified in natural language. During the fine-tuning process of the text-to-image diffusion model, for the input user prompt, $\textbf{c}_i$, besides the standard diffusion training objective, Coffee regularizes the cosine similarity between $\textbf{c}_i$ and $\textbf{c}_m$ to prevent substantial changes during fine-tuning. It ensures the distance between $\textbf{c}_i$ and $\textbf{c}_m$ in the latent space remains relatively stable, preventing learning undesired concepts from training images. Additionally, $\textbf{c}_m$ can be re-specified at any time to change undesired concepts without retraining. In inference, with $\textbf{c}_i$ as input prompt, the model can generate images that align with the distribution of fine-tuning images while not containing $\textbf{c}_m$.
  • Figure 4: Visualization. Compared with direct inference, after direct fine-tuning or IMMA, undesired concepts (like "rifle", "sunglasses", "smoking", etc.) appear in generated images, since the text-to-image model learned undesired concepts while modeling the distribution of fine-tuning images. On the other hand, Coffee captures the style of fine-tuning images and generates visually similar outputs, while effectively preventing the learning of undesired concepts pre-specified in text.
  • Figure 5: Sensitivity to Regularization Magnitude. A high $\lambda$ hinders the model's ability to learn the distribution of fine-tuning images, while a low $\lambda$ may provide insufficient regularization.
  • ...and 2 more figures
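
For contrast with the fine-tuning regularizer, the Negative Prompting baseline in Figure 2 acts only at inference. The sketch below shows the commonly used formulation, in which the negative-prompt noise prediction takes the place of the unconditional branch in classifier-free guidance. Whether the paper's baseline uses exactly this form is an assumption, and the function name and toy values are hypothetical.

```python
def guided_noise(eps_cond, eps_neg, guidance_scale):
    """Classifier-free guidance with a negative prompt.

    eps_cond: noise predicted for the user prompt.
    eps_neg:  noise predicted for the negative prompt (used in
              place of the unconditional prediction).
    Returns eps_neg + s * (eps_cond - eps_neg), element-wise.
    """
    return [n + guidance_scale * (c - n) for c, n in zip(eps_cond, eps_neg)]


# Toy values (illustration only): a scale above 1 pushes the
# prediction away from the negative prompt's direction.
eps = guided_noise([1.0, 2.0], [0.0, 1.0], guidance_scale=7.5)
```

This steering leaves the model weights untouched, which is consistent with the observation in Figure 2 that it cannot undo a concept the model has already absorbed during fine-tuning.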