Coffee: Controllable Diffusion Fine-tuning
Ziyao Zeng, Jingcheng Ni, Ruyi Liu, Alex Wong
TL;DR
The paper tackles controllable fine-tuning of text-to-image diffusion models, that is, preventing the model from learning undesired concepts during adaptation. It introduces Coffee, a regularizer that requires no additional training and uses natural language to specify undesired concepts. Coffee preserves alignment between user prompts and target concepts via a cosine-similarity constraint, added to the objective as $L = L_{\text{diffusion}} + \lambda L_{\text{reg}}$ with $L_{\text{reg}} = \left| \frac{v_i \cdot v_m}{\|v_i\| \, \|v_m\|} - \frac{v' \cdot v_m}{\|v'\| \, \|v_m\|} \right|$. Coffee requires no architectural changes and generalizes across diffusion backbones, and the set of undesired concepts can be updated dynamically by re-encoding the text, without retraining. Empirical results on eight concept pairs show significant reductions in undesired concept leakage (MCS) while maintaining image quality (IS), outperforming both direct fine-tuning and IMMA. The approach has practical implications for bias mitigation, privacy, and robust customization of diffusion models in real-world deployments.
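To make the regularized objective in the TL;DR concrete, here is a minimal sketch in plain Python. The summary does not define the symbols, so the reading used here is an assumption: `v_i` is taken to be the user-prompt embedding under the model being fine-tuned, `v_ref` stands in for v' and is taken to be the same prompt's embedding before fine-tuning, and `v_m` is taken to be the embedding of the undesired-concept text. The function names and the example value of `lam` are hypothetical and are not from the authors' code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (a . b) / (||a|| ||b||) of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def coffee_regularizer(
    v_i: Sequence[float], v_ref: Sequence[float], v_m: Sequence[float]
) -> float:
    """L_reg = | cos(v_i, v_m) - cos(v_ref, v_m) |.

    Assumed roles (not defined in the summary): v_i is the current prompt
    embedding, v_ref is the reference embedding v', v_m is the embedding of
    the undesired concept.
    """
    return abs(cosine_similarity(v_i, v_m) - cosine_similarity(v_ref, v_m))


def total_loss(l_diffusion: float, l_reg: float, lam: float) -> float:
    """L = L_diffusion + lambda * L_reg."""
    return l_diffusion + lam * l_reg


# Toy 2-D example: the prompt embedding has drifted fully onto the
# undesired-concept direction, so the penalty is at its largest here.
v_ref = [1.0, 0.0]   # hypothetical reference prompt embedding (v')
v_m = [0.0, 1.0]     # hypothetical undesired-concept embedding
drifted = [0.0, 1.0]
penalty = coffee_regularizer(drifted, v_ref, v_m)
loss = total_loss(l_diffusion=0.5, l_reg=penalty, lam=0.1)  # lam is illustrative
```

Under this reading, the penalty is zero when the prompt embedding keeps its original cosine similarity to the undesired concept, and it grows as that similarity moves away from the reference value. Swapping in a different undesired concept only requires re-encoding new text to obtain a new `v_m`.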
Abstract
Text-to-image diffusion models can generate diverse content from flexible prompts, which makes them well suited to customization through fine-tuning on a small amount of user-provided data. However, controllable fine-tuning that prevents models from learning undesired concepts present in the fine-tuning data, and from entangling those concepts with user prompts, remains an open challenge. Solving it is crucial for downstream tasks such as bias mitigation, preventing malicious adaptation, attribute disentanglement, and generalizable fine-tuning of diffusion policy. We propose Coffee, which lets users specify undesired concepts in natural language to regularize the adaptation process. The crux of our method lies in keeping the embeddings of the user prompt from aligning with undesired concepts. Crucially, Coffee requires no additional training, and the undesired concepts can be changed simply by editing their textual descriptions. We evaluate Coffee by fine-tuning on images associated with user prompts paired with undesired concepts. Experimental results demonstrate that Coffee prevents text-to-image models from learning the specified undesired concepts during fine-tuning and outperforms existing methods. Code will be released upon acceptance.
