
Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models

Seoyun Yang, Gihoon Kim, Taesup Kim

TL;DR

Personalization in text-to-image diffusion is hindered by semantic drift when a rare subject must be learned from only a few reference images. The authors introduce Semantic Anchoring Personalization, a training-time objective that anchors the learning of a rare subject to the pretrained semantics of its frequent counterpart, blending guidance from both concepts. The method yields consistent improvements in subject fidelity and text-image alignment across multiple backbones (SD1.5, SDXL, SD3), and comprehensive ablations support these gains. By expanding the pretrained distribution toward personalized regions while preserving its semantic structure, the anchoring strategy offers robust, generalizable personalization for diffusion models.

Abstract

Text-to-image diffusion models have achieved remarkable progress in generating diverse and realistic images from textual descriptions. However, they still struggle with personalization, which requires adapting a pretrained model to depict user-specific subjects from only a few reference images. The key challenge lies in learning a new visual concept from a limited number of reference images while preserving the pretrained semantic prior that maintains text-image alignment. When the model focuses on subject fidelity, it tends to overfit the limited reference images and fails to leverage the pretrained distribution. Conversely, emphasizing prior preservation maintains semantic consistency but prevents the model from learning new personalized attributes. Building on these observations, we propose a semantic anchoring approach that guides adaptation by grounding each new concept in the pretrained distribution of its corresponding class. Concretely, we reformulate personalization as learning a rare concept guided by its frequent counterpart. This anchoring lets the model learn new concepts in a stable and controlled manner, expanding the pretrained distribution toward personalized regions while preserving its semantic structure. As a result, the proposed method adapts stably and consistently improves both subject fidelity and text-image alignment over baseline methods. Extensive experiments and ablation studies further demonstrate the robustness and effectiveness of the proposed anchoring strategy.
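The abstract does not spell out the training objective, and the paper's equations are not reproduced on this page. As a purely illustrative sketch of what "learning a rare concept guided by its frequent counterpart" could look like, the snippet below blends the usual noise target with a frozen anchor prediction. Everything here is an assumption rather than the authors' formulation: the functions `anchored_target` and `anchored_loss`, the linear blend, and the weight `lam` are hypothetical. NumPy arrays stand in for U-Net outputs, and `anchor_pred` is assumed to come from frozen pretrained weights prompted with the class (frequent) concept.

```python
import numpy as np


def anchored_target(noise, anchor_pred, lam):
    """Blend the standard epsilon target with a frozen anchor prediction (hypothetical).

    noise:       Gaussian noise epsilon used to build the noisy latent z_t
    anchor_pred: frozen pretrained prediction for the class prompt c^cls,
                 treated as a constant (no gradient would flow through it)
    lam:         assumed blending weight in [0, 1]
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * noise + lam * anchor_pred


def anchored_loss(subject_pred, noise, anchor_pred, lam):
    """Mean squared error between the subject prediction and the blended target."""
    target = anchored_target(noise, anchor_pred, lam)
    return float(np.mean((subject_pred - target) ** 2))


# Toy example: constant arrays stand in for latent-shaped model outputs.
subject_pred = np.ones((2, 2))
noise = np.zeros((2, 2))
anchor_pred = 2.0 * np.ones((2, 2))
for lam in (0.0, 0.5, 1.0):
    print(lam, anchored_loss(subject_pred, noise, anchor_pred, lam))
# 0.0 1.0
# 0.5 0.0
# 1.0 1.0
```

With `lam = 0` the loss reduces to the standard epsilon-prediction MSE used by fine-tuning baselines such as DreamBooth (without its prior-preservation term). Larger `lam` pulls the subject prediction toward the pretrained class semantics.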

Paper Structure

This paper contains 22 sections, 12 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Conceptual illustration of the proposed method. Pretrained semantics provide stable guidance, whereas guidance in newly introduced regions remains unstable. Our approach anchors personalization to the pretrained semantics, enabling stable guidance as the model expands toward novel concepts.
  • Figure 2: $L_2$ distance between the subject prediction $\epsilon_{\theta}(z_t, c^{\text{sbj}}, t)$ and the anchor prediction $\epsilon_{\theta'}(z^{\text{cls}}_{t}, c^{\text{cls}}, t)$, measured over adaptation steps.
  • Figure 3: Visual comparison of few-shot personalization and encoder-based methods on the SD1.5 backbone.
  • Figure 4: Qualitative comparison on SDXL and SD3 backbones among DreamBooth, Beyond-Finetuning, and our method.
  • Figure 6: Comparison of semantic alignment (CLIP-T) using two different anchoring strategies. Pretrained anchoring attains higher CLIP-T scores than finetuned anchoring.
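Figure 2 tracks how far the subject prediction $\epsilon_{\theta}(z_t, c^{\text{sbj}}, t)$ drifts from the anchor prediction $\epsilon_{\theta'}(z^{\text{cls}}_{t}, c^{\text{cls}}, t)$ over adaptation steps. A minimal NumPy sketch of such a measurement follows. It assumes both predictions are available as arrays of shape (batch, C, H, W). The helpers `prediction_l2` and `track_drift` are hypothetical, and so are the per-sample norm and the batch averaging, so the paper may normalize or aggregate differently. In practice each pair would come from running the adapted and anchor models at matched timesteps for a given checkpoint.

```python
import numpy as np


def prediction_l2(subject_pred, anchor_pred):
    """L2 distance between subject and anchor noise predictions.

    Both arrays have shape (batch, C, H, W). We take the Euclidean norm over
    (C, H, W) for each sample and average over the batch (an assumption).
    """
    batch = subject_pred.shape[0]
    diff = (subject_pred - anchor_pred).reshape(batch, -1)
    return float(np.linalg.norm(diff, axis=1).mean())


def track_drift(prediction_pairs):
    """Distance per adaptation step, given one (subject, anchor) pair per step."""
    return [prediction_l2(s, a) for s, a in prediction_pairs]


# Toy example: the subject prediction moves away from a fixed anchor.
anchor = np.zeros((1, 1, 2, 2))
pairs = [(step * np.ones((1, 1, 2, 2)), anchor) for step in range(3)]
print(track_drift(pairs))  # [0.0, 2.0, 4.0]
```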