Foundation Cures Personalization: Improving Personalized Models' Prompt Consistency via Hidden Foundation Knowledge

Yiyang Cai, Zhengkai Jiang, Yulong Liu, Chunyang Jiang, Wei Xue, Yike Guo, Wenhan Luo

TL;DR

This work investigates the conflict between identity fidelity and prompt consistency in facial personalization and finds that identity embeddings can degrade other prompt tokens via cross-attention. It introduces FreeCure, a training-free framework that leverages latent foundation knowledge through a dual-inference setup, a foundation-aware self-attention module (FASA), and inversion-based asymmetric prompt guidance (APG) to improve attribute-level control while preserving identity. Empirically, FreeCure yields significant gains in prompt consistency across multiple personalization baselines and foundation models (Stable Diffusion and FLUX), with only modest losses in identity fidelity, and demonstrates robustness to noise and mask inaccuracies. The approach offers a practical, modular solution to enhance controllability in personalized generative faces, and it integrates seamlessly with existing diffusion-model pipelines and segmentation tools like BiSeNet and Segment-Anything.

Abstract

Facial personalization struggles to maintain identity fidelity without disrupting the foundation model's prompt consistency. Mainstream personalization models employ identity embeddings to integrate identity information within the attention mechanisms. However, our preliminary findings reveal that identity embeddings compromise the effectiveness of other tokens in the prompt, thereby limiting prompt consistency and attribute-level controllability. Moreover, when the identity embedding is deactivated, personalization models still demonstrate the underlying foundation model's ability to control facial attributes precisely. This suggests that the foundation model's knowledge can be leveraged to cure the ill-aligned prompt consistency of personalization models. Building upon these insights, we propose FreeCure, a framework that improves the prompt consistency of personalization models using their latent foundation-model knowledge. First, by setting up a dual inference paradigm with and without the identity embedding, we identify attributes (e.g., hair, accessories) for enhancement. Second, we introduce a novel foundation-aware self-attention module, coupled with an inversion-based process, to bring well-aligned attribute information into the personalization process. Our approach is training-free, effectively enhances a wide array of facial attributes, and can be seamlessly integrated into existing popular personalization models based on both Stable Diffusion and FLUX. FreeCure consistently shows significant improvements in prompt consistency across these facial personalization models while maintaining their original identity fidelity.

Paper Structure

This paper contains 36 sections, 5 equations, 24 figures, 12 tables.

Figures (24)

  • Figure 1: Personalization models (a) demonstrate strong capability in preserving identity fidelity, albeit at the cost of reduced prompt consistency. A prevalent feature in most personalization models is that when their identity embedding inputs are deactivated, they regain the ability to exhibit highly accurate prompt consistency with respect to facial attributes (b), a characteristic closely aligned to their foundation models. Our proposed FreeCure effectively leverages the latent foundational knowledge inherent in personalized models, enhancing prompt consistency in scenarios involving complex facial attribute control while preserving the identity fidelity (c).
  • Figure 2: Analysis of cross-attention maps of facial personalization models. Left: token-wise attention map visualization. Right: interpolation experiment on PD and FD's cross-attention maps.
  • Figure 3: Overview of FreeCure. For a personalization model $\epsilon_{\theta}$, we first introduce (a) a dual inference paradigm that generates faces with and without the identity embedding ($I_p$ and $I_f$, respectively), where $I_f$ shows better prompt consistency. Next, we use a segmentation model $\Psi(\cdot)$ to derive masks of target attributes that have clear spatial extent (hair, sunglasses, etc.) and merge them into a single mask $\mathcal{M}$. In (b), we replace the original self-attention modules with our proposed FASA (c), which concatenates the key and value matrices of the FD and PD processes and applies a scaling mask to achieve attribute injection. Finally, we use a simple yet effective strategy, (d) asymmetric prompt guidance (APG), to restore abstract attributes (e.g., expressions).
  • Figure 4: Fine-grained attribute enhancement via masks. Extracting masks from the FD results makes the FASA module only focus on enhancement for target attributes, minimizing its negative effect on identity fidelity.
  • Figure 5: Qualitative comparison with facial personalization baselines (including baselines built upon SDv1.5 and SDXL). Different attributes in prompts are highlighted in various colors. Comparison of corresponding FD outputs is provided in the Appendix.
  • ...and 19 more figures
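
The Figure 3 caption describes FASA as concatenating the key and value matrices of the FD and PD processes and applying a scaling mask built from merged attribute masks. The sketch below illustrates that idea in plain Python on tiny token lists. It is not the authors' implementation. The function names `merge_masks` and `fasa`, the `scale` parameter, and the specific choice to multiply the unnormalized attention weight of each FD token by `scale * mask` before normalizing are all assumptions made for illustration, since this page does not give the exact formula.

```python
import math


def merge_masks(masks):
    """Union of flat binary attribute masks (e.g. hair, sunglasses).

    Hypothetical helper: each mask is a list of 0/1 flags, one per token.
    """
    return [max(flags) for flags in zip(*masks)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fasa(q_pd, k_pd, v_pd, k_fd, v_fd, fd_mask, scale=1.0):
    """Illustrative foundation-aware self-attention.

    q_pd, k_pd, v_pd: query/key/value rows from the pass with identity (PD).
    k_fd, v_fd: key/value rows from the pass without identity (FD).
    fd_mask: one flag per FD token; 1 means the token lies inside the
             merged attribute mask.
    scale: assumed strength of the injection for masked FD tokens.
    """
    dim = len(q_pd[0])
    # Concatenate PD and FD keys/values along the token axis.
    keys = k_pd + k_fd
    values = v_pd + v_fd
    # PD tokens keep weight 1; FD tokens are scaled inside the mask
    # and dropped (factor 0) outside it.
    factors = [1.0] * len(k_pd) + [scale * m for m in fd_mask]

    outputs = []
    for q in q_pd:
        logits = [_dot(q, k) / math.sqrt(dim) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [f * math.exp(l - peak) for f, l in zip(factors, logits)]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    mask = merge_masks([[1, 0], [0, 0]])  # only the first FD token is selected
    q = [[1.0, 0.0]]
    out = fasa(q, [[1.0, 0.0]], [[1.0, 1.0]],
               [[1.0, 0.0], [1.0, 0.0]], [[3.0, 5.0], [9.0, 9.0]],
               mask, scale=1.0)
    print(out)
```

With an all-zero mask the function reduces to ordinary self-attention over the PD tokens, which matches the stated intent in the Figure 4 caption of restricting enhancement to the target attributes so that identity fidelity is affected as little as possible.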