Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization
Lianyu Pang, Ji Zhou, Qiping Wang, Baoquan Zhao, Zhenguo Yang, Qing Li, Xudong Mao
TL;DR
UniID tackles the trade-off between identity fidelity and text controllability in tuning-free face personalization by unifying text-embedding and adapter-based approaches. It introduces a dual-branch architecture in which each branch learns identity-relevant features through identity-focused training, and it applies a layer-wise normalized rescaling at inference to preserve the diffusion model's controllability. Compared with six baselines on both synthetic and real portraits, the method shows superior identity preservation and prompt alignment, supported by qualitative, quantitative, and user-study evidence. This work offers a practical, principled path toward high-fidelity, controllable, tuning-free personalization in diffusion-based generation systems.
Abstract
Tuning-free face personalization methods have developed along two distinct paradigms: text embedding approaches that map facial features into the text embedding space, and adapter-based methods that inject features through auxiliary cross-attention layers. While both paradigms have shown promise, existing methods struggle to achieve high identity fidelity and flexible text controllability at the same time. We introduce UniID, a unified tuning-free framework that synergistically integrates both paradigms. Our key insight is that, when these approaches are merged, they should mutually reinforce only identity-relevant information while preserving the original diffusion prior for non-identity attributes. We realize this through a principled training-inference strategy: during training, we employ an identity-focused learning scheme that guides both branches to capture identity features exclusively; at inference, we introduce a normalized rescaling mechanism that recovers the text controllability of the base diffusion model while enabling complementary identity signals to enhance each other. This design enables UniID to achieve high-fidelity face personalization with flexible text controllability. Extensive experiments against six state-of-the-art methods demonstrate that UniID achieves superior performance in both identity preservation and text controllability. Code will be available at https://github.com/lyuPang/UniID
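The abstract does not specify how the two branches are merged or how the rescaling is computed. Purely as an illustration, assume a single cross-attention layer in which (i) identity tokens are appended to the prompt tokens (the text-embedding paradigm) and (ii) a separate cross-attention over image identity features adds its output (the adapter paradigm, in the style of decoupled cross-attention). Under those assumptions, the hypothetical `unified_cross_attention` function below rescales the merged output so that each query's output norm matches the norm of the base model's prompt-only attention. The function names, the `adapter_weight` parameter, and this particular per-query normalization are assumptions made for the sketch. They are not UniID's published formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def unified_cross_attention(q, text_kv, id_kv, adapter_kv, adapter_weight=1.0, eps=1e-8):
    """HYPOTHETICAL merge of the two paradigms in one cross-attention layer.

    text_kv:    (K, V) projections of the ordinary prompt tokens.
    id_kv:      (K, V) projections of identity tokens placed in the text
                embedding space (text-embedding paradigm).
    adapter_kv: (K, V) projections of image identity features read by an
                auxiliary cross-attention (adapter paradigm).
    """
    k_text, v_text = text_kv
    k_id, v_id = id_kv
    k_ad, v_ad = adapter_kv

    # Base model behaviour: attend to the prompt tokens only.
    base = attention(q, k_text, v_text)
    # Text-embedding branch: identity tokens share the main cross-attention.
    text_branch = attention(q, np.concatenate([k_text, k_id]), np.concatenate([v_text, v_id]))
    # Adapter branch: separate cross-attention over image identity features.
    adapter_branch = attention(q, k_ad, v_ad)
    combined = text_branch + adapter_weight * adapter_branch

    # ASSUMED normalized rescaling: give each query's output the same norm as
    # the base (prompt-only) output, keeping the magnitude of the base prior.
    base_norm = np.linalg.norm(base, axis=-1, keepdims=True)
    comb_norm = np.linalg.norm(combined, axis=-1, keepdims=True)
    return combined * (base_norm / (comb_norm + eps))


# Usage with random projections (4 queries, 5 prompt tokens, 2 identity
# tokens, 3 adapter features, width 8).
rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(4, d))
text_kv = (rng.normal(size=(5, d)), rng.normal(size=(5, d)))
id_kv = (rng.normal(size=(2, d)), rng.normal(size=(2, d)))
adapter_kv = (rng.normal(size=(3, d)), rng.normal(size=(3, d)))
out = unified_cross_attention(q, text_kv, id_kv, adapter_kv)
print(out.shape)  # (4, 8)
```

The norm-matching step is one simple way to keep added identity signals from overwhelming the prompt-driven features. The direction of each output still carries identity information from both branches, while its magnitude follows the base model.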
