FlashFace: Human Image Personalization with High-fidelity Identity Preservation
Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo
TL;DR
FlashFace tackles zero-shot human image personalization with high-fidelity identity preservation and accurate language following. It does so by encoding reference faces as spatial feature maps with a dedicated Face ReferenceNet and by injecting the reference and text conditions through disentangled attention within a diffusion-based framework. A large multi-identity dataset and a novel training pipeline support robust identity guidance, while flexible inference controls let users balance the influence of the prompt against that of the references. Experimental results show superior fidelity to the target face and plausible prompt-driven variations, with applications ranging from age and gender editing to artwork-to-real transformations and face inpainting. The work advances practical subject-driven synthesis while also discussing potential misuse and societal impacts.
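The TL;DR does not say how the prompt/reference balance is exposed at inference. One common way to weigh two conditions in a diffusion sampler is classifier-free guidance with a separate scale per condition. The sketch below assumes that scheme purely for illustration; it is not FlashFace's published procedure, and `guided_noise`, `text_scale`, and `ref_scale` are hypothetical names. Noise predictions are plain lists of floats so the example stays self-contained.

```python
from typing import List

Vector = List[float]


def guided_noise(
    eps_uncond: Vector,  # prediction with neither text nor reference faces
    eps_text: Vector,    # prediction conditioned on the text prompt only
    eps_full: Vector,    # prediction conditioned on text + reference faces
    text_scale: float,   # hypothetical weight on the text direction
    ref_scale: float,    # hypothetical weight on the reference direction
) -> Vector:
    """Compose two guidance directions (assumed scheme, for illustration).

    The text direction is (eps_text - eps_uncond); the reference direction is
    the extra change from adding references on top of text, (eps_full - eps_text).
    """
    return [
        u + text_scale * (t - u) + ref_scale * (f - t)
        for u, t, f in zip(eps_uncond, eps_text, eps_full)
    ]


if __name__ == "__main__":
    # Toy 2-dimensional "noise predictions".
    print(guided_noise([0.0, 0.0], [1.0, 1.0], [2.0, 3.0], text_scale=2.0, ref_scale=1.0))
```

With `ref_scale=0` this reduces to ordinary text-only classifier-free guidance; raising it pulls the prediction further toward the reference-conditioned output, which is one way to trade identity fidelity against prompt edits such as turning an adult into a "child".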
Abstract
This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt. Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following, benefiting from two subtle designs. First, we encode the face identity into a series of feature maps instead of one image token as in prior art, allowing the model to retain more details of the reference faces (e.g., scars, tattoos, and face shape). Second, we introduce a disentangled integration strategy to balance the text and image guidance during the text-to-image generation process, alleviating the conflict between the reference faces and the text prompts (e.g., personalizing an adult into a "child" or an "elder"). Extensive experimental results demonstrate the effectiveness of our method on various applications, including human image personalization, face swapping under language prompts, turning virtual characters into real people, etc. Project Page: https://jshilong.github.io/flashface-page.
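To make the two designs concrete, here is a minimal sketch under simplifying assumptions: no learned projections, keys equal values, and the text and reference paths each get their own attention call whose outputs are summed with a hypothetical `ref_weight`. It illustrates (a) a reference face kept as an H×W feature map, flattened into many tokens rather than pooled into one, and (b) keeping text and reference attention separate so their contributions can be weighed independently. It is not the authors' architecture; `flatten_feature_map`, `disentangled_block`, and `ref_weight` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention on plain Python lists."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def flatten_feature_map(fmap: List[List[Vector]]) -> List[Vector]:
    """H x W x C feature map -> (H*W) tokens of size C, keeping spatial detail."""
    return [pixel for row in fmap for pixel in row]


def disentangled_block(
    x: List[Vector],
    text_tokens: List[Vector],
    ref_tokens: List[Vector],
    ref_weight: float,
) -> List[Vector]:
    """Separate text and reference attention paths, combined by a residual sum."""
    text_out = attention(x, text_tokens, text_tokens)
    ref_out = attention(x, ref_tokens, ref_tokens)
    return [
        [xi + ti + ref_weight * ri for xi, ti, ri in zip(xr, tr, rr)]
        for xr, tr, rr in zip(x, text_out, ref_out)
    ]


if __name__ == "__main__":
    # A toy 2x2 reference "face" map with 2 channels becomes 4 reference tokens.
    ref_map = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
    ref_tokens = flatten_feature_map(ref_map)
    x = [[0.2, -0.1]]  # one image token from the denoising network
    print(disentangled_block(x, [[1.0, 0.0]], ref_tokens, ref_weight=0.5))
```

A single pooled reference token would give every query the same reference vector, whereas the flattened map lets each image token attend to different facial regions; keeping the two attention calls separate is what allows `ref_weight` to be tuned without disturbing the text path.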
