FaceMe: Robust Blind Face Restoration with Personal Identification
Siyu Liu, Zheng-Peng Duan, Jia OuYang, Jiayi Fu, Hyunhee Park, Zikun Liu, Chun-Le Guo, Chongyi Li
TL;DR
FaceMe tackles identity-consistent blind face restoration with a diffusion model guided by an identity encoder that fuses CLIP and ArcFace features. A two-stage training scheme, a synthetic reference pool covering varied poses and expressions, and a simple identity-driven prompt replacement together enable personalized restoration without fine-tuning for new identities. FaceMe delivers high-fidelity, identity-preserving restorations on both synthetic and real-world data, outperforming state-of-the-art methods in quality and robustness. Because it accepts any number of reference images and needs no per-identity fine-tuning, the approach makes large-scale, identity-consistent face restoration practical to deploy.
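The TL;DR names two mechanisms: an encoder that fuses CLIP and ArcFace features, and an identity-driven prompt replacement. This summary does not specify the fusion operator or the prompt layout, so the following is only a minimal sketch under stated assumptions. It L2-normalizes and concatenates the two features, maps them to the text-embedding width with a projection matrix (random here; a real model would presumably train it), and writes the result over the embedding of a placeholder token in the prompt. The dimensions (768 for CLIP, 512 for ArcFace, 1024 for the text encoder), the token ids, and the names `fuse_identity`, `replace_placeholder`, and `PLACEHOLDER_ID` are hypothetical, not FaceMe's actual design.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the paper).
CLIP_DIM, ARC_DIM, TEXT_DIM = 768, 512, 1024
PLACEHOLDER_ID = 9999  # hypothetical token id reserved for the identity slot


def fuse_identity(clip_feat, arcface_feat, proj):
    """Normalize each feature, concatenate them, and project to TEXT_DIM."""
    clip_n = clip_feat / np.linalg.norm(clip_feat)
    arc_n = arcface_feat / np.linalg.norm(arcface_feat)
    return np.concatenate([clip_n, arc_n]) @ proj  # shape (TEXT_DIM,)


def replace_placeholder(prompt_emb, token_ids, placeholder_id, id_emb):
    """Copy prompt_emb and overwrite every row whose token id is placeholder_id."""
    out = prompt_emb.copy()  # leave the caller's prompt embedding untouched
    mask = np.asarray(token_ids) == placeholder_id
    out[mask] = id_emb  # id_emb broadcasts across the selected rows
    return out


rng = np.random.default_rng(0)
clip_feat = rng.normal(size=CLIP_DIM)    # stand-in for a CLIP image embedding
arcface_feat = rng.normal(size=ARC_DIM)  # stand-in for an ArcFace embedding
proj = rng.normal(size=(CLIP_DIM + ARC_DIM, TEXT_DIM)) / np.sqrt(CLIP_DIM + ARC_DIM)

id_emb = fuse_identity(clip_feat, arcface_feat, proj)
token_ids = [0, 1, PLACEHOLDER_ID, 2, 3]  # toy prompt; position 2 is the identity slot
prompt_emb = rng.normal(size=(len(token_ids), TEXT_DIM))
new_emb = replace_placeholder(prompt_emb, token_ids, PLACEHOLDER_ID, id_emb)
```

Only the placeholder row changes, so the rest of the prompt stays fixed. Under this reading, switching to a new identity would only swap the substituted vector, which is consistent with the claim that new identities need no fine-tuning.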
Abstract
Blind face restoration is highly ill-posed because the degraded input lacks the necessary context. Although existing methods produce high-quality outputs, they often fail to faithfully preserve the individual's identity. In this paper, we propose FaceMe, a personalized face restoration method based on a diffusion model. Given one or a few reference images, we use an identity encoder to extract identity-related features, which serve as prompts that guide the diffusion model to restore high-quality, identity-consistent facial images. By simply combining identity-related features, we minimize the impact of identity-irrelevant features during training and support any number of reference images at inference. Additionally, thanks to the robustness of the identity encoder, synthesized images can serve as reference images during training, and changing the identity at inference time does not require fine-tuning the model. We also propose a pipeline for constructing a training pool of reference images that simulates the poses and expressions likely to appear in real-world scenarios. Experimental results demonstrate that FaceMe restores high-quality facial images while maintaining identity consistency, achieving excellent performance and robustness.
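The abstract states that combining identity-related features lets FaceMe accept any number of reference images. One simple way to obtain that property, assumed here for illustration rather than taken from the paper, is to L2-normalize each reference's identity vector and average them, so the output size does not depend on how many references are supplied. `aggregate_references` is a hypothetical helper, and the 512-dimensional inputs are random stand-ins for ArcFace-style embeddings.

```python
import numpy as np


def aggregate_references(ref_feats):
    """Combine N >= 1 per-reference identity vectors of shape (N, D) into one
    unit-norm (D,) vector by averaging the L2-normalized rows."""
    feats = np.asarray(ref_feats, dtype=np.float64)
    if feats.ndim != 2 or feats.shape[0] == 0:
        raise ValueError("expected a non-empty (N, D) array")
    # Normalize per reference so no single image dominates by magnitude.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    mean = feats.mean(axis=0)
    norm = np.linalg.norm(mean)
    if norm == 0:
        raise ValueError("reference features cancel out; cannot normalize")
    return mean / norm


rng = np.random.default_rng(0)
one_ref = aggregate_references(rng.normal(size=(1, 512)))    # single reference
five_refs = aggregate_references(rng.normal(size=(5, 512)))  # several references
```

The result has the same shape for one or five references, so the downstream prompt never changes size. Averaging also tends to damp components that differ across references (such as pose or expression) relative to those they share. That is one plausible mechanism behind the abstract's claim that combining features suppresses identity-irrelevant information.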
