FaceMe: Robust Blind Face Restoration with Personal Identification

Siyu Liu, Zheng-Peng Duan, Jia OuYang, Jiayi Fu, Hyunhee Park, Zikun Liu, Chun-Le Guo, Chongyi Li

TL;DR

FaceMe addresses the challenge of identity-consistent blind face restoration by leveraging a diffusion model guided by an identity encoder that fuses CLIP and ArcFace features. A two-stage training scheme, combined with a synthetic pose-expression reference pool and a simple identity-driven prompt replacement, enables personalized restoration without fine-tuning for new identities. FaceMe delivers high-fidelity, identity-preserving restorations on synthetic and real-world data, outperforming state-of-the-art methods in both quality and robustness. The approach enables practical deployment for large-scale, identity-consistent face restoration with flexible reference inputs.

Abstract

Blind face restoration is a highly ill-posed problem due to the lack of necessary context. Although existing methods produce high-quality outputs, they often fail to faithfully preserve the individual's identity. In this paper, we propose a personalized face restoration method, FaceMe, based on a diffusion model. Given one or a few reference images, we use an identity encoder to extract identity-related features, which serve as prompts to guide the diffusion model in restoring high-quality and identity-consistent facial images. By simply combining identity-related features, we effectively minimize the impact of identity-irrelevant features during training and support any number of reference image inputs during inference. Additionally, thanks to the robustness of the identity encoder, synthesized images can be used as reference images during training, and changing the identity during inference does not require fine-tuning the model. We also propose a pipeline for constructing a reference image training pool that simulates the poses and expressions that may appear in real-world scenarios. Experimental results demonstrate that FaceMe restores high-quality facial images while maintaining identity consistency, achieving excellent performance and robustness.

Paper Structure (32 sections, 6 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Make the people in the photo look like you and those you are familiar with. Using a single or a few reference images, we can restore realistic images without any fine-tuning for identity. Zoom in for best view.
  • Figure 2: Overview of the proposed FaceMe (left) and the training data construction pipeline (right). In FaceMe, the identity encoder extracts identity-related features from each reference image, and these features are simply combined, which supports multiple reference images as input. We first embed a fixed text prompt, i.e., "a photo of face", and then replace the embedding of "face" with the combined identity-related features. The updated embeddings are sent to the cross-attention layers of the diffusion model to guide personalized face image restoration.
  • Figure 3: Qualitative comparison on CelebRef-HQ. In comparison to the state-of-the-art methods, our FaceMe can restore high-quality faces while maintaining identity consistency. Zoom in for best view.
  • Figure 4: Qualitative comparison on real-world faces. The first row is from the LFW-Test; the second row is from the WebPhoto-Test; and the third row is from the Wider-Test. Our method can restore high-fidelity and high-quality images, while previous methods produce unrealistic artifacts. Zoom in for best view.
  • Figure 5: Qualitative comparison of ablation studies.
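
The prompt replacement described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The function names `combine_identity_features` and `replace_token_embedding` are hypothetical, and the source only says the features are "simply combined", so the use of averaging here is an assumption. The sketch also assumes the identity features have already been projected to the text-embedding dimension, and it uses a toy word-level tokenization with 4-dimensional vectors.

```python
from typing import List

Vector = List[float]


def combine_identity_features(features: List[Vector]) -> Vector:
    """Combine per-reference identity features into one vector.

    Averaging is an assumption made for this sketch; the source only
    states that the features are "simply combined".
    """
    if not features:
        raise ValueError("at least one reference feature is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all features must share the same dimension")
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def replace_token_embedding(
    tokens: List[str],
    embeddings: List[Vector],
    target: str,
    identity: Vector,
) -> List[Vector]:
    """Return a copy of `embeddings` with the `target` token's row
    replaced by the identity vector. The input list is not modified."""
    if len(tokens) != len(embeddings):
        raise ValueError("tokens and embeddings must have the same length")
    if target not in tokens:
        raise ValueError(f"token {target!r} not found in prompt")
    idx = tokens.index(target)
    if len(identity) != len(embeddings[idx]):
        raise ValueError("identity vector must match the embedding dimension")
    updated = [list(e) for e in embeddings]
    updated[idx] = list(identity)
    return updated


# Toy data: a fixed prompt with placeholder (all-zero) text embeddings.
tokens = ["a", "photo", "of", "face"]
prompt_embeddings = [[0.0] * 4 for _ in tokens]

# Two hypothetical reference images, already encoded to 4-d identity features.
reference_features = [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 0.0, 0.0]]

identity = combine_identity_features(reference_features)
conditioned = replace_token_embedding(tokens, prompt_embeddings, "face", identity)
```

Because the combination step accepts a list of any length, the same code path handles one reference image or many, which mirrors the "any number of reference image inputs" property claimed in the abstract. In a real pipeline, `conditioned` would be the sequence passed to the diffusion model's cross-attention layers.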