Reference-Guided Identity Preserving Face Restoration

Mo Zhou, Keren Ye, Viraj Shah, Kangfu Mei, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi

TL;DR

This work tackles the challenge of preserving face identity in diffusion-based restoration by leveraging high-quality reference faces. It introduces Composite Context (CC), which fuses multi-level reference features (including ArcFace identity and FaRL semantic/appearance cues) into a fixed-length context for cross-attention conditioning, and Hard Example Identity Loss ($\mathcal{L}_\text{HID}$), which addresses the learning inefficiency of the existing identity loss by adding the reference face as a hard example: $\mathcal{L}_\text{HID}(\mathbf{x}_\text{HQ}, \mathbf{x}_\text{REF}, \hat{\mathbf{x}}) = (1-\lambda)\mathcal{L}_\text{ID}(\mathbf{x}_\text{HQ}, \hat{\mathbf{x}}) + \lambda\mathcal{L}_\text{ID}(\mathbf{x}_\text{REF}, \hat{\mathbf{x}})$, where $\lambda$ weights the reference term. A training-free extension based on classifier-free guidance enables multi-reference inference at test time with two guidance scales, $s_i$ for the low-quality image condition and $s_c$ for the reference context. For $N$ reference contexts $\boldsymbol{c}_1, \dots, \boldsymbol{c}_N$, the guided noise estimate is $\tilde{\boldsymbol{\epsilon}} = (1 - s_i) \boldsymbol{\epsilon}(\cdot, \varnothing, \varnothing, t) + (s_i - s_c) \boldsymbol{\epsilon}(\cdot, \boldsymbol{z}_{\text{LQ}}, \varnothing, t) + \frac{s_c}{N} \sum_{i=1}^{N} \boldsymbol{\epsilon}(\cdot, \boldsymbol{z}_{\text{LQ}}, \boldsymbol{c}_i, t)$. Experiments on FFHQ-Ref and CelebA-Ref-Test show state-of-the-art identity preservation with competitive image quality, and ablations, including robustness analyses with references of the wrong identity, confirm the effectiveness of CC and HID. Overall, the method substantially improves identity fidelity in reference-based face restoration and supports scalable multi-reference inference without additional training.
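The multi-reference guidance rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical denoiser callable `eps(z_t, z_lq, c, t)` that accepts `None` in place of the null condition $\varnothing$ and returns a noise estimate as a flat list of floats.

```python
from typing import Callable, Optional, Sequence

Vector = list[float]
# Hypothetical denoiser: eps(z_t, z_lq, c, t) -> noise estimate.
# Passing None for z_lq or c plays the role of the null condition (∅).
Denoiser = Callable[[Vector, Optional[Vector], Optional[Vector], int], Vector]


def multi_ref_guidance(
    eps: Denoiser,
    z_t: Vector,
    z_lq: Vector,
    refs: Sequence[Vector],
    t: int,
    s_i: float,
    s_c: float,
) -> Vector:
    """Combine N + 2 denoiser outputs with the two-scale multi-reference CFG rule."""
    if not refs:
        raise ValueError("at least one reference context is required")
    e_null = eps(z_t, None, None, t)  # ε(·, ∅, ∅, t)
    e_lq = eps(z_t, z_lq, None, t)    # ε(·, z_LQ, ∅, t)
    e_refs = [eps(z_t, z_lq, c, t) for c in refs]  # ε(·, z_LQ, c_i, t), i = 1..N
    n = len(refs)
    guided = []
    for k in range(len(z_t)):
        ref_mean = sum(e[k] for e in e_refs) / n  # (1/N) Σ_i ε(·, z_LQ, c_i, t)
        guided.append((1 - s_i) * e_null[k] + (s_i - s_c) * e_lq[k] + s_c * ref_mean)
    return guided
```

Rewriting the rule as $\boldsymbol{\epsilon}_\varnothing + s_i(\boldsymbol{\epsilon}_\text{LQ} - \boldsymbol{\epsilon}_\varnothing) + s_c(\bar{\boldsymbol{\epsilon}}_\text{REF} - \boldsymbol{\epsilon}_\text{LQ})$, with $\bar{\boldsymbol{\epsilon}}_\text{REF}$ the mean over references, shows why averaging needs no retraining: each reference-conditioned prediction is an ordinary single-reference forward pass. The cost is $N + 2$ denoiser evaluations per sampling step.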

Abstract

Preserving face identity is a critical yet persistent challenge in diffusion-based image restoration. While reference faces offer a path forward, existing reference-based methods often fail to fully exploit their potential. This paper introduces a novel approach that maximizes reference face utility for improved face restoration and identity preservation. Our method makes three key contributions: 1) Composite Context, a comprehensive representation that fuses multi-level (high- and low-level) information from the reference face, offering richer guidance than prior singular representations. 2) Hard Example Identity Loss, a novel loss function that leverages the reference face to address the identity learning inefficiencies found in the existing identity loss. 3) A training-free method to adapt the model to multi-reference inputs during inference. The proposed method demonstrably restores high-quality faces and achieves state-of-the-art identity preserving restoration on benchmarks such as FFHQ-Ref and CelebA-Ref-Test, consistently outperforming previous work.
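As an illustration of how several reference features could be assembled into one fixed-length context for cross-attention, the sketch below projects each feature vector with its own trainable matrix into a fixed number of tokens and concatenates the tokens in a fixed order. The feature names, token counts, and the choice of one linear projection per feature are assumptions made for this example; the paper's exact Composite Context design may differ.

```python
from typing import Mapping, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Sequence[float]) -> list[float]:
    """Dense matrix-vector product w @ x."""
    if any(len(row) != len(x) for row in w):
        raise ValueError("every row of w must match the length of x")
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def to_tokens(w: Matrix, x: Sequence[float], num_tokens: int, width: int) -> list[list[float]]:
    """Project one feature vector and reshape it into num_tokens tokens of size width."""
    flat = matvec(w, x)
    if len(flat) != num_tokens * width:
        raise ValueError("projection output must have num_tokens * width entries")
    return [flat[i * width:(i + 1) * width] for i in range(num_tokens)]


def composite_context(
    names: Sequence[str],
    features: Mapping[str, Sequence[float]],
    projections: Mapping[str, Matrix],
    token_counts: Mapping[str, int],
    width: int,
) -> list[list[float]]:
    """Concatenate per-feature tokens in the order given by names.

    The context length is sum(token_counts[n] for n in names) for every
    reference, so cross-attention always sees the same number of tokens.
    """
    context: list[list[float]] = []
    for name in names:
        context.extend(to_tokens(projections[name], features[name], token_counts[name], width))
    return context
```

In this toy setup, a high-level identity vector and a lower-level appearance vector would simply be two entries in `features`, each with its own projection, which is one way to read "fusing multi-level information" into a single conditioning sequence.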

Paper Structure

This paper contains 11 sections, 6 equations, 12 figures, 13 tables.

Figures (12)

  • Figure 1: Overview of our proposed method. The Composite Context and Hard Example Identity Loss are designed to fully exploit the reference face and thereby boost identity preservation. $\boldsymbol{z}_t$ is the noisy latent, $\boldsymbol{x}_\text{LQ}$ is the low-quality face image input ($\boldsymbol{z}_\text{LQ}$ is its corresponding VAE latent), $\boldsymbol{x}_\text{REF}$ is the high-quality reference face, $\boldsymbol{x}_\text{HQ}$ is the high-quality ground-truth face image, $\hat{\boldsymbol{z}}$ is the direct estimate of the denoised result (i.e., Eq. (15) in DDPM [ddpm]), and $\hat{\boldsymbol{x}}$ is the VAE-decoded direct estimate. All pre-trained modules are frozen. The UNet [unet] and the projection matrices for Composite Context are trained. The total loss combines the MAE loss and the Hard Example Identity Loss.
  • Figure 2: Loss curves of $\mathcal{L}_\text{ID}$ and $\mathcal{L}_\text{HID}$ during training. Only the early part of the training process is shown.
  • Figure 3: Qualitative comparison with other state-of-the-art face restoration methods on the FFHQ-Ref Moderate [refldm] test set. The "REF" column is the reference face. Please zoom in for facial details; for instance, the black moles are well preserved in our result in the sixth row.
  • Figure 4: Qualitative comparison with other state-of-the-art face restoration methods on the FFHQ-Ref Severe [refldm] test set. The "REF" column is the high-quality reference face image.
  • Figure 5: Impact of the reference face image, demonstrated by deliberately supplying the model with a reference face of the wrong identity. The first row is from FFHQ-Ref Moderate, and the second row is from FFHQ-Ref Severe.
  • ...and 7 more figures
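The training path in Figure 1 (direct estimate $\hat{\boldsymbol{z}}$, decoded to $\hat{\boldsymbol{x}}$, then scored by the identity loss) can be sketched as follows. Assuming an $\boldsymbol{\epsilon}$-prediction model, the direct estimate uses the standard DDPM relation $\hat{\boldsymbol{z}} = (\boldsymbol{z}_t - \sqrt{1-\bar{\alpha}_t}\,\hat{\boldsymbol{\epsilon}})/\sqrt{\bar{\alpha}_t}$. Defining $\mathcal{L}_\text{ID}$ as one minus the cosine similarity of face-recognition embeddings is an assumption of this sketch, and the VAE decoder and embedding network are omitted, so the loss functions take embeddings directly.

```python
import math
from typing import Sequence


def direct_estimate(z_t: Sequence[float], eps_hat: Sequence[float], alpha_bar_t: float) -> list[float]:
    """DDPM direct estimate of the clean latent from z_t and predicted noise eps_hat."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [(z - b * e) / a for z, e in zip(z_t, eps_hat)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def id_loss(emb_target: Sequence[float], emb_pred: Sequence[float]) -> float:
    """Assumed identity loss: 1 - cosine similarity of face embeddings."""
    return 1.0 - cosine_similarity(emb_target, emb_pred)


def hid_loss(emb_hq: Sequence[float], emb_ref: Sequence[float], emb_pred: Sequence[float], lam: float) -> float:
    """L_HID = (1 - lam) * L_ID(x_HQ, x_hat) + lam * L_ID(x_REF, x_hat)."""
    return (1.0 - lam) * id_loss(emb_hq, emb_pred) + lam * id_loss(emb_ref, emb_pred)
```

Setting `lam = 0` recovers the plain identity loss against the ground truth; larger values weight the agreement between the restored face and the reference more heavily.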