Reference-Guided Identity Preserving Face Restoration
Mo Zhou, Keren Ye, Viraj Shah, Kangfu Mei, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi
TL;DR
This work tackles the challenge of preserving face identity in diffusion-based restoration by leveraging high-quality reference faces. It introduces two training-time components. Composite Context (CC) fuses multi-level reference features (including ArcFace identity and FaRL semantic/appearance cues) into a fixed-length context for cross-attention conditioning. Hard Example Identity Loss (HID) addresses the learning inefficiency of the existing identity loss $\mathcal{L}_\text{ID}$ by adding the reference face as a hard example: $\mathcal{L}_\text{HID}(\mathbf{x}_\text{HQ}, \mathbf{x}_\text{REF}, \hat{\mathbf{x}}) = (1-\lambda)\mathcal{L}_\text{ID}(\mathbf{x}_\text{HQ}, \hat{\mathbf{x}}) + \lambda\mathcal{L}_\text{ID}(\mathbf{x}_\text{REF}, \hat{\mathbf{x}})$, where $\lambda$ weights the reference term. A training-free extension based on classifier-free guidance enables multi-reference inference at test time with two guidance scales, $s_i$ and $s_c$. For $N$ references with contexts $\boldsymbol{c}_1, \dots, \boldsymbol{c}_N$, the guided noise prediction is $\tilde{\boldsymbol{\epsilon}} = (1 - s_i) \boldsymbol{\epsilon}(\cdot, \varnothing, \varnothing, t) + (s_i - s_c) \boldsymbol{\epsilon}(\cdot, \boldsymbol{z}_{\text{LQ}}, \varnothing, t) + \frac{s_c}{N} \sum_{j=1}^{N} \boldsymbol{\epsilon}(\cdot, \boldsymbol{z}_{\text{LQ}}, \boldsymbol{c}_j, t)$. Experiments on FFHQ-Ref and CelebA-Ref-Test show state-of-the-art identity preservation with competitive image quality. Ablations confirm the effectiveness of CC and HID and include a robustness analysis with incorrect references. Overall, the method substantially improves identity fidelity in reference-based face restoration and supports multi-reference inference without additional training.
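To make the two formulas in the TL;DR concrete, here is a minimal sketch in plain Python. It makes several assumptions that are not stated in the source: the identity loss $\mathcal{L}_\text{ID}$ is taken to be one minus the cosine similarity between identity embeddings (for instance ArcFace features), embeddings and noise predictions are represented as flat lists of floats, and all function names are hypothetical. It illustrates the arithmetic only and is not the authors' implementation.

```python
import math


def cosine_identity_loss(emb_a, emb_b):
    """Assumed form of L_ID: 1 - cosine similarity of two identity embeddings."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return 1.0 - dot / (norm_a * norm_b)


def hard_example_identity_loss(emb_hq, emb_ref, emb_restored, lam):
    """L_HID = (1 - lam) * L_ID(HQ, restored) + lam * L_ID(REF, restored)."""
    loss_hq = cosine_identity_loss(emb_hq, emb_restored)
    loss_ref = cosine_identity_loss(emb_ref, emb_restored)
    return (1.0 - lam) * loss_hq + lam * loss_ref


def multi_reference_guidance(eps_uncond, eps_lq, eps_lq_refs, s_i, s_c):
    """Combine noise predictions for N references.

    eps_uncond  : prediction with neither the LQ image nor a reference context
    eps_lq      : prediction conditioned on the LQ image only
    eps_lq_refs : list of N predictions, each conditioned on the LQ image
                  and one reference context
    """
    n = len(eps_lq_refs)
    if n == 0:
        raise ValueError("at least one reference prediction is required")
    # Element-wise mean over the N reference-conditioned predictions,
    # which realizes the (s_c / N) * sum term.
    mean_ref = [sum(values) / n for values in zip(*eps_lq_refs)]
    return [
        (1.0 - s_i) * u + (s_i - s_c) * l + s_c * r
        for u, l, r in zip(eps_uncond, eps_lq, mean_ref)
    ]
```

The three guidance weights, $(1 - s_i)$, $(s_i - s_c)$, and $s_c$, sum to one, so when all predictions agree the combination returns that shared prediction unchanged. With a single reference ($N = 1$) the averaging step is a no-op and the expression reduces to a two-scale classifier-free guidance combination.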
Abstract
Preserving face identity is a critical yet persistent challenge in diffusion-based image restoration. While reference faces offer a path forward, existing reference-based methods often fail to fully exploit their potential. This paper introduces a novel approach that maximizes reference face utility for improved face restoration and identity preservation. Our method makes three key contributions: 1) Composite Context, a comprehensive representation that fuses multi-level (high- and low-level) information from the reference face, offering richer guidance than prior singular representations. 2) Hard Example Identity Loss, a novel loss function that leverages the reference face to address the identity learning inefficiencies found in the existing identity loss. 3) A training-free method to adapt the model to multi-reference inputs during inference. The proposed method demonstrably restores high-quality faces and achieves state-of-the-art identity preserving restoration on benchmarks such as FFHQ-Ref and CelebA-Ref-Test, consistently outperforming previous work.
